Lab 5: Wide and Deep Networks

Prince Ndhlovu & Kirby Cravens

1. Data Preparation

In [833]:
import pandas as pd
import numpy as np
import warnings
%matplotlib inline
warnings.filterwarnings("ignore")

data_file = "/Users/princendhlovu/Downloads/dataset-of-10s.csv"

RawData = pd.read_csv(data_file)
RawData.head(5)
Out[833]:
track artist uri danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo duration_ms time_signature chorus_hit sections target
0 Wild Things Alessia Cara spotify:track:2ZyuwVvV6Z3XJaXIFbspeE 0.741 0.626 1 -4.826 0 0.0886 0.02000 0.000 0.0828 0.706 108.029 188493 4 41.18681 10 1
1 Surfboard Esquivel! spotify:track:61APOtq25SCMuK0V5w2Kgp 0.447 0.247 5 -14.661 0 0.0346 0.87100 0.814 0.0946 0.250 155.489 176880 3 33.18083 9 0
2 Love Someone Lukas Graham spotify:track:2JqnpexlO9dmvjUMCaLCLJ 0.550 0.415 9 -6.557 0 0.0520 0.16100 0.000 0.1080 0.274 172.065 205463 4 44.89147 9 1
3 Music To My Ears (feat. Tory Lanez) Keys N Krates spotify:track:0cjfLhk8WJ3etPTCseKXtk 0.502 0.648 0 -5.698 0 0.0527 0.00513 0.000 0.2040 0.291 91.837 193043 4 29.52521 7 0
4 Juju On That Beat (TZ Anthem) Zay Hilfigerrr & Zayion McCall spotify:track:1lItf5ZXJc1by9SbPeljFd 0.807 0.887 1 -3.892 1 0.2750 0.00381 0.000 0.3910 0.780 160.517 144244 4 24.99199 8 1
In [834]:
# drop the track, artist and uri columns
myData = RawData.drop(columns=['track','artist','uri'])
myData.head(5)
Out[834]:
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo duration_ms time_signature chorus_hit sections target
0 0.741 0.626 1 -4.826 0 0.0886 0.02000 0.000 0.0828 0.706 108.029 188493 4 41.18681 10 1
1 0.447 0.247 5 -14.661 0 0.0346 0.87100 0.814 0.0946 0.250 155.489 176880 3 33.18083 9 0
2 0.550 0.415 9 -6.557 0 0.0520 0.16100 0.000 0.1080 0.274 172.065 205463 4 44.89147 9 1
3 0.502 0.648 0 -5.698 0 0.0527 0.00513 0.000 0.2040 0.291 91.837 193043 4 29.52521 7 0
4 0.807 0.887 1 -3.892 1 0.2750 0.00381 0.000 0.3910 0.780 160.517 144244 4 24.99199 8 1
In [835]:
myData.describe()
Out[835]:
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo duration_ms time_signature chorus_hit sections target
count 6398.000000 6398.000000 6398.000000 6398.000000 6398.000000 6398.000000 6398.000000 6398.000000 6398.000000 6398.000000 6398.000000 6.398000e+03 6398.000000 6398.000000 6398.000000 6398.000000
mean 0.568163 0.667756 5.283526 -7.589796 0.645514 0.098018 0.216928 0.165293 0.196700 0.443734 122.353871 2.367042e+05 3.930916 41.028399 10.316505 0.500000
std 0.191103 0.240721 3.606216 5.234592 0.478395 0.097224 0.296835 0.318736 0.166148 0.245776 29.847389 8.563698e+04 0.377469 19.568827 3.776011 0.500039
min 0.062200 0.000251 0.000000 -46.655000 0.000000 0.022500 0.000000 0.000000 0.016700 0.000000 39.369000 2.985300e+04 0.000000 0.000000 2.000000 0.000000
25% 0.447000 0.533000 2.000000 -8.425000 0.000000 0.038825 0.008533 0.000000 0.096800 0.240000 98.091250 1.932068e+05 4.000000 28.059135 8.000000 0.000000
50% 0.588000 0.712500 5.000000 -6.096500 1.000000 0.057200 0.067050 0.000017 0.126000 0.434000 121.070000 2.212465e+05 4.000000 36.265365 10.000000 0.500000
75% 0.710000 0.857000 8.000000 -4.601250 1.000000 0.112000 0.311000 0.057650 0.249000 0.628000 141.085000 2.593165e+05 4.000000 48.292538 12.000000 1.000000
max 0.981000 0.999000 11.000000 -0.149000 1.000000 0.956000 0.996000 0.995000 0.982000 0.976000 210.977000 1.734201e+06 5.000000 213.154990 88.000000 1.000000
In [836]:
# create a data description table
data_des = pd.DataFrame()

data_des['Features'] = myData.columns
data_des['Descriptions']= ['How suitable a track is for dancing ',
                           'A perceptual measure of intensity and activity',
                           'The estimated overall key of the track',
                           'The overall loudness of a track in decibels',
                           'The modality (major or minor) of a track',
                           'The presence of spoken words in a track',
                           'Whether the track is acoustic',
                           'Predicts whether a track contains no vocals',
                           'The presence of an audience in the recording',
                           'Musical positiveness conveyed by a track',
                           'Beats per minute',
                           'The duration of the track in milliseconds',
                           'An estimated overall time signature of a track',
                           'Timestamp of the start of the third section of the track',
                           'The number of sections the particular track has',
                           'The target variable for the track']
data_des['Scales']= ['ratio','ratio','ordinal','ratio','nominal','ratio','ratio','ratio','ratio',
                     'ratio','ratio','ratio','ratio','ratio','ratio','nominal']
data_des['Discrete/Continuous'] = ['Continuous','Continuous','Discrete','Continuous','Discrete',
                                   'Continuous','Continuous','Continuous','Continuous','Continuous',
                                   'Continuous','Discrete','Discrete','Continuous','Discrete',
                                   'Discrete']
data_des['Range'] = ['0.062200-0.981000','0.000251-0.999000','0:C, 1:C#, 2:D, 3:Eb, 4:E, 5:F etc','-46.655000--0.149000','0 (Minor) and 1 (Major)',
                     '0.022500-0.956000','0-0.996000','0-0.995000','0.016700-0.982000','0-0.976000',
                     '39.369000-210.977000','29853-1734201','0-5','0-213.154990','2-88',
                     '0:flop, 1:hit']
data_des
Out[836]:
Features Descriptions Scales Discrete/Continuous Range
0 danceability How suitable a track is for dancing ratio Continuous 0.062200-0.981000
1 energy A perceptual measure of intensity and activity ratio Continuous 0.000251-0.999000
2 key The estimated overall key of the track ordinal Discrete 0:C, 1:C#, 2:D, 3:Eb, 4:E, 5:F etc
3 loudness The overall loudness of a track in decibels ratio Continuous -46.655000--0.149000
4 mode The modality (major or minor) of a track nominal Discrete 0 (Minor) and 1 (Major)
5 speechiness The presence of spoken words in a track ratio Continuous 0.022500-0.956000
6 acousticness Whether the track is acoustic ratio Continuous 0-0.996000
7 instrumentalness Predicts whether a track contains no vocals ratio Continuous 0-0.995000
8 liveness The presence of an audience in the recording ratio Continuous 0.016700-0.982000
9 valence Musical positiveness conveyed by a track ratio Continuous 0-0.976000
10 tempo Beats per minute ratio Continuous 39.369000-210.977000
11 duration_ms The duration of the track in milliseconds ratio Discrete 29853-1734201
12 time_signature An estimated overall time signature of a track ratio Discrete 0-5
13 chorus_hit Timestamp of the start of the third section of the track ratio Continuous 0-213.154990
14 sections The number of sections the particular track has ratio Discrete 2-88
15 target The target variable for the track nominal Discrete 0:flop, 1:hit
In [837]:
# find data type
print(myData.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6398 entries, 0 to 6397
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   danceability      6398 non-null   float64
 1   energy            6398 non-null   float64
 2   key               6398 non-null   int64  
 3   loudness          6398 non-null   float64
 4   mode              6398 non-null   int64  
 5   speechiness       6398 non-null   float64
 6   acousticness      6398 non-null   float64
 7   instrumentalness  6398 non-null   float64
 8   liveness          6398 non-null   float64
 9   valence           6398 non-null   float64
 10  tempo             6398 non-null   float64
 11  duration_ms       6398 non-null   int64  
 12  time_signature    6398 non-null   int64  
 13  chorus_hit        6398 non-null   float64
 14  sections          6398 non-null   int64  
 15  target            6398 non-null   int64  
dtypes: float64(10), int64(6)
memory usage: 799.9 KB
None
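
An explicit check confirms the full non-null counts reported by info() (a quick sketch):

print(myData.isnull().sum().sum())  # expected: 0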

There are no missing values, so we are going to check for duplicates.

In [838]:
#Find the duplicate instances 
index = myData.duplicated()

# find the number of duplicates
len(myData[index])
Out[838]:
139

Since there are 139 duplicates, we drop them to improve our data quality; they were likely introduced by human error.

In [839]:
myData = myData.drop_duplicates()
idx = myData.duplicated()
len(myData[idx])
Out[839]:
0

We want to make the columns listed in the array to_bin categorical so that we can group songs by their values in these columns. We bin each of them into integer values from 1 to 10 with np.digitize, which makes it easier to cross them with each other later.
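
For example, np.digitize with these bin edges maps a raw value such as 0.741 to bin 8 (a quick sketch):

bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
print(np.digitize([0.05, 0.741, 0.99], bins=bins))  # [ 1  8 10]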

In [840]:
to_bin = ['danceability','energy','speechiness','acousticness','instrumentalness','liveness','valence']
for idx,col in enumerate(to_bin):
    myData[col] = np.digitize(myData[col],bins=[0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1])

myData.describe()
Out[840]:
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo duration_ms time_signature chorus_hit sections target
count 6259.000000 6259.000000 6259.000000 6259.000000 6259.000000 6259.000000 6259.000000 6259.000000 6259.000000 6259.000000 6259.000000 6.259000e+03 6259.000000 6259.000000 6259.000000 6259.000000
mean 6.190925 7.165362 5.275923 -7.573511 0.646110 1.501198 2.829525 2.479789 2.491293 4.950631 122.418008 2.358256e+05 3.930979 40.979194 10.290302 0.508707
std 1.926205 2.382752 3.607157 5.250005 0.478214 0.979719 2.838553 2.994407 1.688903 2.466385 29.921408 8.557350e+04 0.377157 19.558411 3.778070 0.499964
min 1.000000 1.000000 0.000000 -46.655000 0.000000 1.000000 1.000000 1.000000 1.000000 1.000000 39.369000 2.985300e+04 0.000000 0.000000 2.000000 0.000000
25% 5.000000 6.000000 2.000000 -8.381500 0.000000 1.000000 1.000000 1.000000 1.000000 3.000000 98.058000 1.928200e+05 4.000000 28.066490 8.000000 0.000000
50% 6.000000 8.000000 5.000000 -6.078000 1.000000 1.000000 1.000000 1.000000 2.000000 5.000000 121.189000 2.207810e+05 4.000000 36.246140 10.000000 1.000000
75% 8.000000 9.000000 8.000000 -4.603500 1.000000 2.000000 4.000000 1.000000 3.000000 7.000000 141.250500 2.579865e+05 4.000000 48.186325 12.000000 1.000000
max 10.000000 10.000000 11.000000 -0.149000 1.000000 10.000000 10.000000 10.000000 10.000000 10.000000 210.977000 1.734201e+06 5.000000 213.154990 88.000000 1.000000

Normalizing the features in the array to_norm with min-max scaling so that each falls in the range [-0.5, 0.5]

In [841]:
from sklearn.preprocessing import StandardScaler

to_norm = ['loudness','tempo','duration_ms','chorus_hit','sections']

def normalize(df):
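    # min-max scale each feature to [0, 1], then shift by 0.5 so the range becomes [-0.5, 0.5]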
    result = df.copy()
    for feature in df.columns:
        max_val = df[feature].max()
        min_val = df[feature].min()
        result[feature] = (df[feature] - min_val)/(max_val - min_val) - 0.5
    return result

X = myData.copy()
X = X.drop(columns='target')
X[to_norm] = normalize(myData[to_norm]).astype(np.float32)

y = myData.target.astype(int)  # np.int is deprecated in recent NumPy versions

X.head(10)
Out[841]:
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo duration_ms time_signature chorus_hit sections
0 8 7 1 0.399432 0 1 1 1 1 8 -0.099902 -0.406920 4 -0.306775 -0.406977
1 5 3 5 0.187954 0 1 9 9 1 3 0.176658 -0.413734 3 -0.344335 -0.418605
2 6 5 9 0.362211 0 1 2 1 2 3 0.273251 -0.396964 4 -0.289395 -0.418605
3 6 7 0 0.380682 0 1 1 1 3 3 -0.194257 -0.404251 4 -0.361485 -0.441860
4 9 9 1 0.419516 1 3 1 1 4 8 0.205958 -0.432883 4 -0.382752 -0.430233
5 5 9 0 0.435578 1 1 1 1 5 8 0.232571 -0.391767 4 -0.349063 -0.383721
6 6 10 0 0.423558 1 2 1 1 2 5 0.086937 -0.363502 4 -0.401269 -0.360465
7 8 6 2 0.330753 1 2 1 1 2 4 -0.160983 -0.399942 4 -0.217528 -0.406977
8 2 10 7 0.441147 1 2 1 1 10 2 0.288751 -0.369197 4 -0.353460 -0.395349
9 4 8 8 0.380962 1 2 1 1 3 4 -0.271223 -0.368415 4 -0.390678 -0.418605
In [842]:
X['danceability'].unique()
Out[842]:
array([ 8,  5,  6,  9,  2,  4,  7,  3, 10,  1])
In [843]:
X['energy'].unique()
Out[843]:
array([ 7,  3,  5,  9, 10,  6,  8,  4,  1,  2])
In [844]:
X['acousticness'].unique()
Out[844]:
array([ 1,  9,  2,  5,  4,  8,  7, 10,  6,  3])
In [845]:
X['instrumentalness'].unique()
Out[845]:
array([ 1,  9, 10,  3,  7,  6,  8,  2,  4,  5])
In [846]:
X['valence'].unique()
Out[846]:
array([ 8,  3,  5,  4,  2,  1,  6, 10,  7,  9])
In [847]:
X['speechiness'].unique()
Out[847]:
array([ 1,  3,  2,  4,  7,  5,  8,  6, 10,  9])
In [848]:
X['liveness'].unique()
Out[848]:
array([ 1,  2,  3,  4,  5, 10,  6,  8,  9,  7])
In [849]:
y.unique()
Out[849]:
array([1, 0])

1.2 Cross Product Features

key, energy, valence: key estimates the overall key (pitch class) of the track and often relates to how upbeat, or how energetic, it feels. Valence describes the musical positiveness of a track, and positive songs often correlate with key as well.

danceability, liveness: Liveness detects the presence of an audience in a track. Having a crowd increases the chances of making someone want to dance.

speechiness, acousticness, instrumentalness: Speechiness detects the presence of spoken words, acousticness determines whether the track is acoustic, and instrumentalness predicts whether the track contains no vocals. These three features all describe the vocals and overall sound of the track, and thus should be crossed.

time_signature, energy: time signature estimates how many beats are in each bar of the track, and this correlates with how much energy the track has.
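
As a rough check of how many distinct categories a cross such as key x time_signature x valence produces after binning (a sketch; the exact count depends on the data):

print(X[['key', 'time_signature', 'valence']].drop_duplicates().shape[0])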

1.3 Evaluation Criteria

For this data set, we are trying to predict whether a song is going to be a hit or a flop. It is in the artist's best interest to have this prediction ahead of time so that they know how to allocate resources for marketing their songs. If a song is going to flop they may discard it or spend less money marketing it, whereas if it is going to be a hit, more financial resources should be on hand for marketing so that it generates more revenue. In our model we are trying to minimize the number of False Positives, cases where we predict a song will be a hit when it is actually going to flop, which would cause the artist to lose a lot of money marketing a song that won't top the charts. We can afford some False Negatives, because such a song can still find its way to the top of the Billboard charts, and by then we would have noticed its potential and mobilised marketing resources to increase its reach. Our evaluation criterion is therefore precision, since we want as few False Positives as possible, and it is given by: $ Precision(p) = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} $
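
As a small illustration of this metric, using hypothetical labels rather than our model's output (a sketch):

from sklearn.metrics import precision_score

# two true positives and one false positive -> precision = 2/3
y_true_example = [1, 0, 1, 0, 1]
y_pred_example = [1, 1, 1, 0, 0]
print(precision_score(y_true_example, y_pred_example))  # 0.666...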

1.4 Splitting Data

In [850]:
#count the frequencies of classes
y.value_counts()
Out[850]:
1    3184
0    3075
Name: target, dtype: int64

From the target counts we note that we have an almost even balance of hit and non-hit songs (1's and 0's). Therefore we are going to use scikit-learn's train_test_split with stratification to divide the dataset into 80% training and 20% testing. A stratified split ensures that both classes are represented proportionally in the training and test sets, so that no class is favoured over the other in our model.

In [851]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit


X_train, X_test, y_train, y_test = train_test_split(X,y,stratify=y,test_size = 0.2)
X_train = pd.DataFrame(X_train)
X_train.columns = X.columns
X_test = pd.DataFrame(X_test)
X_test.columns = X.columns
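
As a quick sanity check, the class proportions of the two splits can be compared to confirm that the stratification preserved the balance (a small sketch):

print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))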

2. Modelling

In [852]:
import tensorflow as tf
from tensorflow.keras.layers import Embedding
from tensorflow.keras.layers import concatenate
from tensorflow import keras
from tensorflow.keras.layers import Dense, Activation, Input
from tensorflow.keras.layers import Embedding, Flatten, Concatenate
from tensorflow.keras.models import Model
from sklearn.preprocessing import LabelEncoder
from functools import reduce



# possible crossing options:
#   'key','time_signature','danceability',
#   'energy','speechiness','acousticness',
#   'instrumentalness','liveness','valence'

cross_columns = [['key','time_signature','valence'],
                 ['danceability', 'energy','instrumentalness'],
                 ['speechiness','acousticness','liveness'],
#                  ['time_signature','energy']
                ]

# save categorical features
categorical_headers = ['key','time_signature']+to_bin

# cross each set of columns in the list above
cross_col_df_names = []
for cols_list in cross_columns:
    # encode as ints for the embedding
    enc = LabelEncoder()
    
    X_crossed_train = []
    X_crossed_test = []
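    # form each row's crossed value by summing the integer bin codes of the columns in this cross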
    for row in X_train[cols_list].values:
        X_crossed_train.append(reduce((lambda x,y: x+y),row))
    for row in X_test[cols_list].values:
        X_crossed_test.append(reduce((lambda x,y: x+y),row))
    
    # get a nice name for this new crossed column
    cross_col_name = '_'.join(cols_list)
    
    # 2. encode as integers
#     enc.fit(np.hstack((X_crossed_train.to_numpy(),  X_crossed_test.to_numpy())))
    enc.fit(np.hstack((np.array(X_crossed_train),np.array(X_crossed_test))))
    
    # 3. Save into dataframe with new name
    X_train[cross_col_name] = enc.transform(X_crossed_train)
    X_test[cross_col_name] = enc.transform(X_crossed_test)
    
    # keep track of the new names of the crossed columns
    cross_col_df_names.append(cross_col_name)
In [853]:
# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the outputs and add a fully connected layer
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()
In [854]:
%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# lets also add the history variable to see how we are doing
# and lets add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))
Epoch 1/15
157/157 [==============================] - 1s 6ms/step - loss: 0.2448 - precision: 0.5787 - val_loss: 0.2310 - val_precision: 0.6390
Epoch 2/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2354 - precision: 0.6086 - val_loss: 0.2262 - val_precision: 0.6692
Epoch 3/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2326 - precision: 0.6191 - val_loss: 0.2242 - val_precision: 0.6259
Epoch 4/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2308 - precision: 0.6273 - val_loss: 0.2244 - val_precision: 0.6178
Epoch 5/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2299 - precision: 0.6244 - val_loss: 0.2205 - val_precision: 0.6590
Epoch 6/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2286 - precision: 0.6279 - val_loss: 0.2198 - val_precision: 0.6444
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2272 - precision: 0.6261 - val_loss: 0.2190 - val_precision: 0.6684
Epoch 8/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2259 - precision: 0.6293 - val_loss: 0.2182 - val_precision: 0.6396
Epoch 9/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2249 - precision: 0.6332 - val_loss: 0.2162 - val_precision: 0.6595
Epoch 10/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2235 - precision: 0.6373 - val_loss: 0.2160 - val_precision: 0.6424
Epoch 11/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2225 - precision: 0.6378 - val_loss: 0.2139 - val_precision: 0.6667
Epoch 12/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2215 - precision: 0.6426 - val_loss: 0.2137 - val_precision: 0.6487
Epoch 13/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2204 - precision: 0.6478 - val_loss: 0.2135 - val_precision: 0.6396
Epoch 14/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2190 - precision: 0.6410 - val_loss: 0.2113 - val_precision: 0.6567
Epoch 15/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2177 - precision: 0.6438 - val_loss: 0.2103 - val_precision: 0.6523
CPU times: user 9.06 s, sys: 974 ms, total: 10 s
Wall time: 8 s
In [855]:
from sklearn import metrics as mt
yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))


y_pred_0 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

#false positve and true postive rates using roc
fpr_0, tpr_0, thresholds_0 = mt.roc_curve(y_test, y_pred_0)

#area under the curve
auc_0 = mt.auc(fpr_0, tpr_0)
[[365 250]
 [168 469]]
0.6522948539638387
In [856]:
from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')
Out[856]:
Text(0.5, 0, 'epochs')
In [857]:
# possible crossing options:
#   'key','time_signature','danceability',
#   'energy','speechiness','acousticness',
#   'instrumentalness','liveness','valence'

cross_columns = [['danceability','energy','valence'],
                 ['key', 'danceability','liveness'],
                 ['speechiness','acousticness','instrumentalness'],
                 ['time_signature','energy']
                ]

# cross each set of columns in the list above
cross_col_df_names = []
for cols_list in cross_columns:
    # encode as ints for the embedding
    enc = LabelEncoder()
    
    X_crossed_train = []
    X_crossed_test = []
    for row in X_train[cols_list].values:
        X_crossed_train.append(reduce((lambda x,y: x+y),row))
    for row in X_test[cols_list].values:
        X_crossed_test.append(reduce((lambda x,y: x+y),row))
    
    # get a nice name for this new crossed column
    cross_col_name = '_'.join(cols_list)
    
    # 2. encode as integers
#     enc.fit(np.hstack((X_crossed_train.to_numpy(),  X_crossed_test.to_numpy())))
    enc.fit(np.hstack((np.array(X_crossed_train),np.array(X_crossed_test))))
    
    # 3. Save into dataframe with new name
    X_train[cross_col_name] = enc.transform(X_crossed_train)
    X_test[cross_col_name] = enc.transform(X_crossed_test)
    
    # keep track of the new names of the crossed columns
    cross_col_df_names.append(cross_col_name) 
In [858]:
# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the outputs and add a fully connected layer
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()
In [859]:
%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# lets also add the history variable to see how we are doing
# and lets add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))
Epoch 1/15
157/157 [==============================] - 1s 5ms/step - loss: 0.2462 - precision: 0.5645 - val_loss: 0.2224 - val_precision: 0.6439
Epoch 2/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2145 - precision: 0.6393 - val_loss: 0.2015 - val_precision: 0.6453
Epoch 3/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2028 - precision: 0.6459 - val_loss: 0.1944 - val_precision: 0.7080
Epoch 4/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1950 - precision: 0.6544 - val_loss: 0.1832 - val_precision: 0.6918
Epoch 5/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1911 - precision: 0.6639 - val_loss: 0.2135 - val_precision: 0.7403
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1871 - precision: 0.6710 - val_loss: 0.1763 - val_precision: 0.6717
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1824 - precision: 0.6849 - val_loss: 0.1760 - val_precision: 0.7523
Epoch 8/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1806 - precision: 0.6908 - val_loss: 0.1958 - val_precision: 0.7973
Epoch 9/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1785 - precision: 0.6968 - val_loss: 0.1646 - val_precision: 0.7182
Epoch 10/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1781 - precision: 0.6998 - val_loss: 0.1717 - val_precision: 0.7618
Epoch 11/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1744 - precision: 0.7023 - val_loss: 0.1617 - val_precision: 0.7178
Epoch 12/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1759 - precision: 0.6981 - val_loss: 0.1631 - val_precision: 0.7143
Epoch 13/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1727 - precision: 0.7040 - val_loss: 0.1609 - val_precision: 0.7064
Epoch 14/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1728 - precision: 0.7040 - val_loss: 0.1587 - val_precision: 0.7445
Epoch 15/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1696 - precision: 0.7154 - val_loss: 0.1570 - val_precision: 0.7446
CPU times: user 8.55 s, sys: 1.08 s, total: 9.62 s
Wall time: 6.42 s
In [860]:
yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))

yhat_best = yhat 
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))

y_pred_1 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

#false positve and true postive rates using roc
fpr_1, tpr_1, thresholds_1 = mt.roc_curve(y_test, y_pred_1)

#area under the curve
auc_1 = mt.auc(fpr_1, tpr_1)
[[425 190]
 [ 83 554]]
0.7446236559139785
In [861]:
from matplotlib import pyplot as plt

%matplotlib inline


plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')




model10_hist_accur = history.history['precision']
model10_val_accur = history.history['val_precision']
model10_hist_loss = history.history['loss']
model10_val_loss = history.history['val_loss']
In [862]:
# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the outputs and add a fully connected layer
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)


# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()
In [863]:
%%time

model.compile(optimizer='adagrad',
              loss='mean_squared_error',
              metrics=['Precision'])

# lets also add the history variable to see how we are doing
# and lets add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))
Epoch 1/15
157/157 [==============================] - 1s 4ms/step - loss: 0.2628 - precision: 0.5120 - val_loss: 0.2535 - val_precision: 0.5365
Epoch 2/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2466 - precision: 0.5618 - val_loss: 0.2399 - val_precision: 0.5988
Epoch 3/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2369 - precision: 0.5964 - val_loss: 0.2307 - val_precision: 0.6211
Epoch 4/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2298 - precision: 0.6113 - val_loss: 0.2231 - val_precision: 0.6283
Epoch 5/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2238 - precision: 0.6212 - val_loss: 0.2169 - val_precision: 0.6368
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2189 - precision: 0.6258 - val_loss: 0.2118 - val_precision: 0.6458
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2151 - precision: 0.6301 - val_loss: 0.2079 - val_precision: 0.6523
Epoch 8/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2121 - precision: 0.6325 - val_loss: 0.2049 - val_precision: 0.6568
Epoch 9/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2097 - precision: 0.6368 - val_loss: 0.2025 - val_precision: 0.6573
Epoch 10/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2076 - precision: 0.6361 - val_loss: 0.2003 - val_precision: 0.6600
Epoch 11/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2058 - precision: 0.6398 - val_loss: 0.1984 - val_precision: 0.6616
Epoch 12/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2042 - precision: 0.6393 - val_loss: 0.1968 - val_precision: 0.6631
Epoch 13/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2028 - precision: 0.6412 - val_loss: 0.1952 - val_precision: 0.6627
Epoch 14/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2015 - precision: 0.6419 - val_loss: 0.1939 - val_precision: 0.6635
Epoch 15/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2003 - precision: 0.6434 - val_loss: 0.1927 - val_precision: 0.6636
CPU times: user 11.9 s, sys: 1.35 s, total: 13.3 s
Wall time: 8.4 s
In [864]:
yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))

y_pred_2 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

#false positve and true postive rates using roc
fpr_2, tpr_2, thresholds_2 = mt.roc_curve(y_test, y_pred_2)

#area under the curve
auc_2 = mt.auc(fpr_2, tpr_2)
[[327 288]
 [ 69 568]]
0.6635514018691588
In [865]:
from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')
Out[865]:
Text(0.5, 0, 'epochs')
In [866]:
# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the outputs and add a fully connected layer
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)

# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
deep_branch = Dense(units=5,activation='relu', name='deep4')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()
In [867]:
%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# lets also add the history variable to see how we are doing
# and lets add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))
Epoch 1/15
157/157 [==============================] - 1s 4ms/step - loss: 0.2288 - precision: 0.6251 - val_loss: 0.2157 - val_precision: 0.6565
Epoch 2/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2153 - precision: 0.6360 - val_loss: 0.2037 - val_precision: 0.6588
Epoch 3/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2065 - precision: 0.6371 - val_loss: 0.1959 - val_precision: 0.6583
Epoch 4/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2009 - precision: 0.6409 - val_loss: 0.1913 - val_precision: 0.6734
Epoch 5/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1969 - precision: 0.6428 - val_loss: 0.1891 - val_precision: 0.6487
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1936 - precision: 0.6456 - val_loss: 0.1846 - val_precision: 0.6616
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1912 - precision: 0.6501 - val_loss: 0.1858 - val_precision: 0.6469
Epoch 8/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1890 - precision: 0.6550 - val_loss: 0.1848 - val_precision: 0.6495
Epoch 9/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1863 - precision: 0.6636 - val_loss: 0.1785 - val_precision: 0.6644
Epoch 10/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1849 - precision: 0.6643 - val_loss: 0.1763 - val_precision: 0.6895
Epoch 11/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1833 - precision: 0.6696 - val_loss: 0.1780 - val_precision: 0.7151
Epoch 12/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1814 - precision: 0.6785 - val_loss: 0.1730 - val_precision: 0.7022
Epoch 13/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1798 - precision: 0.6810 - val_loss: 0.1749 - val_precision: 0.7227
Epoch 14/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1783 - precision: 0.6837 - val_loss: 0.1698 - val_precision: 0.6923
Epoch 15/15
157/157 [==============================] - 1s 3ms/step - loss: 0.1770 - precision: 0.6838 - val_loss: 0.1767 - val_precision: 0.6718
CPU times: user 9.13 s, sys: 1.02 s, total: 10.1 s
Wall time: 7.24 s
In [868]:
yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))
[[316 299]
 [ 25 612]]
0.6717892425905598
In [869]:
y_pred_3 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

#false positve and true postive rates using roc
fpr_3, tpr_3, thresholds_3 = mt.roc_curve(y_test, y_pred_3)

#area under the curve
auc_3 = mt.auc(fpr_3, tpr_3)
In [870]:
from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')
Out[870]:
Text(0.5, 0, 'epochs')
In [871]:
# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# now concatenate the outputs and add a fully connected layer
wide_branch = concatenate(crossed_outputs, name='wide_concat')

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_branch[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
    
all_deep_branch_outputs.append(x_dense)

# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=75,activation='relu', name='deep0')(deep_branch)
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
deep_branch = Dense(units=5,activation='relu', name='deep4')(deep_branch)
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()
In [872]:
%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# lets also add the history variable to see how we are doing
# and lets add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))
Epoch 1/15
157/157 [==============================] - 1s 5ms/step - loss: 0.2247 - precision: 0.6354 - val_loss: 0.1994 - val_precision: 0.6587
Epoch 2/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1984 - precision: 0.6373 - val_loss: 0.1884 - val_precision: 0.6624
Epoch 3/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1920 - precision: 0.6435 - val_loss: 0.1862 - val_precision: 0.6801
Epoch 4/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1888 - precision: 0.6486 - val_loss: 0.1815 - val_precision: 0.6694
Epoch 5/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1862 - precision: 0.6517 - val_loss: 0.1786 - val_precision: 0.6744
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1846 - precision: 0.6521 - val_loss: 0.1769 - val_precision: 0.6721
Epoch 7/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1832 - precision: 0.6553 - val_loss: 0.1768 - val_precision: 0.6591
Epoch 8/15
157/157 [==============================] - 1s 3ms/step - loss: 0.1815 - precision: 0.6585 - val_loss: 0.1766 - val_precision: 0.6602
Epoch 9/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1805 - precision: 0.6629 - val_loss: 0.1724 - val_precision: 0.6727
Epoch 10/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1790 - precision: 0.6660 - val_loss: 0.1706 - val_precision: 0.6837
Epoch 11/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1770 - precision: 0.6729 - val_loss: 0.1714 - val_precision: 0.6734
Epoch 12/15
157/157 [==============================] - 1s 3ms/step - loss: 0.1757 - precision: 0.6760 - val_loss: 0.1678 - val_precision: 0.6767
Epoch 13/15
157/157 [==============================] - 0s 3ms/step - loss: 0.1744 - precision: 0.6767 - val_loss: 0.1665 - val_precision: 0.6846
Epoch 14/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1734 - precision: 0.6815 - val_loss: 0.1905 - val_precision: 0.7485
Epoch 15/15
157/157 [==============================] - 0s 2ms/step - loss: 0.1724 - precision: 0.6890 - val_loss: 0.1658 - val_precision: 0.6824
CPU times: user 9.61 s, sys: 1 s, total: 10.6 s
Wall time: 7.95 s
In [873]:
yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))
[[339 276]
 [ 44 593]]
0.6823935558112774
In [874]:
from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')
Out[874]:
Text(0.5, 0, 'epochs')

Comparing wide and deep models

In [875]:
from sklearn import metrics
y_pred_4 = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

#false positve and true postive rates using roc
fpr_4, tpr_4, thresholds_4 = metrics.roc_curve(y_test, y_pred_4)

#area under the curve
auc_4 = metrics.auc(fpr_4, tpr_4)
In [876]:
plt.figure(figsize=(12,12))

#plot halfway line
plt.plot([0,1], [0,1], 'k--')

#plot for model 0 ROC
plt.plot(fpr_0, tpr_0, label='Model 0 (area = {:.3f})'.format(auc_0))

#plot for model 1 ROC
plt.plot(fpr_1, tpr_1, label='Model 1 (area = {:.3f})'.format(auc_1))

#plot for model 2 ROC
plt.plot(fpr_2, tpr_2, label='Model 2 (area = {:.3f})'.format(auc_2))

#plot for model 3 ROC
plt.plot(fpr_3, tpr_3, label='Model 3 (area = {:.3f})'.format(auc_3))

#plot for model 4 ROC
plt.plot(fpr_4, tpr_4, label='Model 4 (area = {:.3f})'.format(auc_4))

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('All Wide and Deep ROC curves')
plt.legend(loc='best')
plt.show()

From the above ROC curves we note that model 1 performed better than the other models, so we are going to compare it with the standard multilayer perceptron from scikit-learn's library.

Comparing our best Wide and Deep Model to the standard Multi Layer Perceptron

In [877]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn import metrics

data_features = ['key','time_signature','valence',
                 'danceability', 'energy','instrumentalness',
                 'speechiness','acousticness','liveness',
                  'time_signature','energy'
                ]

mlp = MLPClassifier(hidden_layer_sizes=(50,),
                    learning_rate_init=0.01,
                    random_state=1,
                    activation='relu')

mlp.fit(X_train[data_features], y_train)
yhat_mlp = mlp.predict(X_test[data_features])

print("MLP Accuracy Score: ", accuracy_score(y_test, yhat_mlp))
print("MLP Precision Score: ",precision_score(y_test,yhat_mlp))

#false positve and true postive rates using roc
fpr_sk, tpr_sk, thresholds_sk = metrics.roc_curve(y_test, yhat_mlp)

#area under the curve
auc_sk = metrics.auc(fpr_sk, tpr_sk)
MLP Accuracy Score:  0.8099041533546326
MLP Precision Score:  0.7692307692307693

We note that the MLP has a higher precision score of 0.7692307692307693 compared to our best performing Wide and Deep Network, which has a precision score of 0.6823935558112774.

In [878]:
plt.figure(figsize=(10,12))

#plot halfway line
plt.plot([0,1], [0,1], 'k--')

#plot for Wide and Deep ROC
plt.plot(fpr_4, tpr_4, label='Wide and Deep (area = {:.3f})'.format(auc_4))

#plot for MLP ROC
plt.plot(fpr_sk, tpr_sk, label='MLP (area = {:.3f})'.format(auc_sk))

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Wide and Deep vs MLP ROC curve')
plt.legend(loc='best')
plt.show()

We can conclude that our Wide and Deep Neural Network performed slightly better than the Multilayer Perceptron (MLP) from scikit-learn's standard library. The ROC curve of the Wide and Deep Network is closer to the top left, with an AUC of 0.815 compared to 0.808 for the standard MLP. Now we are going to carry out a McNemar test to compare the two models.

In [879]:
from statsmodels.stats.contingency_tables import mcnemar

# run McNemar's test on each model's confusion matrix
# (the 2x2 confusion matrix serves as the contingency table)
result = mcnemar(mt.confusion_matrix(y_test,yhat_mlp), exact=False, correction=True)
result2 = mcnemar(mt.confusion_matrix(y_test,yhat_best), exact=False, correction=True)

# summarize the finding
print('statistic=%.3f, p-value=%.25f' % (result.statistic, result.pvalue))
print('statistic=%.3f, p-value=%.25f' % (result2.statistic, result2.pvalue))
statistic=44.576, p-value=0.0000000000244718815686789
statistic=41.158, p-value=0.0000000001404426639141641

Since the p-value is below 0.05 in both cases, we reject the null hypothesis of no difference, i.e. the disagreement captured by each test is statistically significant. The Wide and Deep network's test yields a noticeably higher p-value than the MLP's, which is consistent with it performing better, but comparing p-values from two separate tests is only indirect evidence; a paired McNemar test on the two models' predictions (sketched below) compares them more directly.
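A minimal sketch of that paired comparison, assuming yhat_mlp and yhat_best are the two models' predictions on the same test set: it builds the 2x2 contingency table of samples each model classifies correctly or incorrectly and runs McNemar's test on that table.

import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# paired McNemar test: rows = MLP correct/incorrect, columns = Wide and Deep correct/incorrect
y_true = np.asarray(y_test)
correct_mlp = (np.ravel(yhat_mlp) == y_true)
correct_wd  = (np.ravel(yhat_best) == y_true)

table = [[np.sum( correct_mlp &  correct_wd), np.sum( correct_mlp & ~correct_wd)],
         [np.sum(~correct_mlp &  correct_wd), np.sum(~correct_mlp & ~correct_wd)]]

paired = mcnemar(table, exact=False, correction=True)
print('statistic=%.3f, p-value=%.5f' % (paired.statistic, paired.pvalue))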

3. Exceptional Work

Here we examine the effect of dropout on the ROC curve compared to our best-performing wide and deep model. We also look for differences between the training and validation loss and accuracy curves.

In [880]:
from keras.layers import Dropout


# get crossed columns
X_train_crossed = X_train[cross_col_df_names].to_numpy()
X_test_crossed = X_test[cross_col_df_names].to_numpy()

# save categorical features
X_train_cat = X_train[categorical_headers].to_numpy() 
X_test_cat = X_test[categorical_headers].to_numpy() 

# and save off the numeric features
X_train_num = X_train.drop(columns=categorical_headers).to_numpy()
X_test_num = X_test.drop(columns=categorical_headers).to_numpy()


# we need to create separate lists for each branch
crossed_outputs = []

# CROSSED DATA INPUT
input_crossed = Input(shape=(X_train_crossed.shape[1],), dtype='int64', name='wide_inputs')
for idx,col in enumerate(cross_col_df_names):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    
    # this line of code does this: input_crossed[:,idx]
    x = tf.gather(input_crossed, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    crossed_outputs.append(x)
    

# merging the branches together 
wide_branch = concatenate(crossed_outputs, name='wide_concat')
wide_branch = Dense(units=1,activation='relu',name='num_0')(wide_branch)
wide_branch = Dropout(0.1)(wide_branch)

# reset this input branch
all_deep_branch_outputs = []

# CATEGORICAL DATA INPUT
input_cat = Input(shape=(X_train_cat.shape[1],), dtype='int64', name='categorical_input')
for idx,col in enumerate(categorical_headers):
    
    # track what the maximum integer value will be for this variable
    # which is the same as the number of categories
    N = max(X_train[col].max(),X_test[col].max())+1
    
    # this line of code does this: input_cat[:,idx]
    x = tf.gather(input_cat, idx, axis=1)
    
    # now use an embedding to deal with integers as if they were one hot encoded
    x = Embedding(input_dim=N, 
                  output_dim=int(np.sqrt(N)), 
                  input_length=1, name=col+'_embed')(x)
    
    # save these outputs to concatenate later
    all_deep_branch_outputs.append(x)
    
# NUMERIC DATA INPUT
# create dense input branch for numeric
input_num = Input(shape=(X_train_num.shape[1],), name='numeric')
x_dense = Dense(units=15, activation='relu',name='num_1')(input_num)
x_dense = Dropout(0.1)(x_dense)
    
all_deep_branch_outputs.append(x_dense)

# merge the deep branches together
deep_branch = concatenate(all_deep_branch_outputs,name='concat_embeds')
deep_branch = Dense(units=75,activation='relu', name='deep0')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 0 created')
deep_branch = Dense(units=50,activation='relu', name='deep1')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 1 created')
deep_branch = Dense(units=25,activation='relu', name='deep2')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 2 created')
deep_branch = Dense(units=10,activation='relu', name='deep3')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 3 created')
deep_branch = Dense(units=5,activation='relu', name='deep4')(deep_branch)
deep_branch = Dropout(0.3)(deep_branch)
print('Deep 4 created')
    
# merge the deep and wide branch
final_branch = concatenate([wide_branch, deep_branch],
                           name='concat_deep_wide')
final_branch = Dense(units=1,activation='sigmoid',
                     name='combined')(final_branch)

model = Model(inputs=[input_crossed,input_cat,input_num], 
              outputs=final_branch)

# model.summary()
Deep 0 created
Deep 1 created
Deep 2 created
Deep 3 created
Deep 4 created
In [881]:
%%time

model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['Precision'])

# lets also add the history variable to see how we are doing
# and lets add a validation set to keep track of our progress
history = model.fit([X_train_crossed,X_train_cat,X_train_num],
                    y_train, 
                    epochs=15, 
                    batch_size=32, 
                    verbose=1, 
                    validation_data = ([X_test_crossed,X_test_cat,X_test_num],y_test))
Epoch 1/15
157/157 [==============================] - 1s 4ms/step - loss: 0.2697 - precision: 0.5155 - val_loss: 0.2495 - val_precision: 0.5266
Epoch 2/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2543 - precision: 0.5207 - val_loss: 0.2495 - val_precision: 0.5419
Epoch 3/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2532 - precision: 0.5221 - val_loss: 0.2492 - val_precision: 0.5374
Epoch 4/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2506 - precision: 0.5312 - val_loss: 0.2480 - val_precision: 0.5561
Epoch 5/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2517 - precision: 0.5237 - val_loss: 0.2475 - val_precision: 0.5560
Epoch 6/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2492 - precision: 0.5309 - val_loss: 0.2466 - val_precision: 0.5585
Epoch 7/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2484 - precision: 0.5305 - val_loss: 0.2445 - val_precision: 0.5676
Epoch 8/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2488 - precision: 0.5332 - val_loss: 0.2433 - val_precision: 0.5789
Epoch 9/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2479 - precision: 0.5347 - val_loss: 0.2406 - val_precision: 0.5915
Epoch 10/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2473 - precision: 0.5378 - val_loss: 0.2390 - val_precision: 0.5976
Epoch 11/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2462 - precision: 0.5425 - val_loss: 0.2366 - val_precision: 0.6064
Epoch 12/15
157/157 [==============================] - 0s 2ms/step - loss: 0.2432 - precision: 0.5548 - val_loss: 0.2310 - val_precision: 0.6228
Epoch 13/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2413 - precision: 0.5569 - val_loss: 0.2278 - val_precision: 0.6266
Epoch 14/15
157/157 [==============================] - 1s 3ms/step - loss: 0.2427 - precision: 0.5533 - val_loss: 0.2248 - val_precision: 0.6318
Epoch 15/15
157/157 [==============================] - 0s 3ms/step - loss: 0.2395 - precision: 0.5550 - val_loss: 0.2216 - val_precision: 0.6285
CPU times: user 10.1 s, sys: 1.03 s, total: 11.1 s
Wall time: 7.63 s
In [882]:
yhat = np.round(model.predict([X_test_crossed,X_test_cat,X_test_num]))
yhat_drop = yhat
print(mt.confusion_matrix(y_test,yhat))
print(mt.precision_score(y_test,yhat))
[[271 344]
 [ 55 582]]
0.6285097192224622
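For completeness, the dropout model's other headline metrics can be read from the same predictions; a short sketch using the mt alias for sklearn.metrics already in use above:

# additional metrics for the dropout model (mt is the sklearn.metrics alias used above)
print("Accuracy: ", mt.accuracy_score(y_test, yhat_drop))
print("Recall:   ", mt.recall_score(y_test, yhat_drop))
print("F1 score: ", mt.f1_score(y_test, yhat_drop))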
In [883]:
y_pred_dropout = model.predict([X_test_crossed,X_test_cat,X_test_num]).ravel()

#false positive and true positive rates from the ROC curve
fpr_dropout, tpr_dropout, thresholds_2 = mt.roc_curve(y_test, y_pred_dropout)

#area under the curve
auc_dropout = mt.auc(fpr_dropout, tpr_dropout)
In [884]:
plt.figure(figsize=(10,12))

#plot halfway line
plt.plot([0,1], [0,1], 'k--')

#plot for Wide and Deep ROC
plt.plot(fpr_4, tpr_4, label='Wide and Deep (area = {:.3f})'.format(auc_4))

#plot for Wide and Deep with Dropout ROC
plt.plot(fpr_dropout, tpr_dropout, label='Wide and Deep with Dropout (area = {:.3f})'.format(auc_dropout))

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Wide and Deep vs Wide and Deep with Dropout ROC curve')
plt.legend(loc='best')
plt.show()

Here we note that the wide and deep model without dropout had a larger AUC than the version with dropout, suggesting that the original model was not overfitting.
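One way to back this up numerically is to compare the final-epoch gap between training and validation loss with and without dropout; a minimal sketch, assuming model10_hist_loss and model10_val_loss hold the no-dropout loss history used in the plots below:

# final-epoch train/validation loss gap, with and without dropout
# (assumes model10_hist_loss / model10_val_loss hold the no-dropout history)
gap_dropout    = history.history['val_loss'][-1] - history.history['loss'][-1]
gap_no_dropout = model10_val_loss[-1] - model10_hist_loss[-1]
print("Gap with dropout:    %.4f" % gap_dropout)
print("Gap without dropout: %.4f" % gap_no_dropout)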

In [885]:
from matplotlib import pyplot as plt

%matplotlib inline

plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(history.history['precision'])

plt.ylabel('Precision %')
plt.title('Training with DropOut')
plt.subplot(2,2,2)
plt.plot(history.history['val_precision'])
plt.title('Validation with DropOut')

plt.subplot(2,2,3)
plt.plot(history.history['loss'])
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(history.history['val_loss'])
plt.xlabel('epochs')


plt.figure(figsize=(10,4))
plt.subplot(2,2,1)
plt.plot(model10_hist_accur)

plt.ylabel('Precision %')
plt.title('Training without DropOut')
plt.subplot(2,2,2)
plt.plot(model10_val_accur)
plt.title('Validation without DropOut')

plt.subplot(2,2,3)
plt.plot(model10_hist_loss)
plt.ylabel('Training Loss')
plt.xlabel('epochs')

plt.subplot(2,2,4)
plt.plot(model10_val_loss)
plt.xlabel('epochs')
Out[885]:
Text(0.5, 0, 'epochs')

The validation accuracy without dropout is slightly higher than with dropout, but the validation curves for both accuracy and loss are noticeably more consistent when dropout is used. This is likely because our dataset is small; with a larger dataset, dropout could well improve generalization and reduce overfitting. If there were no hardware constraints, we would increase the number of epochs to 30 and observe whether anything changes, for example with early stopping as sketched below.
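A minimal sketch of how that longer run could be set up, using an early-stopping callback so the extra epochs do not cause overfitting; this is an assumption about a follow-up experiment, not something we ran:

from keras.callbacks import EarlyStopping

# train for up to 30 epochs, stopping once validation loss stops improving
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history_30 = model.fit([X_train_crossed, X_train_cat, X_train_num],
                       y_train,
                       epochs=30,
                       batch_size=32,
                       verbose=1,
                       validation_data=([X_test_crossed, X_test_cat, X_test_num], y_test),
                       callbacks=[early_stop])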
