What’s more, it provides a large number of benchmark experiments on open datasets. It also provides ready-to-use implementations of many state-of-the-art (SOTA) CTR prediction models from the literature, which makes it very convenient to use.
Import the required libraries
import pandas as pd
import numpy as np
import pandas_profiling
import random
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder
from deepctr.models import DeepFM
from deepctr.feature_column import SparseFeat, DenseFeat, get_feature_names
# For reproducible experiments
random.seed(200)
np.random.seed(200)
tf.random.set_seed(200)
Frappe dataset
We use the Frappe dataset, a small dataset that is handy for quickly experimenting with and testing different models. It was collected for context-aware app recommendation and contains 96,203 app usage logs of users under different contexts. The eight context variables are all categorical, including weather, city, daytime, and so on.
data = pd.read_csv('datasets/Frappe/Mobile_Frappe/frappe/frappe.csv', sep='\t')
data['target'] = 1  # every usage log is a positive sample
num_users = len(data["user"].unique())
items = data["item"].unique()
num_items = len(items)
print(f'distinct users: {num_users}')
print(f'distinct items: {num_items}')
print(f'sparsity: {len(data)/(num_users*num_items)}')
sparse_features = [
'user',
'item',
'daytime',
'weekday',
'isweekend',
'homework',
'cost',
'weather',
'country',
'city'
]
# Shuffle the whole dataset
data = data.sample(frac=1, random_state=0)
# pandas_profiling.ProfileReport(data)  # uncomment for an exploratory report
data.loc[23225]  # inspect one sample log
Output:
distinct users: 957
distinct items: 4082
sparsity: 0.024626555814783357
Data preparation
Here we follow the benchmark setting from the DeepCTR documentation page. After one-hot encoding the features, we obtain 5,382 features. Since every log counts as a positive sample for CTR prediction, we construct two negative instances for each log by randomly replacing the item variable with another item. The data is randomly split into training (70%), validation (20%), and test (10%) sets before the negative instances are constructed, so that each log and its negatives end up in the same split.
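As a quick check (a small snippet that is not part of the original benchmark), the number of one-hot features is the sum of the cardinalities of the ten categorical fields, which should come out to the 5,382 stated above:
# Sum of per-field cardinalities = total number of one-hot features
print(sum(data[f].nunique() for f in sparse_features))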
def get_gt(user):
    """Get the ground-truth items of a user."""
    return data[data['user'] == user]['item'].values
# 70% / 20% / 10% split
num_train, num_val = int(len(data)*0.7), int(len(data)*0.2)
num_test = len(data) - num_train - num_val
print(num_train, num_val, num_test)
train_data = data.iloc[0:num_train]
val_data = data.iloc[num_train:num_train+num_val]
test_data = data.iloc[num_train+num_val:]
def neg_sampling(d, n):
    """Create n negative samples for each positive sample in d."""
    neg_samples = []
    for _, row in d.iterrows():
        gt = get_gt(row['user'])  # items this user has actually used
        num_negative = 0
        # Resample until all n sampled items are true negatives for this user
        while num_negative != n:
            sampled_items = np.random.choice(items, size=n, replace=False)
            num_negative = sum(x not in gt for x in sampled_items)
        for i in sampled_items:
            neg_ex = row.copy()
            neg_ex['item'] = i
            neg_ex['target'] = 0
            neg_samples.append(neg_ex)
    neg_df = pd.concat(neg_samples, axis=1).transpose()
    # DataFrame.append was removed in pandas 2.0, so concatenate instead
    d = pd.concat([d, neg_df])
    # Shuffle positives and negatives together
    d = d.sample(frac=1, random_state=0)
    return d
num_neg = 2
train_data = neg_sampling(train_data, num_neg)
print('train neg sampling is finished')
val_data = neg_sampling(val_data, num_neg)
print('val neg sampling is finished')
test_data = neg_sampling(test_data, num_neg)
test_data
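As a quick sanity check (an extra assertion that is not in the original), each split should now contain three times its original number of rows: one positive log plus num_neg negatives.
# Each log gained num_neg negatives, so every split triples in size
assert len(train_data) == (1 + num_neg) * num_train
assert len(val_data) == (1 + num_neg) * num_val
assert len(test_data) == (1 + num_neg) * num_test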
Encoding categorical values
# Fit the encoder on the full dataset so every category value is seen,
# then apply the same mapping to each split
for f in sparse_features:
    print(f)
    lbe = LabelEncoder()
    data[f] = lbe.fit_transform(data[f])
    train_data[f] = lbe.transform(train_data[f])
    val_data[f] = lbe.transform(val_data[f])
    test_data[f] = lbe.transform(test_data[f])
Prepare the model inputs in the format DeepCTR expects for the training, validation, and test data.
fixlen_feature_columns = [
SparseFeat(feat, vocabulary_size=data[feat].max()+1, embedding_dim=4) \
for feat in sparse_features
]
feature_names = get_feature_names(fixlen_feature_columns)
print(feature_names)
train_model_input = {
name:train_data[name].astype('float32').values \
for name in feature_names
}
val_model_input = {
name:val_data[name].astype('float32').values \
for name in feature_names
}
test_model_input = {
name:test_data[name].astype('float32').values \
for name in feature_names
}
Using DeepFM without an early stopping strategy
Here we train the DeepFM (Deep Factorization Machines) model on the dataset prepared above. We do not use early stopping here and train for 40 epochs.
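For reference, DeepFM combines a factorization machine component and a deep component that share the same feature embeddings, summing their outputs before the sigmoid (Guo et al., 2017):
ŷ = sigmoid(y_FM + y_DNN)
In the DeepCTR API, the first argument of DeepFM is the list of linear feature columns (left empty here) and the second is the list of DNN feature columns.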
model = DeepFM([], fixlen_feature_columns, task='binary')
model.compile('adam',
'binary_crossentropy',
metrics='binary_crossentropy')
# Early stopping (defined here but intentionally not passed to fit in this run)
earlystopping_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
# Best model checkpoint
checkpoint_filepath = '/tmp/checkpoint'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_filepath,
save_weights_only=True,
monitor='val_loss',
mode='min',
save_best_only=True)
history = model.fit(train_model_input,
train_data['target'].astype('float32').values,
batch_size=256,
epochs=40,
verbose=2,
validation_data=(val_model_input,
val_data['target'].astype('float32').values),
# callbacks=[
# earlystopping_callback,
# model_checkpoint_callback
# ]
)
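To see how the model overfits over the 40 epochs, we can plot the training and validation loss from the returned history object. This is a small sketch assuming matplotlib is available; it is not part of the original run.
import matplotlib.pyplot as plt
# Plot training vs. validation binary cross-entropy per epoch
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.legend()
plt.show()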
Check the performance on the test set
from sklearn.metrics import log_loss, roc_auc_score, accuracy_score
pred_ans = model.predict(test_model_input, batch_size=256)
# log_loss's eps argument was removed in scikit-learn 1.5; clip predictions manually instead
clipped_pred = np.clip(pred_ans, 1e-7, 1 - 1e-7)
print('log loss', log_loss(test_data['target'].astype('float32').values, clipped_pred))
print('auc', roc_auc_score(test_data['target'].astype('float32').values, pred_ans))
Over the 40 epochs, the validation log loss first decreases and then starts increasing again, ending with the results below. Output:
log loss 0.2621498030364543
auc 0.9733840688051586
Using DeepFM with an early stopping strategy
model = DeepFM([], fixlen_feature_columns, task='binary')
model.compile('adam',
'binary_crossentropy',
metrics='binary_crossentropy')
# Early stopping
earlystopping_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
# Best model checkpoint
checkpoint_filepath = '/tmp/checkpoint'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_filepath,
save_weights_only=True,
monitor='val_loss',
mode='min',
save_best_only=True)
history = model.fit(train_model_input,
train_data['target'].astype('float32').values,
batch_size=256,
epochs=40,
verbose=2,
validation_data=(val_model_input,
val_data['target'].astype('float32').values),
callbacks=[
earlystopping_callback,
model_checkpoint_callback
]
)
This time we apply an early stopping strategy with a patience of 5 epochs: if the validation loss does not decrease for 5 consecutive epochs, training stops before reaching 40 epochs. We then evaluate on the test set in the same way as before. Output:
log loss 0.19507425489108865
auc 0.9737500710457143
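Note that the ModelCheckpoint callback saved the best weights to checkpoint_filepath during training. To evaluate those weights rather than the weights from the final epoch, we could restore them before predicting; a sketch, since this step is not shown above:
# Restore the best weights saved by the checkpoint callback
model.load_weights(checkpoint_filepath)
pred_ans = model.predict(test_model_input, batch_size=256)
print('auc', roc_auc_score(test_data['target'].astype('float32').values, pred_ans))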
We can observe that, compared to training without early stopping, the log loss improves substantially and the AUC improves slightly when the early stopping strategy is used.