In this post, we look at how to build a ranking model. Unlike in the retrieval stage, this time we keep the ratings (explicit feedback). Because the ranking model normally scores only the items returned by the retrieval stage, it is not bound by the same efficiency constraints, so we can use a deeper model.
Content
- Load the Movielens 100k dataset
- Ranking model
- Movielens model
- Compile and training
- Getting ranked list of recommended items
Load the Movielens 100k dataset
from typing import Dict, Text # for typing hint
import pprint
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
print(tf.__version__)
print(tfrs.__version__)
Output:
2.9.1
v0.7.0
Let's load the MovieLens 100k dataset. This time we also keep the ratings (explicit feedback), which the retrieval model in the previous post did not use.
ratings = tfds.load('movielens/100k-ratings', split='train')
ratings = ratings.map(lambda x: {
    'movie_title': x['movie_title'],
    'user_id': x['user_id'],
    'user_rating': x['user_rating']
})
tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000,
                           seed=42,
                           reshuffle_each_iteration=False)
train = shuffled.take(80_000)
# Skip the 80,000 training examples so the test set does not overlap with them
test = shuffled.skip(80_000).take(20_000)
movie_titles = ratings.batch(1_000_000) \
    .map(lambda x: x['movie_title'])
user_ids = ratings.batch(1_000_000) \
    .map(lambda x: x['user_id'])
unique_movie_titles = np.unique(np.concatenate(list(movie_titles)))
unique_user_ids = np.unique(np.concatenate(list(user_ids)))
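The take/skip arithmetic of the split is easy to get wrong, so here is a minimal sketch that uses a plain Python list as a stand-in for the shuffled dataset. The helper name split_take_skip is hypothetical and not part of TensorFlow; it only mirrors the idea that take(n) keeps the first n elements and skip(n) drops them.

```python
from typing import List, Sequence, Tuple


def split_take_skip(items: Sequence[int], n_train: int,
                    n_test: int) -> Tuple[List[int], List[int]]:
    """Mimic dataset.take(n_train) and dataset.skip(n_train).take(n_test)."""
    train_part = list(items[:n_train])                  # like take(n_train)
    test_part = list(items[n_train:n_train + n_test])   # like skip(n_train).take(n_test)
    return train_part, test_part


# Ten stand-in examples, split 80/20.
examples = list(range(10))
train_part, test_part = split_take_skip(examples, n_train=8, n_test=2)

# The parts are disjoint because the test slice starts where the training slice ends.
overlap = set(train_part) & set(test_part)
print(train_part, test_part, overlap)
```

If the skip count were smaller than the training size, the test slice would start inside the training slice and the evaluation would include examples the model has already seen.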
Ranking Model
Here we define the ranking model, which uses a deeper neural network than the retrieval model.
class RankingModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        embedding_dimension = 32
        # Compute embeddings for users
        self.user_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_user_ids, mask_token=None),
            tf.keras.layers.Embedding(
                len(unique_user_ids)+1, embedding_dimension)
        ])
        # Compute embeddings for movies
        self.movie_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_movie_titles, mask_token=None),
            tf.keras.layers.Embedding(
                len(unique_movie_titles)+1, embedding_dimension)
        ])
        # Rating model for predicting ratings
        self.ratings = tf.keras.Sequential([
            # Multiple dense layers
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.Dense(64, activation='relu'),
            # Prediction layer
            tf.keras.layers.Dense(1)
        ])

    def call(self, inputs):
        user_id, movie_titles = inputs
        user_embedding = self.user_embedding(user_id)
        movie_embedding = self.movie_embedding(movie_titles)
        return self.ratings(
            tf.concat([user_embedding, movie_embedding], axis=1))
We can call the untrained ranking model as it is to get a predicted score for a given user id (42) and movie (One Flew Over the Cuckoo's Nest (1975)). Because the weights are still random at this point, the value itself carries no meaning yet.
RankingModel()((["42"], ["One Flew Over the Cuckoo's Nest (1975)"])).numpy()
Output:
array([[0.03740937]], dtype=float32)
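To make the data flow inside call concrete, the following NumPy sketch walks through the same steps (lookup, concatenation, two ReLU layers, one linear output) with randomly initialized weights. The tiny vocabularies, the helper functions dense and predict, and all weight arrays are illustrative assumptions; this is not how Keras stores or initializes its parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_dimension = 32

# Hypothetical tiny vocabularies; index 0 is reserved for unknown strings,
# which is why the embedding tables have len(vocabulary) + 1 rows.
user_vocab = {"42": 1, "7": 2}
movie_vocab = {"One Flew Over the Cuckoo's Nest (1975)": 1, "Speed (1994)": 2}

user_table = rng.normal(size=(len(user_vocab) + 1, embedding_dimension))
movie_table = rng.normal(size=(len(movie_vocab) + 1, embedding_dimension))


def dense(x, w, b, relu=True):
    """Affine layer followed by an optional ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y


# Weights for the 256 -> 64 -> 1 stack; the input is two concatenated embeddings.
w1, b1 = rng.normal(size=(2 * embedding_dimension, 256)) * 0.05, np.zeros(256)
w2, b2 = rng.normal(size=(256, 64)) * 0.05, np.zeros(64)
w3, b3 = rng.normal(size=(64, 1)) * 0.05, np.zeros(1)


def predict(user_ids, movie_titles):
    """Return one predicted score per (user, movie) pair, shape (batch, 1)."""
    u = user_table[[user_vocab.get(i, 0) for i in user_ids]]
    m = movie_table[[movie_vocab.get(t, 0) for t in movie_titles]]
    x = np.concatenate([u, m], axis=1)  # shape (batch, 2 * embedding_dimension)
    return dense(dense(dense(x, w1, b1), w2, b2), w3, b3, relu=False)


scores = predict(["42", "unknown-user"], ["Speed (1994)", "Speed (1994)"])
print(scores.shape)
```

The output has one row per input pair and a single column, which matches the shape of the array printed by the Keras model above.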
Movielens Model
Now we can define MovielensModel, which combines the ranking model defined above with a task and a compute_loss() method. As you might expect, we use the mean squared error (MSE) as the loss and the root mean squared error (RMSE) as the metric.
class MovielensModel(tfrs.models.Model):
    def __init__(self):
        super().__init__()
        # Set up the ranking model and the task in the init method
        self.ranking_model = RankingModel()
        self.task = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()]
        )

    def compute_loss(self, features: Dict[Text, tf.Tensor],
                     training=False) -> tf.Tensor:
        # Take the raw features, predict ratings, and return the loss
        rating_predictions = self.ranking_model(
            (features['user_id'], features['movie_title']))
        # The task computes the loss and the metrics
        return self.task(labels=features['user_rating'],
                         predictions=rating_predictions)
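For reference, the loss and the metric differ only by a square root. Here is a small worked example in plain Python, using made-up ratings and predictions that are not taken from the dataset:

```python
import math

# Made-up labels and predictions for four (user, movie) pairs.
labels = [4.0, 3.0, 5.0, 1.0]
predictions = [3.5, 3.0, 4.0, 2.0]

squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
mse = sum(squared_errors) / len(squared_errors)  # the quantity minimized in training
rmse = math.sqrt(mse)                            # the metric, in rating units

print(mse, rmse)  # 0.5625 0.75
```

RMSE is easier to read than MSE because it is on the same scale as the ratings: here the predictions are off by about three quarters of a star on average.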
Compile and Training
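Compile the model, fit it on the training set, and then evaluate it on the held-out test set.
model = MovielensModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
model.fit(cached_train, epochs=3)
# Report the loss and RMSE on the test set
model.evaluate(cached_test, return_dict=True)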
Getting ranked list of recommended items
Finally, we can use the trained model to rank items by their predicted scores. In practice, we would rank only the candidates returned by the retrieval stage.
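test_ratings = {}
for m in test.take(5):
    # Score with the trained ranking model; RankingModel() would create a new, untrained one
    score = model.ranking_model(
        (tf.constant(['42']), tf.reshape(m['movie_title'], [1])))
    test_ratings[m['movie_title'].numpy()] = score.numpy().item()
for m in sorted(test_ratings, key=test_ratings.get, reverse=True):
    print(m)
Output (the titles and their order depend on the split and the trained weights, so yours may differ):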
b'Man Without a Face, The (1993)'
b'Maverick (1994)'
b'Unstrung Heroes (1995)'
b'Shining, The (1980)'
b'Free Willy (1993)'
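In a two-stage system, the same sorting step runs over the candidates returned by retrieval. Below is a minimal sketch of that step; the helper rank_candidates, the score_fn callback standing in for the trained ranking model, and the scores themselves are all invented for illustration.

```python
from typing import Callable, List, Tuple


def rank_candidates(user_id: str, candidates: List[str],
                    score_fn: Callable[[str, str], float],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Score each candidate for one user and return the top_k, best first."""
    scored = [(title, score_fn(user_id, title)) for title in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented scores; a real system would call the trained model here.
fake_scores = {"Maverick (1994)": 3.4, "Free Willy (1993)": 2.1,
               "Shining, The (1980)": 4.2, "Unstrung Heroes (1995)": 3.9}

ranked = rank_candidates("42", list(fake_scores),
                         score_fn=lambda user, title: fake_scores[title],
                         top_k=3)
for title, score in ranked:
    print(f"{score:.1f}  {title}")
```

Keeping the scoring function as a parameter means the same ranking code can be reused whether the scores come from the Keras model, a cached lookup, or a test stub.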
More TFRS tutorials can be found at https://parklize.blogspot.com/p/tensorflow.html