TensorFlow Recommenders - How to rank candidate items?



In the previous posts, we discussed (1) how to retrieve candidate items and (2) how to use contextual features when building models. As mentioned in the first post, the Retrieval component shrinks the item candidates from O(thousands of millions) to O(thousands), and the Ranking component trims them further from O(thousands) to O(hundreds).

In this post, we look at how to build ranking models. Unlike in the retrieval stage, this time we will keep the ratings (explicit feedback). And since the ranking model normally works only on the items returned by the retrieval stage, we don't have the same efficiency constraints as in retrieval, so we can afford a deeper model for ranking.

Content
  • Load the Movielens 100k dataset
  • Ranking model
  • Movielens model
  • Compile and training
  • Getting ranked list of recommended items


Load the Movielens 100k dataset



from typing import Dict, Text # for typing hint

import pprint
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

print(tf.__version__)
print(tfrs.__version__)
Output:

2.9.1
v0.7.0
Let's load the MovieLens 100k dataset, but this time we will also keep the ratings (the explicit feedback), unlike for the retrieval model in the previous post.

ratings = tfds.load('movielens/100k-ratings', split='train')
ratings = ratings.map(lambda x: {
    'movie_title': x['movie_title'],
    'user_id': x['user_id'],
    'user_rating': x['user_rating']
})

tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, 
                           seed=42, 
                           reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

movie_titles = ratings.batch(1_000_000) \
                .map(lambda x: x['movie_title'])
user_ids = ratings.batch(1_000_000) \
                .map(lambda x: x['user_id'])
    
unique_movie_titles = np.unique(np.concatenate(list(movie_titles)))
unique_user_ids = np.unique(np.concatenate(list(user_ids)))
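
As a quick sanity check, we can look at the vocabularies we just built; this snippet is only for inspection and is not part of the model.

# Inspect the vocabulary sizes and a few example titles
print(len(unique_user_ids), len(unique_movie_titles))
print(unique_movie_titles[:3])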


Ranking Model

Here we define the ranking model, which uses a deeper neural network than the retrieval model.

class RankingModel(tf.keras.Model):
    
    def __init__(self):
        super().__init__()
        embedding_dimension = 32
        
        # Compute embeddings for users
        self.user_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_user_ids, mask_token=None),
            tf.keras.layers.Embedding(
                len(unique_user_ids)+1, embedding_dimension)
        ])
        
        # Compute embeddings for movies
        self.movie_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_movie_titles, mask_token=None),
            tf.keras.layers.Embedding(
                len(unique_movie_titles)+1, embedding_dimension)
        ])
        
        # Rating model to predict ratings from the concatenated embeddings
        self.ratings = tf.keras.Sequential([
            # Multiple dense layers
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.Dense(64, activation='relu'),
            # Prediction layer
            tf.keras.layers.Dense(1)
        ])
        
        
    def call(self, inputs):
        user_id, movie_titles = inputs
        
        user_embedding = self.user_embedding(user_id)
        movie_embedding = self.movie_embedding(movie_titles)
        
        return self.ratings(
            tf.concat([user_embedding, movie_embedding], axis=1))
We can test the ranking model as defined, still untrained, to get a prediction score for a given user id (42) and movie (One Flew Over the Cuckoo's Nest (1975)).

RankingModel()((["42"], ["One Flew Over the Cuckoo's Nest (1975)"])).numpy()
Output:

array([[0.03740937]], dtype=float32)
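
Since the ranking model will eventually need to score many candidates for one user, we can also pass a whole batch of movies at once. Below is a minimal sketch with a few example titles (the model is still untrained, so the scores themselves are meaningless):

# Score several candidate movies for user 42 in one batch
# (untrained model, so the actual values are arbitrary)
candidates = [
    "One Flew Over the Cuckoo's Nest (1975)",
    "Speed (1994)",
    "Strictly Ballroom (1992)"
]
RankingModel()((["42"] * len(candidates), candidates)).numpy()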





Movielens Model

Now we can move on to define the MovielensModel, which wires together the ranking model defined above, the ranking task, and the compute_loss() method. As you might expect, we use mean squared error (MSE) as the loss and RMSE (root mean squared error) as the metric.

class MovielensModel(tfrs.models.Model):
    
    def __init__(self):
        super().__init__()
        # Set up the ranking model and the ranking task in the init method
        self.ranking_model = RankingModel()
        self.task = tfrs.tasks.Ranking(
            loss = tf.keras.losses.MeanSquaredError(),
            metrics = [tf.keras.metrics.RootMeanSquaredError()]
        )
        
    def compute_loss(self, features: Dict[Text, tf.Tensor],
                    training=False) -> tf.Tensor:
        # Implement the compute_loss method:
        # take in the raw features and return the loss
        rating_predictions = self.ranking_model(
            (features['user_id'], features['movie_title']))
        
        # The task computes the loss and the metrics
        return self.task(labels=features['user_rating'], 
                        predictions=rating_predictions)
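
Note that tfrs.models.Model only requires us to implement compute_loss(). If you also want to call the trained model directly on a batch of raw features (handy for inference or serving later), you can additionally add a call() method to MovielensModel that simply forwards to the ranking model. A minimal sketch of such a method:

    # Optional: make the whole model callable on a features dict,
    # e.g. model({'user_id': ..., 'movie_title': ...}) -> predicted rating
    def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
        return self.ranking_model(
            (features['user_id'], features['movie_title']))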


Compile and Training

Compile and fit using the training set.

model = MovielensModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))

cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()

model.fit(cached_train, epochs=3)
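
After training, we can also evaluate the model on the held-out test set to check the RMSE on data it has not seen (the exact numbers will vary between runs):

# Evaluate on the cached test set; return_dict=True gives named metrics
# such as 'root_mean_squared_error' alongside the losses
model.evaluate(cached_test, return_dict=True)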

Getting ranked list of recommended items

Finally, we can get a ranked list of recommended items by sorting items according to the scores predicted by our trained model. In practice, we would only rank the candidates returned by the retrieval stage.

test_ratings = {}
for m in test.take(5):
    # Score each movie for user 42 with the trained ranking model
    test_ratings[m['movie_title'].numpy()] = \
        model.ranking_model((['42'], [m['movie_title']]))

# Sort the movies by predicted rating, highest first
for m in sorted(test_ratings, key=test_ratings.get, reverse=True):
    print(m)
Output:

b'Man Without a Face, The (1993)'
b'Maverick (1994)'
b'Unstrung Heroes (1995)'
b'Shining, The (1980)'
b'Free Willy (1993)'
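
To tie the two stages together: in a real pipeline we would first retrieve a shortlist of candidates with the retrieval model from the previous post, and then re-rank that shortlist with the ranking model trained here. Below is a minimal sketch of that glue code, assuming index is a retrieval index (e.g. a tfrs.layers.factorized_top_k.BruteForce index built as in the retrieval post):

# Hypothetical glue code: index is assumed to come from the retrieval post
user_id = '42'

# 1) Retrieval: get a shortlist of candidate titles for the user
_, titles = index(tf.constant([user_id]))
candidates = [t.numpy() for t in titles[0]]

# 2) Ranking: score every candidate with the trained ranking model
scores = model.ranking_model(
    ([user_id] * len(candidates), candidates))

# 3) Sort the candidates by predicted rating, highest first
ranked = sorted(zip(candidates, scores.numpy().ravel()),
                key=lambda x: x[1], reverse=True)
for title, score in ranked:
    print(score, title)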

More TFRS tutorials can be found at https://parklize.blogspot.com/p/tensorflow.html

References