Tensorflow Recommenders 5 - Use DCN (Deep Cross Network) as the ranking model



In the previous tutorials, we have looked at retrieval model, ranking modelleveraging contextual features, and training both retrieval and ranking models in a multi-task learning scheme

In this post, we look at a more recent state-of-the-art ranking model - Deep Cross Network (DCN) - as our ranking model. The other settings are exactly the same as the previous tutorial for training both retrieval and ranking models in a multi-task learning scheme, except the introduction of DCN, and updated Movielens model with that in this tutorial.


Content

  • Deep Cross Network (DCN)
  • Prepare dataset
  • User model
  • Movie model
  • Movielens model
  • Train and evaluate the model


Deep Cross Network (DCN)

Cross features such as whether banana and milk are bought together play important role in recommender systems due to the challenges such as the feature space normally large and sparse, and therefore manually engineering features are challenging, and also, traditional neural networks are not work well on this by implicitly modeling feature interactions.

DCN was first introduced in 2017 with its updated version DCN V2 recently. The essense of DCN is the cross network:

$$x_{i+1} = x_0 \odot (W \times x_i + b) + x_i$$

where $x_0$ is the input features (an embedding layer) and $x_{i+1}$ is the $i+1$-th layer output of the cross net.

The cross network can be multiple layers, and can also be stacked on top of an embedding layer in parallel with a DNN. Alternatively, it can also stack on top of the embedding layer and followed by a DNN.

In the MovielensModel we use later on, we will use the second design where the DNN comes after the DCN with the keras Functional API.

Prepare dataset

First, let's import packages that we need, and print out Tensorflow and TFRS versions for reference.

from typing import Dict, Text # for typing hint

import pprint
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

print(tf.__version__)
print(tfrs.__version__)
Output:

2.9.1
v0.7.0

Load the Movielens 100k dataset.

# Load the movielens dataset
ratings = tfds.load('movielens/100k-ratings', split='train')
ratings = ratings.map(lambda x: {
    'movie_title': x['movie_title'],
    'user_id': x['user_id'],
    'user_rating': x['user_rating'],
    'timestamp': x['timestamp']
})

movies = tfds.load('movielens/100k-movies', split='train')
movies = movies.map(lambda x: x['movie_title'])

timestamps = np.concatenate(list(
    ratings.map(lambda x: x['timestamp']).batch(100)))
max_timestamp = timestamps.max()
min_timestamp = timestamps.min()

timestamp_buckets = np.linspace(
    min_timestamp, max_timestamp, num=1000)

unique_movie_titles = np.unique(np.concatenate(list(movies.batch(1000))))
unique_user_ids = np.unique(np.concatenate(list(ratings.batch(1_000).map(
    lambda x: x['user_id']))))

print(len(unique_movie_titles), len(unique_user_ids))
Output:

1664 943

User Model

We use UserModel as the same one defined in the previous tutorial in which we can optionally choose to use the timestamps as part of our user model.

class UserModel(tf.keras.Model):
    # User embedding will be user id + ts + normalized ts embeddings

    def __init__(self, use_timestamps):
        super().__init__()
        
        self._use_timestamps = use_timestamps
        
        # User id embedding
        self.user_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_user_ids,
                mask_token=None),
            tf.keras.layers.Embedding(len(unique_user_ids)+1, 32)
        ])
        
        if use_timestamps:
            # Use timestamp
            self.timestamp_embedding = tf.keras.Sequential([
                tf.keras.layers.Discretization(timestamp_buckets.tolist()),
                tf.keras.layers.Embedding(len(timestamp_buckets)+1, 32)
            ])

            # Normalized timestamp
            self.normalized_timestamp = tf.keras.layers.Normalization(axis=None)
            self.normalized_timestamp.adapt(timestamps)
        
    
    def call(self, inputs):
        if not self._use_timestamps:
            return self.user_embedding(inputs['user_id'])
        
        return tf.concat([
            self.user_embedding(inputs['user_id']),
            self.timestamp_embedding(inputs['timestamp']),
            tf.reshape(self.normalized_timestamp(inputs['timestamp']),(-1,1))
        ], axis=1) 

Movie Model

Similarly, we use the MovieModel as the same one we used in the previous tutorial in which we optionally use the title text as part of the movie embeddings.

class MovieModel(tf.keras.Model):
    # Movie embedding: title text + id 
    
    def __init__(self, use_title_text):
        super().__init__()
        max_tokens = 10_000
        
        self._use_title_text = use_title_text
        
        self.title_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_movie_titles, mask_token=None),
            tf.keras.layers.Embedding(len(unique_movie_titles)+1, 32)
        ])
        
        if use_title_text:
            self.title_vectorizer = tf.keras.layers.TextVectorization(
                max_tokens=max_tokens)
            self.title_vectorizer.adapt(movies)

            self.title_text_embedding = tf.keras.Sequential([
                self.title_vectorizer,
                tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
                tf.keras.layers.GlobalAveragePooling1D()
            ])
        
        
    def call(self, inputs):
        if not self._use_title_text:
            return self.title_embedding(inputs)
        
        return tf.concat([
            self.title_embedding(inputs),
            self.title_text_embedding(inputs)
        ], axis=1)

Movielens Model

So far, the UserModel and MovieModel are exactly the same as in the previous tutorial, and nothing new. Now we move on to define our new MovielensModel which allows us to train both the retrieval and ranking tasks together in a multi-task training scheme. 

We can see that two tasks are defined in the __init__() method, and in the compute_loss() we are calculating the loss as the total of both tasks with equal contribution (with their corresponding weights self.rating_weight and self.retrieval_weight as 0.5 respectively).

class MovielensModel(tfrs.models.Model):
    
    def __init__(self, use_timestamps=True, use_title_text=True):
        super().__init__()
        
        self.rating_weight = 0.5
        self.retrieval_weight = 0.5
        
        # User and Movie models
        self.user_model = tf.keras.Sequential([
            UserModel(use_timestamps),
            tf.keras.layers.Dense(32)
        ])
        self.movie_model = tf.keras.Sequential([
            MovieModel(use_title_text),
            tf.keras.layers.Dense(32)
        ])
        
	# Ranking model with DCN using keras Functional API
        x0 = tf.keras.Input(shape=(64,))
        x1 = tfrs.layers.dcn.Cross()(x0, x0)
        x2 = tfrs.layers.dcn.Cross()(x0, x1)
        d1 = tf.keras.layers.Dense(256, activation='relu')(x2)
        d2 = tf.keras.layers.Dense(64, activation='relu')(d1)
        output = tf.keras.layers.Dense(1)(d2)
        self.rating_model = tf.keras.Model(inputs=x0, outputs=output)
    
        # Multi-tasks
        self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()]
        )
        self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(self.movie_model)
            )
        )
            
    def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
        user_embeddings = self.user_model({
            'user_id': features['user_id'],
            'timestamp': features['timestamp']
        })
        movie_embeddings = self.movie_model(
            features['movie_title']
        )
        return (
            user_embeddings, 
            movie_embeddings,
            self.rating_model(tf.concat([
                user_embeddings,
                movie_embeddings
            ], axis=1))
        )
        
    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        user_embeddings, movie_embeddings, rating_predictions = self.call(features)
        # Retrieval loss
        retrieval_loss = self.retrieval_task(user_embeddings, movie_embeddings)
        # Rating loss
        rating_loss = self.rating_task(
            labels=features['user_rating'],
            predictions=rating_predictions
        )
        
        # Combine two losses with hyper-parameters (to be tuned)
        return (self.rating_weight * rating_loss \
                + self.retrieval_weight * retrieval_loss)





Train and evaluate the model

We use 80% of the dataset for training, and the rest (20%) for testing.

# -------------------------------
# Experiment
# -------------------------------
# Prepare data
tf.random.set_seed(7)
shuffled = ratings.shuffle(100_000, seed=7,
                reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

cached_train = train.shuffle(100_000).batch(2048).cache()
cached_test = test.batch(4096).cache()

model = MovielensModel(use_timestamps=True, use_title_text=True)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

model.fit(cached_train, epochs=3)

train_acc = model.evaluate(
    cached_train, return_dict=True)['factorized_top_k/top_100_categorical_accuracy']
test_acc = model.evaluate(
    cached_test, return_dict=True)['factorized_top_k/top_100_categorical_accuracy']

print(f'Top-100 accuracy (train): {train_acc:.2f}')
print(f'Top-100 accuracy (test): {test_acc:.2f}')
Output:

Top-100 accuracy (train): 0.34
Top-100 accuracy (test): 0.25
Evaluation results of all metrics on the test set.

model.evaluate(cached_test, return_dict=True)
Output:

{'root_mean_squared_error': 1.0266695022583008,
 'factorized_top_k/top_1_categorical_accuracy': 0.001449999981559813,
 'factorized_top_k/top_5_categorical_accuracy': 0.010900000110268593,
 'factorized_top_k/top_10_categorical_accuracy': 0.025450000539422035,
 'factorized_top_k/top_50_categorical_accuracy': 0.13050000369548798,
 'factorized_top_k/top_100_categorical_accuracy': 0.2531999945640564,
 'loss': 13955.5576171875,
 'regularization_loss': 0,
 'total_loss': 13955.5576171875}
As a quick comparision, the results on the test set without DCN in the previous tutorial was as follows, in which the RMSE (Root Mean Squared Error) has been decreased using DCN.

{'root_mean_squared_error': 1.0596544742584229,
 'factorized_top_k/top_1_categorical_accuracy': 0.00139999995008111,
 'factorized_top_k/top_5_categorical_accuracy': 0.011549999937415123,
 'factorized_top_k/top_10_categorical_accuracy': 0.025450000539422035,
 'factorized_top_k/top_50_categorical_accuracy': 0.1335500031709671,
 'factorized_top_k/top_100_categorical_accuracy': 0.24815000593662262,
 'loss': 13961.8505859375,
 'regularization_loss': 0,
 'total_loss': 13961.8505859375}

We can get the 5 movies in the test set for user 42 with a certain timestamp, and sort them based on their scores in a descending order.

test_ratings = {}
for m in test.take(5):
#     print(m['movie_title'].numpy())
    _, _, test_ratings[m['movie_title'].numpy()] = \
        model(
            {'user_id':np.array(['42']), 
             'timestamp':np.array([892839492]), 
             'movie_title': np.array([m['movie_title'].numpy()])
            }
        )
    
for m in sorted(test_ratings, key=test_ratings.get, reverse=True):
    print(m)
Output:

b'Chasing Amy (1997)'
b'Top Gun (1986)'
b'Twister (1996)'
b'Event Horizon (1997)'
b'Batman Forever (1995)'

We can look into the scores of those movies in test_ratings

for r in test_ratings: 
    print(r, test_ratings[r].numpy()[0][0])
Output:

b'Top Gun (1986)' 3.278518
b'Chasing Amy (1997)' 3.3716574
b'Batman Forever (1995)' 2.8443925
b'Twister (1996)' 3.1891446
b'Event Horizon (1997)' 3.0244293

Finally, let's change the timestamp of user 42 for those movies to check if the predicted scores/ratings change. When we change the timestamp from 892839492 to 879024327, we can see the ratings change accordingly as below: Output:

b'Top Gun (1986)' 3.3720827
b'Chasing Amy (1997)' 3.408927
b'Batman Forever (1995)' 3.0349593
b'Twister (1996)' 3.162705
b'Event Horizon (1997)' 3.058013

More TFRS tutorials can be found at https://parklize.blogspot.com/p/tensorflow.html

References