Recommendation Systems by Matrix Factorization and Collaborative Filtering
We all use platforms that offer items made for us, the users. We consume some of those items and get suggestions for others. But what are those suggestions based on? How does the platform know what to suggest, which kinds of items each user likes, and how likely a user is to like what it suggests?
Recommendation systems are built for exactly this. Take Netflix as an example. It hosts a large catalog of movies that users watch on the platform. Whenever a user watches a movie, Netflix suggests other movies that are relevant to that user or to the movie just watched.
Nowadays almost every platform, such as Facebook, YouTube, Twitter, Pinterest, and Amazon, uses a recommendation system to suggest relevant content to its users. Good recommendations tend to improve revenue or whichever other business metrics matter to the company.
Now let's formulate a recommendation system and see how it suggests content relevant to users.
Recommendation Systems, Problem Formulation
Let's assume we have multiple users and movies. Each user has watched some movies and rated them. One person can like several different movies, and different users can like the same movie. This is naturally a matrix of users and items: users are the rows, movies are the columns, and the entry at position (i, j) holds the rating user i gave to movie j, or simply whether the user liked it.
It is a sparse matrix: if a user has not watched a movie, the corresponding entry is undefined. Our goal is to find a model that fills in all the missing entries of this matrix.
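To make the shape of the problem concrete, here is a minimal sketch of such a rating matrix in NumPy. The data is a made-up toy example, and using `np.nan` to mark "not rated" is just one possible convention.

```python
import numpy as np

# Hypothetical toy data: 4 users (rows) x 5 movies (columns).
# np.nan marks a movie the user has not rated.
ratings = np.array([
    [5.0, np.nan, 1.0, np.nan, 4.0],
    [np.nan, 4.0, np.nan, 2.0, np.nan],
    [1.0, np.nan, 5.0, np.nan, np.nan],
    [np.nan, 5.0, np.nan, np.nan, 1.0],
])

observed = ~np.isnan(ratings)      # True where a rating exists
sparsity = 1.0 - observed.mean()   # fraction of missing entries
print(f"observed ratings: {observed.sum()}, sparsity: {sparsity:.2f}")
```

Real rating matrices are usually far sparser than this one, which is why the missing entries, not the observed ones, are the interesting part.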
If we look at the matrix closely, we realize that filling it in is nothing more than solving a linear or logistic regression problem for each user, where the inputs are the description (feature) vectors of the movies. The loss to minimize is the prediction error over the ratings each user has actually given.
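One standard way to write this per-user regression loss, for the linear case, is sketched below. The notation is an assumption made for illustration: \theta^{(i)} is the parameter vector of user i, x^{(j)} is the feature vector of movie j, y^{(i,j)} is the rating in row i and column j, r(i,j) = 1 when that rating exists, n_u is the number of users, n is the length of the vectors, and \lambda is an optional L2 regularization strength. Only observed ratings enter the sum, and the movie vectors are treated as fixed inputs.

```latex
\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \;
\frac{1}{2} \sum_{i=1}^{n_u} \; \sum_{j:\, r(i,j)=1}
\left( (\theta^{(i)})^{\top} x^{(j)} - y^{(i,j)} \right)^{2}
+ \frac{\lambda}{2} \sum_{i=1}^{n_u} \sum_{k=1}^{n} \left( \theta^{(i)}_{k} \right)^{2}
```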
By minimizing this loss, we find the user representations, or embeddings. After that, we can fill in every missing entry with the dot product of the corresponding user and movie vectors. If we stack all movie feature vectors into one matrix and all user vectors into another, we get two matrices, one for movies and one for users, and the rating matrix above is approximated by their product.
Picture a tall, vertical matrix next to a wide, horizontal one. The vertical matrix holds one row per user, and the horizontal matrix holds one column per movie.
Multiplying these two matrices gives a prediction for every user and movie pair. For every user, we can then recommend a movie if its prediction is above a threshold (for example 0.5). This is how matrix factorization works.
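As a rough sketch of this step, the snippet below fits one regularized linear regression per user on the movies that user has rated, then fills in the whole matrix. The feature values, the ratings, the helper name `fit_user`, and the regularization strength `lam` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical movie feature vectors (one row per movie): [action, romance].
X = np.array([
    [0.9, 0.1],
    [1.0, 0.0],
    [0.1, 0.9],
    [0.0, 1.0],
])

# Ratings matrix, users x movies; np.nan = not rated.
Y = np.array([
    [5.0, np.nan, 1.0, np.nan],   # user 0 seems to prefer action
    [np.nan, 1.0, np.nan, 5.0],   # user 1 seems to prefer romance
])

def fit_user(X, y_row, lam=0.1):
    """Ridge regression on the movies this user has rated."""
    rated = ~np.isnan(y_row)
    Xr, yr = X[rated], y_row[rated]
    A = Xr.T @ Xr + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, Xr.T @ yr)

# One parameter vector (embedding) per user, stacked as rows.
Theta = np.vstack([fit_user(X, Y[i]) for i in range(Y.shape[0])])
filled = Theta @ X.T   # predicted rating for every (user, movie) pair
print(np.round(filled, 2))
```

The closed-form solve is used here only because the toy problem is tiny; gradient descent on the same loss would serve the same purpose.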
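The multiply-and-threshold step can be sketched as follows. The matrices `U` and `M` are hypothetical, already-learned user and movie representations, and 0.5 is the example threshold from the text.

```python
import numpy as np

# Hypothetical learned matrices: 3 users x 2 factors, 4 movies x 2 factors.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.6, 0.6]])
M = np.array([[0.9, 0.1],
              [0.8, 0.0],
              [0.1, 0.9],
              [0.2, 0.7]])

scores = U @ M.T                 # shape (3, 4): one score per (user, movie)
threshold = 0.5                  # example cut-off
recommend = scores > threshold   # boolean mask of suggested movies

for user, row in enumerate(recommend):
    print(f"user {user}: recommend movies {np.flatnonzero(row).tolist()}")
```

In practice you would probably also mask out the movies a user has already watched before recommending anything.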
In talking about matrix factorization, we used vector representations of the movies, but where did they come from? Each value of such a vector describes a genre of the movie: is it an action movie or not, is it a romantic movie, and so on. But who has the time to describe every movie by hand with a feature vector? So far we have simply assumed that these vectors are given and used them to recommend movies to users.
Now let's assume we don't have these x vectors but do have user representations. How can we build a recommendation model then? It is a simple switch: if we have user embeddings, we swap the roles of parameters and inputs in the loss function and solve the same kind of problem.
This time, look at the matrix of ratings given by each user to certain items. What should x1 and x2 be so that multiplying a user vector by a movie's feature vector yields that user's rating?
Let's define the loss function, which is now minimized not over the user representations but over the movie feature vectors. In other words, knowing each user and their taste, we can describe each movie by its genres, even though we know nothing about the movies or their representations in advance. The loss has the same form as before; only the variables being optimized change.
Using gradient descent, we can solve this optimization problem and obtain movie embeddings. What have we got now? Given user embeddings, we can learn movie feature vectors, and given movie feature vectors, we can re-learn user representations. So we can randomly initialize the user vectors, learn the x vectors, then re-learn the user embeddings, and repeat until the predictions are good enough. But who says we can't do both at the same time? Let's learn all the parameters for users and movies simultaneously.
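A sketch of this switched loss in one standard form is given below. The notation is assumed for illustration: \theta^{(i)} is the (now fixed) vector of user i, x^{(j)} is the feature vector of movie j being learned, y^{(i,j)} is the rating in row i and column j, r(i,j) = 1 when that rating exists, n_m is the number of movies, n is the length of the vectors, and \lambda is an optional L2 regularization strength.

```latex
\min_{x^{(1)},\dots,x^{(n_m)}} \;
\frac{1}{2} \sum_{j=1}^{n_m} \; \sum_{i:\, r(i,j)=1}
\left( (\theta^{(i)})^{\top} x^{(j)} - y^{(i,j)} \right)^{2}
+ \frac{\lambda}{2} \sum_{j=1}^{n_m} \sum_{k=1}^{n} \left( x^{(j)}_{k} \right)^{2}
```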
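The simultaneous update can be sketched with plain gradient descent in NumPy. Everything here is an assumption chosen for illustration: the toy ratings, the number of factors `k`, the L2 strength `lam`, the step size `lr`, and the iteration count. It is a minimal sketch, not a tuned implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings, users x movies; np.nan = not rated.
Y = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, 5.0, np.nan],
    [np.nan, 1.0, 4.0, 5.0],
])
mask = ~np.isnan(Y)
Y0 = np.where(mask, Y, 0.0)   # zeros are placeholders, masked out below

k, lam, lr = 2, 0.01, 0.01    # factors, L2 strength, step size (assumed)
Theta = 0.1 * rng.standard_normal((Y.shape[0], k))   # user embeddings
X = 0.1 * rng.standard_normal((Y.shape[1], k))       # movie embeddings

def loss(Theta, X):
    """Squared error on observed ratings plus L2 penalty."""
    err = (Theta @ X.T - Y0) * mask
    penalty = (Theta ** 2).sum() + (X ** 2).sum()
    return 0.5 * (err ** 2).sum() + 0.5 * lam * penalty

initial = loss(Theta, X)
for _ in range(5000):
    err = (Theta @ X.T - Y0) * mask   # error on observed entries only
    grad_Theta = err @ X + lam * Theta
    grad_X = err.T @ Theta + lam * X
    Theta -= lr * grad_Theta          # both sets of parameters are
    X -= lr * grad_X                  # updated in the same step
final = loss(Theta, X)
print(f"loss: {initial:.2f} -> {final:.4f}")
```

Both gradients are computed from the same error matrix before either parameter set is changed, which is what makes the update simultaneous rather than alternating.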
As a result, we get two matrices, one for users and one for items, whose product approximates the rating matrix we are given.
This is what collaborative filtering gives us. It lets us learn user and item embeddings without knowing anything about the users or the items except the ratings. Putting it all together, we minimize a single loss over the user and item vectors jointly.
We can use this matrix factorization in production until a new user arrives who has not rated any movie. We know nothing about this user, and retraining does not help because there are no ratings to learn from. To still serve recommendations to this user, we can compute the mean rating of each movie and use this vector of mean ratings as the prediction.
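One common way to write the joint objective is sketched below. The notation is assumed for illustration: \theta^{(i)} is the vector of user i, x^{(j)} is the vector of movie j, y^{(i,j)} is the rating in row i and column j, r(i,j) = 1 when that rating exists, n_u and n_m are the numbers of users and movies, n is the length of the vectors, and \lambda is an optional L2 regularization strength. Both sets of vectors are optimized at once.

```latex
\min_{\substack{\theta^{(1)},\dots,\theta^{(n_u)} \\ x^{(1)},\dots,x^{(n_m)}}} \;
\frac{1}{2} \sum_{(i,j):\, r(i,j)=1}
\left( (\theta^{(i)})^{\top} x^{(j)} - y^{(i,j)} \right)^{2}
+ \frac{\lambda}{2} \sum_{i=1}^{n_u} \sum_{k=1}^{n} \left( \theta^{(i)}_{k} \right)^{2}
+ \frac{\lambda}{2} \sum_{j=1}^{n_m} \sum_{k=1}^{n} \left( x^{(j)}_{k} \right)^{2}
```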
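The mean-rating fallback for a brand-new user might look like the sketch below. The ratings are made-up toy data, and `np.nan` is again assumed to mark missing entries.

```python
import numpy as np

# Hypothetical ratings, users x movies; np.nan = not rated.
Y = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, 5.0, np.nan],
])

# Per-movie mean over the users who rated it (nan entries are ignored).
movie_means = np.nanmean(Y, axis=0)

# A brand-new user gets the mean vector as the prediction.
new_user_prediction = movie_means
ranking = np.argsort(-new_user_prediction)   # best-rated movies first
print(np.round(movie_means, 2), ranking.tolist())
```

Note that `np.nanmean` emits a warning and returns `nan` for a movie that nobody has rated, so such columns would need separate handling.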
Content-Based Recommendation System
We now have user and item representations from collaborative filtering. Let's see what happens with a user who has rated just one movie. Re-learning the representation matrices gives us an embedding for this user, but it tells us little about them: with a single rating, we can't say whether they like action, romance, or some other genre. What do we do now?
Recall that we have a matrix for movies in which each row is the feature vector of one movie. After training collaborative filtering, the representations of movies and users carry semantic information, and two movie vectors tend to be close if the movies have similar content or belong to the same genre. So let's recommend the movies whose feature vectors are closest to the vector of the movie the user rated highly. In other words, for a highly rated movie, we look for the movies whose vectors lie at the smallest distance from its vector.
This means finding the movies that are most similar to the one the user has rated. You can take the top 5 or 10 and recommend them to the user.
Now you know how to build a recommendation system and how to recommend items to users based on item content or on ratings. Keep in mind that the feature representations of items and users change as new interactions arrive, so you need to retrain your model periodically to keep all embeddings up to date.
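A minimal sketch of this lookup is shown below, using Euclidean distance between movie vectors as the notion of "closer". The embeddings and the helper name `most_similar` are hypothetical; cosine similarity would be a reasonable alternative.

```python
import numpy as np

# Hypothetical learned movie embeddings, one row per movie.
X = np.array([
    [0.9, 0.1],   # movie 0
    [0.8, 0.2],   # movie 1
    [0.1, 0.9],   # movie 2
    [0.7, 0.1],   # movie 3
    [0.2, 0.8],   # movie 4
])

def most_similar(X, liked, top_n=2):
    """Indices of the top_n movies closest to movie `liked` (Euclidean)."""
    dist = np.linalg.norm(X - X[liked], axis=1)
    dist[liked] = np.inf   # never recommend the movie itself
    return np.argsort(dist)[:top_n].tolist()

print(most_similar(X, liked=0))
```

For a large catalog, a full sort over all movies per request may be too slow, and an approximate nearest-neighbor index is a common substitute.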