Collaborative Filtering in Recommendation Systems

Written by Kunal Deshmukh on March 11, 2018



Recommendation systems are techniques for predicting the ratings and opinions a user is likely to express about an item. As an example, think of the recommended content shown to users while they navigate Netflix, or the products Amazon recommends to consumers based on prior purchases.
Recommendation systems are ubiquitous in the digital world. Users encounter these systems when seeing "Suggested Friends" displayed on Facebook, "Suggested Videos" on YouTube, and "Other Jobs for you" on LinkedIn.


There are two approaches through which recommendation systems are designed:

1.   Content-based filtering

2.   Collaborative filtering

Techniques which selectively make use of both approaches are called Hybrid recommendation systems.



Both systems have advantages and disadvantages:

Collaborative approach

●      Other users' scores are used

●      Results are not deterministic, since chance is involved in the system

●      Needs more data

●      Struggles with new users and new products

Content-based approach

●      Works with less data

●      Provides results based on the activities of the specific user

●      Prone to over-specialization




1. A shallow dive into content-based filtering


Content-based recommendation systems take into account the data provided by the user, both directly and indirectly. For example, the user's age, or the items the user has reviewed and bought, can be used to determine the classes of products to recommend.
Since content-based recommendations are based on a user's own usage patterns, the user can often guess why a given recommendation is being shown.
This type of recommendation system relies on the characteristics of the object, so new content can be recommended quickly. E.g. if the user has a history of watching action movies, a newly released action movie is recommended by this system.
However, this system does not take into account the behavior of, or data about, other users in the system. Hence, if a particular action movie receives very low ratings or negative recommendations from other users, it will still be recommended to the user.


TF-IDF (Term Frequency - Inverse Document Frequency) and cosine similarity over a vector space model are techniques commonly used in content-based filtering.



This technique is used in information retrieval and text mining. TF-IDF, as the name suggests, has two parts. TF calculates the normalized frequency at which a given term appears in a document, and IDF calculates the importance of a term across the whole collection of documents. E.g. the terms 'recommendation', 'system', 'movie' convey more information about a document than the terms 'the', 'and', 'are', etc.

TF: It measures the frequency of a term in a document. Since documents vary in size, a raw count would be misleading, so the count is normalized by the length of the document.

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document).

IDF: It measures the overall importance of a given term. Since commonly used terms like 'is', 'the', 'are' usually provide little information about a document, the IDF for these terms is low. IDF is calculated as:

IDF(t) = log_e(Total number of documents / Number of documents with term t in it).
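The two formulas can be checked by hand on a toy corpus. The following is a minimal pure-Python sketch; the three sample documents and the function names `tf`, `idf` and `tf_idf` are made up for illustration, and `idf` assumes the term occurs in at least one document.

```python
import math

# Toy corpus (hypothetical): each document is a list of lowercase terms.
docs = [
    "the movie recommendation system".split(),
    "the movie was long".split(),
    "the system is slow".split(),
]

def tf(term, doc):
    """Term frequency: occurrences of `term` divided by the document length."""
    return doc.count(term) / len(doc)

def idf(term, corpus):
    """Inverse document frequency: log_e(N / documents containing `term`).

    Assumes `term` appears in at least one document of the corpus.
    """
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing)

def tf_idf(term, doc, corpus):
    """TF-IDF weight of `term` in `doc`, relative to `corpus`."""
    return tf(term, doc) * idf(term, corpus)

# 'the' occurs in every document, so its weight drops to zero.
for term in ("the", "movie", "recommendation"):
    print(term, round(tf_idf(term, docs[0], docs), 3))
```

A term that appears in every document gets an IDF of log_e(1) = 0, which is how the common words are filtered out.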

In content-based filtering, TF-IDF can be useful for determining which products are similar to a given product. Since this technique is keyword based, it is most useful in areas with a lot of textual data, e.g. book recommendation. The TF-IDF method is used to represent each item as a vector in feature space, and in these areas cosine similarity is generally used to calculate how similar two such vectors are.

Cosine similarity:

As the name suggests, this method computes the cosine of the angle between two vectors and uses it as an estimate of the similarity between the two objects they represent.

Cosine similarity is calculated as the dot product of the two vectors divided by the product of their magnitudes:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$



Fig 1: Cosine similarity is a measure of similarity between two vectors.



As Fig 1 shows, the similarity between the two vectors in 1(a) is the highest, i.e. close to 1, while the cosine similarity in Fig 1(b) would be around -1. The cosine similarity of vectors that are orthogonal to each other is 0.
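These three cases can be reproduced with a few lines of Python. This is a minimal sketch; the function name and the sample vectors are chosen for illustration, and the function assumes neither vector is all zeros.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|).

    Assumes both vectors have the same length and a non-zero magnitude.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2], [2, 4]))    # same direction: close to 1
print(cosine_similarity([1, 2], [-1, -2]))  # opposite direction: close to -1
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal: 0
```

Note that only the direction of the vectors matters: `[1, 2]` and `[2, 4]` differ in length but are treated as identical.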

The TfidfVectorizer class in scikit-learn (sklearn.feature_extraction.text) can be used to calculate a TF-IDF matrix, and the cosine_similarity function in the sklearn.metrics.pairwise module can be used to calculate the pairwise cosine similarity of the rows of a matrix.
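A minimal sketch of how these two scikit-learn pieces fit together is shown here. The item descriptions are hypothetical. By default, TfidfVectorizer applies a smoothed IDF and L2-normalizes each row, so its weights will not match the plain TF and IDF formulas exactly.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item descriptions (for example, book blurbs).
descriptions = [
    "a detective solves a murder in london",
    "a detective investigates a murder in paris",
    "a cookbook of quick vegetarian recipes",
]

# Rows are items, columns are vocabulary terms, values are TF-IDF weights.
tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(descriptions)

# similarity[i, j] is the cosine similarity between items i and j.
similarity = cosine_similarity(tfidf_matrix)
print(similarity.round(2))
```

The two detective stories share terms and score well above zero against each other, while the cookbook shares no terms with either and scores 0.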


2. Deep dive into collaborative filtering

Developers at Xerox first used collaborative filtering in a document retrieval system[5]. The PageRank algorithm used by Google is sometimes cited as a related example, since it ranks documents using the linking behavior of many page authors rather than the content of a page alone. Collaborative filtering is used to tailor recommendations based on the behavior of persons with similar interests. Sometimes it can be based on an item bought by the user. This method does not require a person to always contribute to the data store: gaps can be filled by the actions of other persons, or by the actions of the same person on other items. A few approaches to user-based and item-based collaborative recommendation are as follows:

1.   Neighborhood-based approach

2.   Item-based approach

3.   Classification approach

4.   Neural Collaborative Filtering

Neighborhood-based approach

Let's refer to the user for whom a rating is to be predicted as the 'active user'. In this approach, users are selected based on their similarity to the active user. The correlation between two users can be found using the formula below.

Let the ratings given to product k by persons A and B be $A_k$ and $B_k$, and let their mean ratings be $\bar{A}$ and $\bar{B}$:

$$r(A, B) = \frac{\sum_k (A_k - \bar{A})(B_k - \bar{B})}{\sqrt{\sum_k (A_k - \bar{A})^2 \, \sum_k (B_k - \bar{B})^2}}$$

Persons with higher correlation are considered neighbors.


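Let $v_{i,j}$ be the vote of user i on item j, and let $I_i$ be the set of items on which user i has voted. The mean vote for user i is:

$$\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$$

The predicted vote of the active user a on item j is:

$$p_{a,j} = \bar{v}_a + k \sum_{i=1}^{n} w(a,i) \, (v_{i,j} - \bar{v}_i)$$

Here, k is a constant used for normalization, and w(a,i) is the weight given to each of the n similar users.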
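The prediction step can be sketched in a few lines of Python. This is an illustration under stated assumptions: the weights are taken as given, and k is set to 1 divided by the sum of the absolute weights, which is one common choice and is not fixed by the text. The function name and sample numbers are hypothetical.

```python
def predict_vote(active_mean, neighbors):
    """Predict the active user's vote on one item.

    neighbors: list of (weight, vote, mean) tuples, one per similar user i,
    holding w(a, i), that user's vote on the item, and that user's mean vote.
    """
    total_weight = sum(abs(w) for w, _, _ in neighbors)
    if total_weight == 0:
        return active_mean  # no usable neighbors: fall back to the mean vote
    k = 1.0 / total_weight  # normalization constant (assumed choice)
    return active_mean + k * sum(w * (v - m) for w, v, m in neighbors)

# Two neighbors: one rated the item above their own mean, one below.
neighbors = [(0.9, 5, 4.0), (0.5, 2, 3.0)]
print(round(predict_vote(3.5, neighbors), 3))
```

Each neighbor contributes how far their vote sits above or below their own average, so a generous rater and a harsh rater are put on the same footing.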


The simplest way to calculate w(a,i) is to use the KNN algorithm, where each user is represented as a point in n-dimensional space. However, there are more efficient algorithms which can be used to calculate the weight w.

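w(a,i) can be calculated with the Pearson correlation coefficient, summing over the items j on which both users have voted, where $v_{a,j}$ is the vote of user a on item j and $\bar{v}_a$ is that user's mean vote:

$$w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \, \sum_j (v_{i,j} - \bar{v}_i)^2}}$$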
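A small Python sketch of this weight follows. It assumes each user's votes are stored as a non-empty dict from item id to vote, takes each user's mean over all of their votes, and returns 0 when the denominator vanishes; these conventions and the sample users are assumptions made for the example.

```python
import math

def mean_vote(votes):
    """Mean of all votes cast by one user (votes: item id -> vote)."""
    return sum(votes.values()) / len(votes)

def pearson_weight(votes_a, votes_i):
    """Pearson correlation w(a, i), summed over items both users voted on."""
    common = votes_a.keys() & votes_i.keys()
    mean_a, mean_i = mean_vote(votes_a), mean_vote(votes_i)
    num = sum((votes_a[j] - mean_a) * (votes_i[j] - mean_i) for j in common)
    den = math.sqrt(
        sum((votes_a[j] - mean_a) ** 2 for j in common)
        * sum((votes_i[j] - mean_i) ** 2 for j in common)
    )
    return num / den if den else 0.0  # assumed convention when undefined

alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 3, "m3": 2, "m4": 3}    # same taste, milder scores
carol = {"m1": 1, "m2": 3, "m3": 5}           # opposite taste

print(pearson_weight(alice, bob))
print(pearson_weight(alice, carol))
```

Bob rates less extremely than Alice but in the same order, so the two correlate strongly; Carol's ratings run in the opposite order, so her weight is negative.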




The most serious problem collaborative filtering techniques face in the real world is that users provide too few ratings. Hence, in a real-world dataset, the user-item matrix may contain many null values.

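In such cases, w(a,i) can be calculated with each item j weighted by its inverse user frequency, $f_j = \log(n / n_j)$.

Here, n is the total number of users and $n_j$ is the number of users who have voted for item j. Items that almost every user has voted on receive a weight close to zero, since they say little about how similar two users are.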
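The inverse user frequency itself is simple to compute. The sketch below assumes ratings are stored as a dict of user to a dict of item to vote; the function name and the sample data are hypothetical, and only the frequency weights are computed, not the weighted correlation.

```python
import math

def inverse_user_frequency(ratings):
    """Return {item: log(n / n_j)} for a {user: {item: vote}} ratings dict."""
    n = len(ratings)  # total number of users
    voters = {}       # item -> number of users who voted for it (n_j)
    for votes in ratings.values():
        for item in votes:
            voters[item] = voters.get(item, 0) + 1
    return {item: math.log(n / n_j) for item, n_j in voters.items()}

ratings = {
    "u1": {"hit": 5, "niche": 4},
    "u2": {"hit": 4},
    "u3": {"hit": 3},
    "u4": {"hit": 5, "niche": 1},
}
f = inverse_user_frequency(ratings)
print(f)
```

The item every user voted on gets a weight of log(1) = 0, while the item only half of the users voted on gets log(2).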

One approach to reducing the effect of null values is dimensionality reduction. Billsus and Pazzani[6] suggest a non-correlation-based approach that makes predictions using neural networks.


Item-to-item approach

Instead of using the ratings given by users to find a neighborhood of similar users, the ratings are used to find the similarity between items. The same Pearson coefficient can be used for this approach.
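A minimal sketch of the item-to-item idea: invert the ratings so that each item is described by the users who rated it, then correlate items instead of users. The data layout, function names and sample ratings are assumptions made for the example.

```python
import math
from collections import defaultdict

def item_vectors(ratings):
    """Invert {user: {item: vote}} into {item: {user: vote}}."""
    by_item = defaultdict(dict)
    for user, votes in ratings.items():
        for item, vote in votes.items():
            by_item[item][user] = vote
    return dict(by_item)

def pearson(x, y):
    """Pearson correlation over the keys shared by dicts x and y."""
    common = x.keys() & y.keys()
    mean_x = sum(x.values()) / len(x)
    mean_y = sum(y.values()) / len(y)
    num = sum((x[k] - mean_x) * (y[k] - mean_y) for k in common)
    den = math.sqrt(
        sum((x[k] - mean_x) ** 2 for k in common)
        * sum((y[k] - mean_y) ** 2 for k in common)
    )
    return num / den if den else 0.0  # assumed convention when undefined

ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
}
items = item_vectors(ratings)
print(round(pearson(items["A"], items["B"]), 3))  # rated alike by the same users
print(round(pearson(items["A"], items["C"]), 3))  # rated in opposite ways
```

A user who liked item A would then be pointed towards B, which the same users tend to rate similarly, and away from C.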


Classification approach

In the classification approach, items are represented as vectors, and they are classified and suggested to the user based on the ratings the active user has given to each class of items. With this approach, collaborative filtering is framed as a classic classification problem. Classification techniques such as Support Vector Machines or a Bayesian classifier can be used. Random Forest can prove useful when the dataset is unbalanced.
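As a rough illustration, the sketch below trains a naive Bayes classifier from scikit-learn for one active user. It assumes each item is represented by whether three other users liked it (1) or not (0), and that the label is whether the active user liked the item; this feature choice and all of the data are hypothetical.

```python
from sklearn.naive_bayes import BernoulliNB

# Each row is an item the active user has already rated.
# Columns: did other users u1, u2, u3 like the item (1) or not (0)?
X_rated = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
# Label: did the active user like the item? (Here they always agree with u1.)
y_rated = [1, 1, 1, 0, 0, 0]

classifier = BernoulliNB().fit(X_rated, y_rated)

# Items the active user has not rated yet.
X_unrated = [[1, 0, 1], [0, 1, 0]]
predictions = classifier.predict(X_unrated)
print(list(predictions))  # 1 = suggest to the active user, 0 = do not
```

Swapping `BernoulliNB` for another scikit-learn classifier such as `SVC` or `RandomForestClassifier` requires no other change, since they share the same `fit` and `predict` interface.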


Neural Collaborative Filtering

Neural networks are increasingly being used for collaborative filtering. Xiangnan He et al.[7] explored this in their neural collaborative filtering framework. Here, the user-item interaction matrix is treated as implicit feedback data. While observed entries at least reflect users' interest in items, the unobserved entries may simply be missing data, and in most cases there is a natural scarcity of negative feedback. The recommendation problem with implicit feedback is formulated as the problem of estimating the scores of unobserved entries. The framework learns latent vectors (embeddings) for users and items: a generalized matrix factorization component models their linear interaction, a multi-layer perceptron models their non-linear interaction, and the two are combined to estimate the score of an item for the active user.
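The matrix factorization idea mentioned above can be sketched without any neural network. The code below is plain matrix factorization trained by stochastic gradient descent on explicit ratings, not the neural model from the paper; the hyperparameters, function names and sample ratings are all assumptions made for the example.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Learn k-dimensional user and item vectors from (user, item, rating)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Estimated score: dot product of the user and item vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root mean square error over the observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

# Observed entries of a 3 x 3 user-item matrix; the rest are unobserved.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = factorize(ratings, n_users=3, n_items=3, epochs=0)  # untrained
P, Q = factorize(ratings, n_users=3, n_items=3)
print("RMSE before training:", round(rmse(P0, Q0, ratings), 3))
print("RMSE after training: ", round(rmse(P, Q, ratings), 3))
print("Estimate for user 0, unobserved item 2:", round(predict(P, Q, 0, 2), 2))
```

Once the vectors are learned, any unobserved user-item pair can be scored with a dot product, which is the step the neural approach replaces with learned layers.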

YouTube's recommendation system[10] is based on deep neural networks.

This system employs two separate neural networks. The first network is used for candidate generation, while the network that follows is used for ranking. The candidate generation network provides only broad personalization via collaborative filtering: it uses a user's personal history to select a few hundred videos from a huge corpus. The second network uses a rich set of features describing the video and the user to rank the videos in the suggestion list.
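The two-stage shape of such a pipeline can be sketched as below. Both stages here are simple stand-ins and not neural networks: candidate generation uses hypothetical co-watch counts, and ranking uses a hypothetical per-video score. All names and data are invented for illustration.

```python
def generate_candidates(watch_history, corpus, co_watch_counts, n=3):
    """Stage 1: cheaply narrow the corpus down to n candidate videos."""
    scores = {}
    for video in corpus:
        if video in watch_history:
            continue  # do not suggest what was already watched
        scores[video] = sum(
            co_watch_counts.get((seen, video), 0) for seen in watch_history
        )
    return sorted(scores, key=scores.get, reverse=True)[:n]

def rank(candidates, score_fn):
    """Stage 2: order the few candidates with a richer scoring function."""
    return sorted(candidates, key=score_fn, reverse=True)

corpus = ["v1", "v2", "v3", "v4", "v5"]
watch_history = {"v1"}
# How often two videos were watched by the same user (hypothetical).
co_watch_counts = {("v1", "v2"): 10, ("v1", "v3"): 7,
                   ("v1", "v4"): 2, ("v1", "v5"): 1}
# Stand-in for the rich video and user features used by the ranker.
freshness = {"v2": 0.1, "v3": 0.9, "v4": 0.5, "v5": 0.7}

candidates = generate_candidates(watch_history, corpus, co_watch_counts)
suggestions = rank(candidates, lambda video: freshness[video])
print(candidates)
print(suggestions)
```

The split keeps the expensive scoring off the full corpus: the first stage only has to be roughly right, and the second stage only has to look at a handful of videos.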


The collaborative filtering approach performs better than a simple content-based recommendation technique when a large amount of user data is available. It is less helpful when the data is very sparse, for example for new users or new items that have few or no ratings. This is known as the 'cold start problem'.


3. Hybrid Recommendation System

The hybrid recommendation system is a combination of collaborative and content-based filtering techniques. In this approach, content is used to infer ratings when ratings are scarce.

This combination is used in most recommendation systems at present. The Netflix movie recommendation system is an example of a hybrid recommendation system.

The scikit-surprise package in Python is useful for implementing a recommendation system. Since there is no single 'correct' way to implement a recommendation system, various machine learning algorithms for classification can be explored. Typical benchmarks are available for comparison[8].

How to measure success rate / Efficiency?

Statistical accuracy metrics are used to evaluate the accuracy of a filtering technique by comparing the predicted ratings directly with the actual user rating. Most commonly used statistical metrics are:

1.     Mean Absolute Error (MAE)

2.     Root Mean Square Error (RMSE) and

3.     ROC / PR curve



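Mean absolute error is one of the most popular metrics for measuring the accuracy of a recommendation system. It measures how far the predicted ratings deviate from the ratings users actually gave:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$

Here $y_i$ is the actual user rating, $\hat{y}_i$ is the predicted rating, and N is the number of ratings compared.

Root mean square error measures the average magnitude of the error and penalizes large errors more heavily:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$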
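Both error metrics take only a few lines of Python. This is a minimal sketch with made-up ratings, assuming the two lists have the same length and are not empty.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    squared = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

actual = [4, 3, 5, 1]
predicted = [3, 3, 4, 3]
print("MAE: ", mae(actual, predicted))
print("RMSE:", round(rmse(actual, predicted), 3))
```

The single prediction that is off by 2 pulls RMSE above MAE, because squaring gives large errors more weight.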

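Precision and recall measures:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

Here TP is the number of recommended items that are relevant to the user, FP is the number of recommended items that are not relevant, and FN is the number of relevant items that were not recommended.

Precision-recall values are useful for determining the accuracy of a prediction system when the dataset is unbalanced.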
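For a single user, precision and recall can be computed from the set of recommended items and the set of items the user actually found relevant. The function name and the sample item ids below are hypothetical, and returning 0 for an empty set is an assumed convention.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list for one user."""
    recommended, relevant = set(recommended), set(relevant)
    true_positives = len(recommended & relevant)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Four items were recommended; the user actually liked 'a', 'c' and 'e'.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print("precision:", precision)
print("recall:   ", round(recall, 3))
```

Two of the four recommendations were relevant, and two of the three relevant items were found.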



References :

1.     Content-based recommendation systems

2.     Recommender Systems

3.     Using collaborative filtering to weave an information Tapestry.

4.     Item-based collaborative filtering

5.     Collaborative filtering tutorial

6.     Learning Collaborative Information Filters

7.     Neural Collaborative Filtering [Xiangnan He et al]

8.     Surprise library

9.     Recommendation systems: principles, methods, and evaluation.

10.  Deep Neural networks in YouTube recommendations.


Technical Editor: Farnam Adelkhani