Collaborative filtering in recommendation systems
Recommendation systems are techniques used to predict the ratings or opinions
that a user is likely to express. As an example, think of the recommended
content shown to users while they navigate Netflix, or the products Amazon
recommends to consumers based on prior purchases.
Recommendation systems are ubiquitous in the digital world. Users encounter
these systems when seeing "Suggested Friends" displayed on Facebook,
"Suggested Videos" on YouTube, and "Other Jobs for you" on
LinkedIn.
There are two approaches through which recommendation systems are designed:
1. Content-based filtering
2. Collaborative filtering
Techniques which selectively make use of both approaches are called hybrid recommendation systems.
Both approaches have advantages and disadvantages:

Collaborative approach
Advantages:
● Other users' scores are used
Disadvantages:
● No deterministic result, since chance is involved in the system
● Needs more data
● Problem with new users and new products
Content-based approach
Advantages:
● Works with less data
● Provides results based on the activities of the specific user
Disadvantages:
● Overspecialization
1. A shallow dive into content-based filtering
Content-based recommendation systems take into account the data provided by the
user, both directly and indirectly. For example, age can be used to determine
classes of products or items reviewed and bought by the user.
Since content-based recommendations are based on the usage patterns of a user,
the user is often able to guess why they are being shown a recommendation.
This type of recommendation system relies on characteristics of the object. New
content can be quickly recommended to the user. E.g. if the user has a history
of watching all action movies, a newly released action movie is recommended by
this system.
However, this system does not take into account behavior or data about other
users in the system. Hence, if a particular action movie receives very low
ratings or negative recommendations from other users, it will still be
recommended to the user.
TF-IDF (Term Frequency - Inverse Document Frequency) and cosine similarity between vectors in a vector space model are the techniques used in content-based filtering.
TF-IDF:
This technique is used in information retrieval and text mining. TF-IDF, as the name suggests, has two terms. TF calculates the normalized frequency at which a given term appears in the document, and IDF calculates the importance of a term in general. E.g. the terms 'recommendation', 'system', and 'movie' convey more information about the document than the terms 'the', 'and', 'are', etc.
TF: It measures the frequency of a term in the document. Since the size of a document may vary, a raw count is not comparable across documents. Hence this count is normalized by the length of the document.
TF(w) = (Number of times term w appears in a document) / (Total number of terms in the document).
IDF: It measures the overall importance of a given term. Since commonly used terms like 'is', 'the', and 'are' usually don't provide information about the document, the IDF for these terms is low. IDF is calculated as:
IDF(t) = log_e(Total number of documents / Number of documents with term t in it).
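The two formulas above can be sketched in plain Python. This is a minimal illustration, assuming documents are already lowercased and split into tokens; the toy corpus and the function names are hypothetical and not part of the original text.

```python
import math
from collections import Counter


def term_frequency(document_tokens):
    """TF(w) = occurrences of w in the document / total terms in the document."""
    counts = Counter(document_tokens)
    total = len(document_tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(corpus_tokens):
    """IDF(t) = log_e(number of documents / number of documents containing t)."""
    n_docs = len(corpus_tokens)
    doc_counts = Counter()
    for tokens in corpus_tokens:
        doc_counts.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_counts.items()}


def tf_idf(corpus_tokens):
    """Return one {term: TF * IDF} dictionary per document."""
    idf = inverse_document_frequency(corpus_tokens)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in corpus_tokens
    ]


# Hypothetical toy corpus, already tokenized.
corpus = [
    "the movie is an action movie".split(),
    "the book is a romance".split(),
    "the system recommends the movie".split(),
]
weights = tf_idf(corpus)
```

In this corpus 'the' appears in every document, so its IDF, and therefore its weight, is zero, while a rarer term such as 'action' receives a higher weight. Library implementations may differ in detail; for example, scikit-learn's TfidfVectorizer applies a smoothed IDF and L2 normalization by default, so its values will not match this sketch exactly.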
In the content-based filtering technique, TF-IDF can be useful in determining products which are similar to a given product. Since this technique is keyword based, it is most useful in areas with a lot of textual data, e.g. book recommendation. In these areas, the TF-IDF method is generally used to calculate a vector in feature space for each item, and cosine similarity is used to calculate the similarity between those vectors.
Cosine similarity matrix:
As the name promises, this method calculates the cosine of the angle between two vectors. It estimates the similarity between two objects as a measure of the angle between their vectors.
Cosine similarity is calculated as the dot product of the two vectors divided by the product of their magnitudes.
As is evident in Fig. 1, the similarity between the two vectors in 1(a) is the highest, i.e. ~1, while the cosine similarity in Fig. 1(b) would be around -1. The cosine similarity of vectors orthogonal to each other is 0.
The TfidfVectorizer class of scikit-learn can be used to calculate the TF-IDF matrix, and the cosine_similarity function in the sklearn.metrics.pairwise module can be used to calculate the cosine similarity of a matrix.
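To make the three cases (same direction, opposite direction, orthogonal) concrete, here is a minimal plain-Python sketch of cosine similarity. The example vectors are made up, and returning 0 for a zero-length vector is a convention chosen for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


# Vectors pointing the same way, opposite ways, and at right angles.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

The three results are approximately 1, approximately -1, and 0 respectively. Applied to TF-IDF vectors, whose components are never negative, the value always falls between 0 and 1.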
2. Deep dive into collaborative filtering
Developers at Xerox first used collaborative filtering in a document retrieval system^{[5]}. The PageRank algorithm used by Google is an example of a document retrieval system using collaborative filtering. Collaborative filtering is used to tailor recommendations based on the behavior of persons with similar interests. Sometimes it can be based on an item bought by the user. This method does not require a person to always contribute to the data store themselves: voids can be filled by the actions of other persons, or by the actions of the same person on other items. A few approaches for user-based and item-based collaborative recommendation are as follows:
1. Neighborhood-based approach
2. Item-based approach
3. Classification approach
4. Neural Collaborative Filtering
Neighborhood-based approach
Let's refer to the user for whom a rating is to be predicted as the 'active user'. In this approach, users are selected based on their similarity to the active user. The correlation between two users can be found using the formula below.
Let the ratings of product k by persons 'A' and 'B' be A_{k} and B_{k}, and let their mean ratings be \bar{A} and \bar{B}:
r_{A,B} = \frac{\sum_{k} (A_{k} - \bar{A})(B_{k} - \bar{B})}{\sqrt{\sum_{k} (A_{k} - \bar{A})^{2} \sum_{k} (B_{k} - \bar{B})^{2}}}
Persons with higher correlation are considered neighbors.
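The correlation between two users can be sketched as follows. This is an illustration only: it assumes each user's ratings are stored as a dictionary from item id to rating, sums only over products both users have rated, and takes the means over those shared products. The user names and ratings are made up.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too little overlap to measure a correlation
    mean_a = sum(ratings_a[k] for k in common) / len(common)
    mean_b = sum(ratings_b[k] for k in common) / len(common)
    numerator = sum(
        (ratings_a[k] - mean_a) * (ratings_b[k] - mean_b) for k in common
    )
    denominator = math.sqrt(
        sum((ratings_a[k] - mean_a) ** 2 for k in common)
        * sum((ratings_b[k] - mean_b) ** 2 for k in common)
    )
    return numerator / denominator if denominator else 0.0


# Hypothetical ratings on a 1-5 scale.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
carol = {"m1": 1, "m2": 5, "m3": 2}

alice_bob = pearson(alice, bob)      # same taste, shifted by one point
alice_carol = pearson(alice, carol)  # largely opposite taste
```

Here bob rates every shared movie exactly one point below alice, so their correlation is close to 1 and bob would be chosen as a neighbor, while carol's correlation with alice is negative.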
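Let V_{i,j} be the vote of user i on item j, and let I_{i} be the set of items on which user i has voted. The mean vote for user i is:
\bar{V}_{i} = \frac{1}{|I_{i}|} \sum_{j \in I_{i}} V_{i,j}
The predicted vote of the 'active user' a on item j is:
P_{a,j} = \bar{V}_{a} + k \sum_{i=1}^{n} w(a,i) (V_{i,j} - \bar{V}_{i})
Here, k is a constant used for normalization, and w(a,i) is the weight given to each of the n similar users.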
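The simplest way to calculate w(a,i) is to use the KNN algorithm, where each user is represented as a point in n-dimensional space. However, there are more efficient algorithms which can be used to calculate the weight w.
w(a,i) can be calculated with the Pearson correlation coefficient, summing over the items j on which both users have voted, with \bar{V}_{a} and \bar{V}_{i} denoting the mean votes of users a and i:
w(a,i) = \frac{\sum_{j} (V_{a,j} - \bar{V}_{a})(V_{i,j} - \bar{V}_{i})}{\sqrt{\sum_{j} (V_{a,j} - \bar{V}_{a})^{2} \sum_{j} (V_{i,j} - \bar{V}_{i})^{2}}}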
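Putting the pieces together, the sketch below predicts a vote for the active user as their mean vote plus a normalized, weighted sum of the neighbors' deviations from their own means. It is a toy illustration under several assumptions: the ratings dictionary is made up, every other user who voted on the item counts as a neighbor, and the normalization constant k is taken as 1 divided by the sum of absolute weights, which is one common choice rather than something the text specifies.

```python
import math

# Hypothetical user -> {item: vote} data on a 1-5 scale.
ratings = {
    "ann": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "cat": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}


def mean_vote(user):
    """Mean of all votes cast by the user."""
    votes = ratings[user].values()
    return sum(votes) / len(votes)


def weight(a, i):
    """Pearson correlation between users a and i over co-voted items."""
    common = set(ratings[a]) & set(ratings[i])
    mean_a, mean_i = mean_vote(a), mean_vote(i)
    num = sum((ratings[a][j] - mean_a) * (ratings[i][j] - mean_i) for j in common)
    den = math.sqrt(
        sum((ratings[a][j] - mean_a) ** 2 for j in common)
        * sum((ratings[i][j] - mean_i) ** 2 for j in common)
    )
    return num / den if den else 0.0


def predict(a, item):
    """Mean vote of a plus the weighted deviations of users who voted on item."""
    neighbors = [i for i in ratings if i != a and item in ratings[i]]
    weights = {i: weight(a, i) for i in neighbors}
    norm = sum(abs(w) for w in weights.values())
    if norm == 0:
        return mean_vote(a)  # no usable neighbors: fall back to the mean vote
    k = 1.0 / norm  # assumed normalization constant
    return mean_vote(a) + k * sum(
        w * (ratings[i][item] - mean_vote(i)) for i, w in weights.items()
    )


prediction = predict("ann", "m4")
```

Because bob agrees with ann and liked m4, while cat disagrees with ann and disliked it, both neighbors push the prediction above ann's mean vote. Note that the raw prediction can fall outside the rating scale, so a real system would typically clip it.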
The most serious problem collaborative filtering techniques face in the real world is that users provide too few ratings. Hence, in a real-world dataset, the user vs. item matrix will have many null values.
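In such cases, w(a,i) is calculated as a correlation weighted by inverse user frequency:
w(a,i) = \frac{\sum_{j} f_{j} \sum_{j} f_{j} V_{a,j} V_{i,j} - (\sum_{j} f_{j} V_{a,j})(\sum_{j} f_{j} V_{i,j})}{\sqrt{U_{a} U_{i}}}
with U_{x} = \sum_{j} f_{j} \sum_{j} f_{j} V_{x,j}^{2} - (\sum_{j} f_{j} V_{x,j})^{2} for x = a, i.
Here f_{j} = log(n/n_{j}) is the inverse user frequency, n is the total number of users, and n_{j} is the number of users who have voted on item j.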
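The inverse user frequency itself is straightforward to compute. The following minimal sketch assumes votes are stored as a user -> {item: vote} dictionary; the data and the function name are hypothetical.

```python
import math
from collections import Counter


def inverse_user_frequency(ratings):
    """f_j = log(n / n_j), with n users in total and n_j users voting on item j."""
    n = len(ratings)
    voters_per_item = Counter(
        item for user_votes in ratings.values() for item in user_votes
    )
    return {item: math.log(n / n_j) for item, n_j in voters_per_item.items()}


# Hypothetical votes: every user voted on m1, only one user voted on m4.
ratings = {
    "ann": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m2": 2, "m4": 5},
    "cat": {"m1": 1},
}
iuf = inverse_user_frequency(ratings)
```

An item that every user has voted on gets a frequency of zero and so contributes nothing to the weighted correlation, while rarely voted items count the most.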
One approach to reducing the effect of null values is dimensionality reduction. Billsus and Pazzani^{[6]} suggest a non-correlation-based approach that makes predictions using neural networks.
Item-to-item approach
Instead of using the ratings given by the users to calculate a neighborhood of users, the ratings are used to find the similarity between items. The same Pearson coefficient can be used for this approach.
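As a rough sketch of the item-to-item idea, the user -> item ratings can be inverted into item -> user ratings, after which the same Pearson coefficient compares two items over the users who rated both. The data and helper names below are made up for illustration.

```python
import math


def invert(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, votes in ratings.items():
        for item, rating in votes.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def pearson(x, y):
    """Pearson correlation over the keys (here: users) present in both x and y."""
    common = set(x) & set(y)
    if len(common) < 2:
        return 0.0
    mean_x = sum(x[k] for k in common) / len(common)
    mean_y = sum(y[k] for k in common) / len(common)
    num = sum((x[k] - mean_x) * (y[k] - mean_y) for k in common)
    den = math.sqrt(
        sum((x[k] - mean_x) ** 2 for k in common)
        * sum((y[k] - mean_y) ** 2 for k in common)
    )
    return num / den if den else 0.0


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3},
    "cat": {"m1": 1, "m2": 5, "m3": 2},
}
items = invert(ratings)
sim_m1_m3 = pearson(items["m1"], items["m3"])
sim_m1_m2 = pearson(items["m1"], items["m2"])
```

In this toy data m1 and m3 are rated alike by all three users and come out as strongly similar, so a user who liked m1 could be shown m3, whereas m1 and m2 are negatively correlated.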
Classification approach
In the classification approach, items are represented as vectors. They are classified and suggested to the user based on the ratings the active user has given to each class of items. With this approach, collaborative filtering is treated as a classic classification problem. Classification techniques such as Support Vector Machines or Bayesian classifiers can be used. Random Forest proves useful in the case of an unbalanced dataset.
Neural Collaborative Filtering
Neural networks are being used increasingly for collaborative filtering. Xiangnan He et al.^{[7]} explored the use of neural networks for collaborative filtering. In this work, the user-item interaction matrix is treated as implicit data. While observed entries at least reflect users' interest in items, the unobserved entries may simply be missing data, and in most cases there is a natural scarcity of negative feedback. The recommendation problem with implicit feedback is formulated as the problem of estimating the scores of unobserved entries. Matrix factorization is used to estimate the predicted output, and the missing data is replaced using these estimates. The filled input space is then passed to a multilayer perceptron network to estimate ratings for an active user.
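The matrix factorization step mentioned above can be illustrated with a small, plain matrix factorization trained by stochastic gradient descent. This is not the neural architecture of He et al.; it only shows how latent user and item factors learned from observed entries yield a score for an unobserved entry. The data, hyperparameters, and function names are all assumptions made for the sketch.

```python
import math
import random


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def train(observed, n_users, n_items, n_factors=2, lr=0.01, reg=0.02,
          epochs=1000, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    user_f = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    item_f = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - dot(user_f[u], item_f[i])
            for f in range(n_factors):
                pu, qi = user_f[u][f], item_f[i][f]
                # Gradient step on squared error with L2 regularization.
                user_f[u][f] += lr * (err * qi - reg * pu)
                item_f[i][f] += lr * (err * pu - reg * qi)
    return user_f, item_f


def rmse(observed, user_f, item_f):
    """Root mean square error on the observed entries."""
    squared = [(r - dot(user_f[u], item_f[i])) ** 2 for u, i, r in observed]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed entries; user 0 has not rated item 3.
observed = [
    (0, 0, 5), (0, 1, 3), (0, 2, 4),
    (1, 0, 4), (1, 1, 2), (1, 3, 5),
    (2, 0, 1), (2, 1, 5), (2, 3, 1),
]
user_f, item_f = train(observed, n_users=3, n_items=4)
estimate_for_missing = dot(user_f[0], item_f[3])  # score for the unobserved entry
```

The dot product of a user's and an item's factor vectors gives an estimate for any entry, including those never observed. In the neural approach described above, the fixed dot product is replaced or complemented by a learned multilayer perceptron.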
The YouTube recommendation system^{[10]} is a deep neural network based recommendation system.
This system employs two separate neural networks. The first network is used for candidate generation, while the second network that follows is used for ranking. The candidate generation network only provides broad personalization via collaborative filtering. This network uses the personal history of a user to select a few hundred videos out of a huge corpus. The second network uses a rich set of features, including descriptions of the video and the user, to rank the suggested videos.
The collaborative filtering approach performs better than a simple content-based recommendation technique when a large amount of user data is available. This method does not prove helpful in the case of very sparse data, for example for new users or new items with little or no interaction history. This is known as the 'cold start problem'.
3. Hybrid Recommendation System
The hybrid recommendation system is a combination of collaborative and content-based filtering techniques. In this approach, content is used to infer ratings when ratings are scarce.
This combination is used in most recommendation systems at present. The Netflix movie recommendation system is an example of a hybrid recommendation system.
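One simple way to combine the two techniques is to trust the collaborative score more as more ratings become available and to fall back on the content-based score when ratings are scarce. The sketch below is only one possible blending scheme, not a description of how any particular production system works; the function name, the linear weighting, and the min_ratings threshold are assumptions.

```python
def hybrid_score(collab_score, content_score, n_ratings, min_ratings=5):
    """Blend a collaborative and a content-based score for one (user, item) pair.

    The collaborative score gains weight linearly until n_ratings reaches
    min_ratings; with no ratings at all, only the content score is used.
    """
    if collab_score is None or n_ratings <= 0:
        return content_score
    trust = min(1.0, n_ratings / min_ratings)
    return trust * collab_score + (1.0 - trust) * content_score


# A brand new item, a sparsely rated item, and a well-rated item.
new_item = hybrid_score(None, 3.0, n_ratings=0)
sparse_item = hybrid_score(4.0, 2.0, n_ratings=1)
popular_item = hybrid_score(4.0, 2.0, n_ratings=10)
```

A new item is scored purely on content, which sidesteps the lack of ratings, while a well-rated item is scored purely on collaborative data.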
The scikit-surprise package for Python is useful for implementing recommendation systems. Since there is no single 'correct' way to implement a recommendation system, various machine learning algorithms for classification can be explored. Typical benchmarks are available for comparison^{[9]}.
How to measure success rate / Efficiency?
Statistical accuracy metrics are used to evaluate the accuracy of a filtering technique by comparing the predicted ratings directly with the actual user ratings. The most commonly used statistical metrics are:
1. Mean Absolute Error (MAE)
2. Root Mean Square Error (RMSE)
3. ROC / PR curve
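MAE:
Mean absolute error is the most popular metric for measuring the accuracy of a recommendation system. It is a measure of the deviation of the recommendation from the user's actual value:
MAE = \frac{1}{N} \sum_{i=1}^{N} |y_{i} - \hat{y}_{i}|
Here y_{i} is the actual user rating, \hat{y}_{i} is the predicted rating, and N is the number of ratings compared.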
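RMSE:
Root mean square error measures the average magnitude of the error. Because errors are squared before averaging, it penalizes large errors more heavily than MAE. With y_{i} the actual rating, \hat{y}_{i} the predicted rating, and N the number of ratings compared:
RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}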
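Both error metrics can be computed in a few lines. The sketch below uses made-up actual and predicted ratings to show how a single large error affects RMSE more than MAE.

```python
import math


def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted ratings."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean of the squared differences."""
    return math.sqrt(
        sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical ratings; the last prediction is off by two points.
actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 4]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

The absolute errors here are 0.5, 0, 1 and 2, so the MAE is 0.875, while the RMSE is noticeably higher because the two-point miss is squared before averaging.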
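Precision and recall measures:
Precision-recall values are useful for determining the accuracy of a prediction system in the case of unbalanced datasets. Precision is the fraction of recommended items that are relevant to the user, and recall is the fraction of relevant items that were recommended.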
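For a recommendation list, precision and recall can be sketched as set operations. The item ids below are hypothetical, and 'relevant' stands for whatever ground truth is available, such as items the user later rated highly.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a list of recommended item ids."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four recommendations, two of which are among the three relevant items.
precision, recall = precision_recall(
    recommended=["m1", "m2", "m3", "m4"],
    relevant=["m2", "m4", "m7"],
)
```

Two of the four recommendations are relevant, giving a precision of 0.5, and two of the three relevant items were found, giving a recall of about 0.67. Computing the pair at different list lengths or score thresholds traces out the PR curve.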
References:
1. Content-based recommendation systems. http://www.fxpal.com/publications/FXPALPR06383.pdf
2. Recommender Systems. http://recommendersystems.org
3. Using collaborative filtering to weave an information Tapestry. https://www.ischool.utexas.edu/~i385d/readings/Goldberg_UsingCollaborative_92.pdf
4. Item-based collaborative filtering. http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/itembased.html
5. Collaborative filtering tutorial. https://www.cs.cmu.edu/~wcohen/collabfilteringtutorial.ppt
6. Learning Collaborative Information Filters. http://www.ics.uci.edu/~pazzani/Publications/MLC98.pdf
7. Neural Collaborative Filtering (Xiangnan He et al.). https://arxiv.org/pdf/1708.05031.pdf
8. Surprise library. http://surpriselib.com/
9. Recommendation systems: principles, methods, and evaluation. https://www.sciencedirect.com/science/article/pii/S1110866515000341
10. Deep neural networks in YouTube recommendations. https://static.googleusercontent.com/media/research.google.com/ru//pubs/archive/45530.pdf
Technical Editor: Farnam Adelkhani