Clustering Techniques to Improve Scalability and Accuracy of Recommender Systems
Abstract
Recommender systems have emerged as essential tools for modern e-commerce applications. These applications typically handle large datasets and often face challenges such as data sparsity and scalability. Clustering techniques reduce the computational time needed for recommendation and handle the sparsity problem more efficiently. Traditional clustering-based recommender systems partition the user-item rating matrix into clusters and execute the recommendation algorithm in each cluster separately in order to decrease the overall runtime of the system. Each user or item generally belongs to at most one cluster. However, some users (boundary users) in a particular cluster may be more similar in their preferences to users in nearby clusters than to those in their own cluster. Therefore, we propose a clustering-based scalable recommendation algorithm that can switch a user from its original cluster to another cluster in order to provide more accurate recommendations. For a user belonging to multiple clusters, we aggregate the recommendations from all clusters to which the user belongs to produce the final set of recommendations for that user. In this work, we propose two types of clustering, one based on ratings and the other based on frequency, and compare their performance. Finally, we explore the applicability of cluster ensemble techniques in the proposed method. Our aim is to develop a recommendation framework that scales to large datasets without substantially degrading recommendation quality. Our experiments demonstrate both the scalability and the efficacy of our method. It reduces the runtime of the baseline collaborative filtering (CF) algorithm by between 58% and 90% on the MovieLens-10M dataset and by between 42% and 84% on the MovieLens-20M dataset. The accuracy of the recommendations in terms of the F1, MAP and NDCG metrics is also better than that of existing clustering-based recommender systems.
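
The idea of letting boundary users belong to more than one cluster and aggregating the per-cluster recommendations can be illustrated with a minimal sketch. This is not the algorithm evaluated in the paper. The cosine similarity measure, the `tolerance` rule for detecting boundary users, the fixed centroids, the averaging step, and all function names are assumptions made for illustration only.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between rows of a and rows of b (zero rows give 0)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    a_unit = np.divide(a, a_norm, out=np.zeros_like(a), where=a_norm > 0)
    b_unit = np.divide(b, b_norm, out=np.zeros_like(b), where=b_norm > 0)
    return a_unit @ b_unit.T

def assign_clusters(ratings, centroids, tolerance=0.05):
    """Return, for each user, the list of clusters the user belongs to.

    A user always joins the most similar centroid. It also joins any other
    cluster whose similarity is within `tolerance` of the best one
    (a hypothetical rule for detecting boundary users).
    """
    sims = cosine_sim(ratings, centroids)
    best = sims.max(axis=1)
    return [np.flatnonzero(row >= b - tolerance).tolist()
            for row, b in zip(sims, best)]

def recommend(ratings, memberships, user, top_n=2):
    """Sum user-based CF scores over every cluster the user belongs to."""
    total = np.zeros(ratings.shape[1])
    for c in memberships[user]:
        # Peers are the other users that share cluster c with this user.
        peers = [u for u, cs in enumerate(memberships) if c in cs and u != user]
        if not peers:
            continue
        w = cosine_sim(ratings[[user]], ratings[peers])[0]
        if w.sum() > 0:
            # Similarity-weighted average of the peers' ratings.
            total += w @ ratings[peers] / w.sum()
    total[ratings[user] > 0] = -np.inf  # never recommend already-rated items
    order = np.argsort(-total)[:top_n]
    return [int(i) for i in order if np.isfinite(total[i]) and total[i] > 0]

# Toy rating matrix (rows: users, columns: items, 0 means "not rated").
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
    [5, 0, 5, 0],  # boundary user: equally close to both centroids
], dtype=float)
# Centroids are fixed by hand here; in practice they would come from a
# clustering step such as k-means.
centroids = np.array([[4.5, 4.5, 0, 0], [0, 0, 4.5, 4.5]])

memberships = assign_clusters(ratings, centroids)
boundary_recs = recommend(ratings, memberships, user=4)
```

In this toy example the last user is equally similar to both centroids, so it is placed in both clusters and receives candidate items from each of them, whereas a hard partition would have limited it to the items seen in a single cluster.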