Machine Learning in Industry: April 2012

Matrix Completion and Recommender Systems

Ever since Netflix introduced the world to the Netflix Prize, Recommendation Systems have been a hot research topic in the Machine Learning community. What BellKor's Pragmatic Chaos achieved with their winning algorithm for the Netflix Prize was an impressive feat. Since then, a lot of people have been following some of the techniques they presented, especially the "SVD-based" Recommendation System. The procedures they presented are straightforward to implement, and there are scalable implementations available in Apache Mahout, which really only implements a highly simplified correlation-based Recommendation System. Mahout seems to be a popular choice among a lot of Web applications.
These techniques, however, have a lot of limitations. First of all, what BellKor's Pragmatic Chaos presented as their SVD-based recommendation system was actually a hidden factors model, in which they solved a series of linear regression problems to determine the hidden factors (the alternating least-squares method). Anyone remotely familiar with linear regression knows that the output variable needs to be numerical. This method works fine where the data is purely numerical (Netflix ratings), but a lot of the time we only have access to categorical variables (e.g. which item the user browsed, purchased, etc.). Often we have mixed variables, both categorical and numerical. In these situations, the techniques currently prevalent in industry simply cannot be applied.
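To make the alternating least-squares idea concrete, here is a minimal NumPy sketch. It is not the Netflix Prize code: the function name `als_complete`, the parameters `rank`, `reg`, and `n_iters`, and the boolean `mask` marking observed ratings are all assumptions made for illustration. The point it shows is that each update is a small ridge regression whose target is the numerical rating itself, which is why the method has no natural place for categorical outcomes.

```python
import numpy as np

def als_complete(R, mask, rank=2, reg=0.1, n_iters=20, seed=0):
    """Fit R ~ U @ V.T on observed entries by alternating ridge regressions.

    R    : (n_users, n_items) array of numerical ratings
    mask : boolean array of the same shape, True where a rating is observed
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))   # user hidden factors
    V = rng.normal(scale=0.1, size=(n_items, rank))   # item hidden factors
    ridge = reg * np.eye(rank)

    for _ in range(n_iters):
        # Hold item factors fixed: each user's factors solve a ridge regression
        # of that user's observed ratings on the corresponding item factors.
        for u in range(n_users):
            obs = mask[u]
            if obs.any():
                Vo = V[obs]
                U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[u, obs])
        # Hold user factors fixed: same regression, one per item.
        for i in range(n_items):
            obs = mask[:, i]
            if obs.any():
                Uo = U[obs]
                V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[obs, i])
    return U, V
```

Calling `U, V = als_complete(R, mask)` and then reading `U @ V.T` at the unobserved positions gives the predicted ratings. Note that the regression target `R[u, obs]` is a vector of numbers; a "purchased / browsed" label cannot be dropped in there without changing the loss.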
What I developed here at Change.org is a true mixed-variable recommendation system. We have tons of categorical and numerical features. "Recommendation System" is simply a marketable name for Statistical Matrix Completion Procedures. You impose a statistical observation model and then learn the parameters of the model using Convex Optimization. The result is a neat Matrix Completion Algorithm which is tremendously useful because of the flexibility it provides in the data it can handle. More on this later as it develops!
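The post does not spell out the observation model or the solver, so the following is only a sketch of one common convex formulation, not the system described above. It assumes a Gaussian likelihood on numerical columns, a Bernoulli (logistic) likelihood on binary columns as the simplest categorical case, and a nuclear-norm penalty on the latent parameter matrix, minimized by proximal gradient descent. The names `complete_mixed`, `svt`, `sigmoid`, `is_binary`, `lam`, and `step` are hypothetical.

```python
import numpy as np

def sigmoid(X):
    # Clipping avoids overflow warnings in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(X, -30.0, 30.0)))

def svt(X, tau):
    """Soft-threshold the singular values of X.

    This is the proximal operator of tau * (nuclear norm).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete_mixed(Y, mask, is_binary, lam=1.0, step=1.0, n_iters=200):
    """Estimate a low-rank parameter matrix M from mixed-type data.

    Y         : (n_rows, n_cols) data; binary columns hold 0/1 values
    mask      : boolean array, True where Y is observed
    is_binary : boolean array of length n_cols, True for binary columns
    Assumed model: Y ~ Normal(M, 1) on numerical columns and
    Y ~ Bernoulli(sigmoid(M)) on binary columns.
    """
    M = np.zeros_like(Y, dtype=float)
    for _ in range(n_iters):
        # For both likelihoods the gradient of the negative log-likelihood
        # is (model mean - observation), taken on observed entries only.
        mean = np.where(is_binary, sigmoid(M), M)
        grad = np.where(mask, mean - Y, 0.0)
        # Gradient step on the likelihood, then the nuclear-norm prox.
        M = svt(M - step * grad, step * lam)
    return M
```

After fitting, `M[:, ~is_binary]` can be read as predicted numerical values and `sigmoid(M[:, is_binary])` as predicted probabilities. The default `step=1.0` relies on the likelihood gradient being 1-Lipschitz under this assumed model; a different noise variance or likelihood would need a different step size.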

Thursday, April 5, 2012
