Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 28. April 2013 09:32

In our previous post, we began exploring Singular Value Decomposition (SVD) using Math.NET and F#, and showed how this linear algebra technique can be used to “extract” the core information of a dataset and construct a reduced version of the dataset with limited loss of information.

Today, we’ll pursue our excursion in Chapter 14 of Machine Learning in Action, and look at how this can be used to build a collaborative recommendation engine. We’ll follow the approach outlined by the book, starting first with a “naïve” approach, and then using an SVD-based approach.

We’ll start from a slightly modified setup from last post, loosely inspired by the Netflix Prize. The full code for the example can be found here on GitHub.

The problem and setup

In the early 2000s, Netflix had an interesting problem. Netflix’s business model was simple: you would subscribe, and for a fixed fee you could watch as many movies from their catalog as you wanted. However, what happened was the following: users would watch all the movies they knew they wanted to watch, and after a while, they would run out of ideas – and rather than search for lesser-known movies, they would leave. As a result, Netflix launched a prize: if you could create a model that could provide users with good recommendations for new movies to watch, you could claim a $1,000,000 prize.

Obviously, we won’t try to replicate the Netflix prize here, if only because the dataset was rather large; 500,000 users and 20,000 movies is a lot of data… We will instead work off a fake, simplified dataset that illustrates some of the key ideas behind collaborative recommendation engines, and how SVD can help in that context. For the sake of clarity, I’ll be erring on the side of extra-verbose.

Our dataset consists of users and movies; a movie can be rated from 1 star (terrible) to 5 stars (awesome). We’ll represent it with a Rating record type, associating a UserId, MovieId, and Rating:

type UserId = int
type MovieId = int
type Rating = { UserId:UserId; MovieId:MovieId; Rating:int }

To make our life simpler, and to be able to validate whether “it works”, we’ll imagine a world where only 3 types of movies exist, say, Action, Romance and Documentary – and where people have simple tastes: people either love Action and hate the rest, love Romance or hate the rest, or love Documentaries and hate the rest. We’ll assume that we have only 12 movies in our catalog: 0 to 3 are Action, 4 to 7 Romance, and 8 to 11 Documentary.



Comment RSS