My trajectory through “Machine Learning in Action” is becoming more unpredictable as we go – this time, rather than completing our last episode on K-means clustering (we’ll get back to it later), I’ll make another jump directly to Chapter 14, which is dedicated to Singular Value Decomposition, and convert the example from Python to F#.
The chapter illustrates how Singular Value Decomposition (or SVD in short) can be used to build a collaborative recommendation engine. We will follow the chapter pretty closely: today we will focus on the mechanics of using SVD in F# – and leave the recommendation part to our next installment.
As usual, the code is on GitHub.
Until this point, I have avoided using a Linear Algebra library, because the algorithms we discussed so far involved lightweight, row-centric operations, which didn’t warrant taking such a dependency. SVD is one of these cases where using an established library is a good idea, if only because implementing it yourself would not be trivial. So let’s create a new script file (Chapter14.fsx), add a reference to Math.NET Numerics for F# to our project via NuGet, and reference it in our script:
#r @"..\..\MachineLearningInAction\packages\MathNet.Numerics.2.4.0\lib\net40\MathNet.Numerics.dll"
#r @"..\..\MachineLearningInAction\packages\MathNet.Numerics.FSharp.2.4.0\lib\net40\MathNet.Numerics.FSharp.dll"
open MathNet.Numerics.LinearAlgebra
open MathNet.Numerics.LinearAlgebra.Double
Now that we have our tools, let’s start working our example. Imagine that we are running a website, where our users can rate dishes, from 1 (horrendous) to 5 (delightful). Our data would look something along these lines:
type Rating = { UserId: int; DishId: int; Rating: int }
// Our existing "ratings database"
let ratings = [
{ UserId = 0; DishId = 0; Rating = 2 };
{ UserId = 0; DishId = 3; Rating = 4 };
... omitted for brevity ...
{ UserId = 10; DishId = 8; Rating = 4 };
{ UserId = 10; DishId = 9; Rating = 5 } ]
Our goal will be to provide recommendations to User for Dishes they haven’t tasted yet, based on their ratings and what other users are saying.
Our first step will be to represent this as a Matrix, where each Row is a User, each Column a Dish, and the corresponding cell is the User Rating for that Dish. Note that not every Dish has been rated by every User – we will represent missing ratings as zeroes in our matrix:
let rows = 11
let cols = 11
let data = DenseMatrix(rows, cols)
ratings
|> List.iter (fun rating ->
data.[rating.UserId, rating.DishId] <- (float)rating.Rating)
We initialize our 11 x 11 matrix, which creates a zero-filled matrix, and then map our user ratings to each “cell”. Because we constructed our example that way, our UserIds go from 0 to 10, and DishIds from 0 to 10, so we can map them respectively to Rows and Columns.
Note: while this sounded like a perfect case to use a Sparse Matrix, I chose to go first with a DenseMatrix, which is more standard. I may look at whether there is a benefit to going sparse later.
Note: our matrix happens to be square, but this isn’t a requirement.
Note: I will happily follow along the book author and replace unknown ratings by zero, because it’s very convenient. I don’t fully get how this is justified, but it seems to work, so I’ll temporarily suspend disbelief and play along.
At that point, we have our data matrix ready. Before going any further, let’s write a quick utility function, to “pretty-render” matrices:
let printNumber v =
if v < 0.
then printf "%.2f " v
else printf " %.2f " v
// Display a Matrix in a "pretty" format
let pretty matrix =
Matrix.iteri (fun row col value ->
if col = 0 then printfn "" else ignore ()
printNumber value) matrix
printfn ""
We iterate over each row and column, start a newline every time we hit column 0, and print every value, nicely formatted with 2 digits after the decimal.
In passing, note the F#-friendly Matrix.iteri syntax – the good people at Math.NET do support F#, and MathNet.Numerics.FSharp.dll contains handy helpers, which allow for a much more functional usage of the library. Thanks, guys!
Let’s see how our data matrix looks like:
printfn "Original data matrix"
pretty data
… which produces the following output in FSI:
Original data matrix
2.00 0.00 0.00 4.00 4.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 5.00
4.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
3.00 3.00 4.00 0.00 3.00 0.00 0.00 2.00 2.00 0.00 0.00
5.00 5.00 5.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 5.00 0.00 0.00 5.00 0.00
4.00 0.00 4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 5.00
0.00 0.00 0.00 0.00 0.00 4.00 0.00 0.00 0.00 0.00 4.00
0.00 0.00 0.00 0.00 0.00 0.00 5.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 3.00 0.00 0.00 0.00 0.00 4.00 5.00 0.00
1.00 1.00 2.00 1.00 1.00 2.00 1.00 0.00 4.00 5.00 0.00
>
We seem to be in business.
More...