Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 25. March 2013 10:33

My trajectory through “Machine Learning in Action” is becoming more unpredictable as we go – this time, rather than completing our last episode on K-means clustering (we’ll get back to it later), I’ll make another jump directly to Chapter 14, which is dedicated to Singular Value Decomposition, and convert the example from Python to F#.

The chapter illustrates how Singular Value Decomposition (or SVD in short) can be used to build a collaborative recommendation engine. We will follow the chapter pretty closely: today we will focus on the mechanics of using SVD in F# – and leave the recommendation part to our next installment.

As usual, the code is on GitHub.

Until this point, I have avoided using a Linear Algebra library, because the algorithms we discussed so far involved lightweight, row-centric operations, which didn’t warrant taking such a dependency. SVD is one of these cases where using an established library is a good idea, if only because implementing it yourself would not be trivial. So let’s create a new script file (Chapter14.fsx), add a reference to Math.NET Numerics for F# to our project via NuGet, and reference it in our script:

#r @"..\..\MachineLearningInAction\packages\MathNet.Numerics.2.4.0\lib\net40\MathNet.Numerics.dll"
#r @"..\..\MachineLearningInAction\packages\MathNet.Numerics.FSharp.2.4.0\lib\net40\MathNet.Numerics.FSharp.dll"

open MathNet.Numerics.LinearAlgebra
open MathNet.Numerics.LinearAlgebra.Double

Now that we have our tools, let’s start working our example. Imagine that we are running a website, where our users can rate dishes, from 1 (horrendous) to 5 (delightful). Our data would look something along these lines:

type Rating = { UserId: int; DishId: int; Rating: int }

// Our existing "ratings database"
let ratings = [
    { UserId = 0; DishId = 0; Rating = 2 };
    { UserId = 0; DishId = 3; Rating = 4 };
    ... omitted for brevity ...
    { UserId = 10; DishId = 8; Rating = 4 };
    { UserId = 10; DishId = 9; Rating = 5 } ]

Our goal will be to provide recommendations to User for Dishes they haven’t tasted yet, based on their ratings and what other users are saying.

Our first step will be to represent this as a Matrix, where each Row is a User, each Column a Dish, and the corresponding cell is the User Rating for that Dish. Note that not every Dish has been rated by every User – we will represent missing ratings as zeroes in our matrix:

let rows = 11
let cols = 11
let data = DenseMatrix(rows, cols)
ratings 
|> List.iter (fun rating -> 
       data.[rating.UserId, rating.DishId] <- (float)rating.Rating)

We initialize our 11 x 11 matrix, which creates a zero-filled matrix, and then map our user ratings to each “cell”. Because we constructed our example that way, our UserIds go from 0 to 10, and DishIds from 0 to 10, so we can map them respectively to Rows and Columns.

Note: while this sounded like a perfect case to use a Sparse Matrix, I chose to go first with a DenseMatrix, which is more standard. I may look at whether there is a benefit to going sparse later.

Note: our matrix happens to be square, but this isn’t a requirement.

Note: I will happily follow along the book author and replace unknown ratings by zero, because it’s very convenient. I don’t fully get how this is justified, but it seems to work, so I’ll temporarily suspend disbelief and play along.

At that point, we have our data matrix ready. Before going any further, let’s write a quick utility function, to “pretty-render” matrices:

let printNumber v = 
    if v < 0. 
    then printf "%.2f " v 
    else printf " %.2f " v
// Display a Matrix in a "pretty" format
let pretty matrix = 
    Matrix.iteri (fun row col value ->
        if col = 0 then printfn "" else ignore ()
        printNumber value) matrix
    printfn ""

We iterate over each row and column, start a newline every time we hit column 0, and print every value, nicely formatted with 2 digits after the decimal.

In passing, note the F#-friendly Matrix.iteri syntax – the good people at Math.NET do support F#, and MathNet.Numerics.FSharp.dll contains handy helpers, which allow for a much more functional usage of the library. Thanks, guys!

Let’s see how our data matrix looks like:

printfn "Original data matrix"
pretty data

… which produces the following output in FSI:

Original data matrix

2.00  0.00  0.00  4.00  4.00  0.00  0.00  0.00  0.00  0.00  0.00
0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  5.00
4.00  0.00  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00
3.00  3.00  4.00  0.00  3.00  0.00  0.00  2.00  2.00  0.00  0.00
5.00  5.00  5.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
0.00  0.00  0.00  0.00  0.00  0.00  5.00  0.00  0.00  5.00  0.00
4.00  0.00  4.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  5.00
0.00  0.00  0.00  0.00  0.00  4.00  0.00  0.00  0.00  0.00  4.00
0.00  0.00  0.00  0.00  0.00  0.00  5.00  0.00  0.00  0.00  0.00
0.00  0.00  0.00  3.00  0.00  0.00  0.00  0.00  4.00  5.00  0.00
1.00  1.00  2.00  1.00  1.00  2.00  1.00  0.00  4.00  5.00  0.00
>

We seem to be in business.

More...












by Mathias 16. March 2013 10:04

Mondrian is one of those modern painters whose work everyone recognizes, even though few people will quote his name. He also happens to be one of my favorite artists – in spite of their simple geometric structure, I find his pieces strangely beautiful:

Mondrian_Composition_II_in_Red,_Blue,_and_Yellow

“Composition II in Red, Blue, and Yellow”, from Wikipedia

I have been hard at work on some pretty dry stuff lately, and needed a bit of a change of pace, and ended up spending a couple of hours coding a simple Mondrianizer in F#: give it a picture, and it will transform it into something “in the style of Mondrian”.

For instance, starting from my Twitter avatar, here is what the Mondrianizer produces:

TournesolMondrianizedTournesol

This was strictly quick-and-dirty hackery, so the code is not my best by any stretch of the imagination, but I was rather pleased by the results – you can find the current version of the Mondrianizer here on GitHub.

More...

by Mathias 3. March 2013 11:01

Recently, I had a few interesting discussions on F# code readability. One argument I often hear about F# is that by virtue of its succinctness, it increases the signal-to-noise ratio. I certainly found this to be true: when the entire code fits on your screen, and you don’t have to scroll around to figure out what is going on, navigating a code base becomes significantly simpler.

Relatedly, because the F# syntax is so much lighter than C#, some of my coding habits evolved. I stick to the “one public type per file” guideline in C#, and initially did the same in F#. That didn’t last long: declaring a Record Type in F# is a one-liner, and dedicating an entire file to it seems… overkill:

type Person = { FirstName: string; LastName: string; BornOn: DateTime }

As a result, my F# solutions tend to contain less files, and each file is more “self-contained”, usually declaring a couple of types and implementing some operations involving these types in a module. Again, less navigation required: open one file, and the truth, the whole truth, and nothing but the truth is right there on your screen.

In my experience, this also makes refactoring tools much less important in F# than C#. The lack of refactoring tools in F# used to be one of my main gripes with using the language. At that point, I don’t really care that much any more, because I don’t really need them that badly. Sure, it would be nice to propagate a rename automatically – but lots of the refactoring tools I commonly use with C# deal with navigating around or moving pieces of code from file to file (extract class, method, etc…), all problems that are minor when your code sits in just a couple of files, and the “what class owns what responsibility” issue vanishes because your functions are at a module level.

Conversely, I have found myself annoyed a few times looking at F# code where succinctness erred on the side of obfuscation. This tendency for terse naming conventions seems to be a cultural heritage from other functional languages, and makes sense to an extent – functional code tends to focus on applying generic transformations to “things”, and not that much on what the “thing” might be.

As an illustration, I have seen often code along these lines:

match list with
| x::xs -> ...

No need to go full on Java on your code, but a bit of naming effort goes a long way in making code intelligible: 

match list with
| head::tail -> ...

That being said, extreme terseness can be fun, at times – I don’t think I’ll see a C# Game of Life implementation that fits in a Tweet any time soon Smile.

Another readability aspect I found interesting with F# code is that the order of declarations matters, and the order of the files in the project matters as well. This seemingly odd constraint has a good reason – it makes the awesome F# type inference work.

Over time, I actually began to appreciate this not as a constraint, but almost as a feature. One problem I keep running into when I look into a C# code base I am not familiar with is “damn! where should I start?”. There is no clear way to proceed through the code, and I have ended up countless times navigating haphazardly from class to class, hoping to stumble upon a solid starting point.

I don’t really have that problem with F# code bases – essentially, I either start from the first line of the first file, and read forward, or the last line of the last file, working my way back. Either way works; the first one reads like a constructive proof, walking you every step to the ineluctable conclusion, the second one starts with the “high point” of the code base, digging progressively into the nitty-gritty and assumptions that were made to get there.

Someone reacted by saying that I was “just rationalizing”. There is probably some truth in that – but I believe there is something to be said for having a natural reading order in a code base. As a side-note, this is also one of my minor annoyances with GitHub: in the browser, the files from an F# repository are displayed in alphabetical order, which loses the logical project organization.

I might also change my tune when I have to deal with a truly large F# code base, which hasn’t happened to me yet. At that point, I may long for more freedom in organizing my code, and, say, arrange it by topical folders. For the moment, though, this hasn’t been an issue for me!

Comments

Comment RSS