Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 25. March 2013 10:33

My trajectory through “Machine Learning in Action” is becoming more unpredictable as we go – this time, rather than completing our last episode on K-means clustering (we’ll get back to it later), I’ll make another jump directly to Chapter 14, which is dedicated to Singular Value Decomposition, and convert the example from Python to F#.

The chapter illustrates how Singular Value Decomposition (or SVD in short) can be used to build a collaborative recommendation engine. We will follow the chapter pretty closely: today we will focus on the mechanics of using SVD in F# – and leave the recommendation part to our next installment.

As usual, the code is on GitHub.

Until this point, I have avoided using a Linear Algebra library, because the algorithms we discussed so far involved lightweight, row-centric operations, which didn’t warrant taking such a dependency. SVD is one of these cases where using an established library is a good idea, if only because implementing it yourself would not be trivial. So let’s create a new script file (Chapter14.fsx), add a reference to Math.NET Numerics for F# to our project via NuGet, and reference it in our script:

#r @"..\..\MachineLearningInAction\packages\MathNet.Numerics.2.4.0\lib\net40\MathNet.Numerics.dll"
#r @"..\..\MachineLearningInAction\packages\MathNet.Numerics.FSharp.2.4.0\lib\net40\MathNet.Numerics.FSharp.dll"

open MathNet.Numerics.LinearAlgebra
open MathNet.Numerics.LinearAlgebra.Double

Now that we have our tools, let’s start working through our example. Imagine that we are running a website where our users can rate dishes, from 1 (horrendous) to 5 (delightful). Our data would look something along these lines:

type Rating = { UserId: int; DishId: int; Rating: int }

// Our existing "ratings database"
let ratings = [
    { UserId = 0; DishId = 0; Rating = 2 };
    { UserId = 0; DishId = 3; Rating = 4 };
    ... omitted for brevity ...
    { UserId = 10; DishId = 8; Rating = 4 };
    { UserId = 10; DishId = 9; Rating = 5 } ]

Our goal will be to provide recommendations to Users for Dishes they haven’t tasted yet, based on their ratings and what other users are saying.

Our first step will be to represent this as a Matrix, where each Row is a User, each Column a Dish, and the corresponding cell is the User Rating for that Dish. Note that not every Dish has been rated by every User – we will represent missing ratings as zeroes in our matrix:

let rows = 11
let cols = 11
let data = DenseMatrix(rows, cols)
ratings
|> List.iter (fun rating -> 
       data.[rating.UserId, rating.DishId] <- (float)rating.Rating)

We initialize our 11 x 11 matrix, which creates a zero-filled matrix, and then map our user ratings to each “cell”. Because we constructed our example that way, our UserIds go from 0 to 10, and DishIds from 0 to 10, so we can map them respectively to Rows and Columns.

Note: while this sounded like a perfect case to use a Sparse Matrix, I chose to go first with a DenseMatrix, which is more standard. I may look at whether there is a benefit to going sparse later.

Note: our matrix happens to be square, but this isn’t a requirement.

Note: I will happily follow along the book author and replace unknown ratings by zero, because it’s very convenient. I don’t fully get how this is justified, but it seems to work, so I’ll temporarily suspend disbelief and play along.
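For readers following along in another language, the construction above can be sketched in plain Python (the abbreviated ratings list below is a stand-in for the full "database"; names are mine, not from the original):

```python
# Build a users x dishes matrix from (user, dish, rating) triples;
# unrated cells stay at zero, mirroring the F# version above.
rows, cols = 11, 11
ratings = [(0, 0, 2), (0, 3, 4), (10, 8, 4), (10, 9, 5)]  # abbreviated sample

data = [[0.0] * cols for _ in range(rows)]
for user, dish, rating in ratings:
    data[user][dish] = float(rating)
```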

At that point, we have our data matrix ready. Before going any further, let’s write a quick utility function, to “pretty-render” matrices:

let printNumber v = 
    if v < 0. 
    then printf "%.2f " v 
    else printf " %.2f " v
// Display a Matrix in a "pretty" format
let pretty matrix = 
    Matrix.iteri (fun row col value ->
        if col = 0 then printfn "" else ignore ()
        printNumber value) matrix
    printfn ""

We iterate over each row and column, start a newline every time we hit column 0, and print every value, nicely formatted with 2 digits after the decimal.

In passing, note the F#-friendly Matrix.iteri syntax – the good people at Math.NET do support F#, and MathNet.Numerics.FSharp.dll contains handy helpers, which allow for a much more functional usage of the library. Thanks, guys!

Let’s see what our data matrix looks like:

printfn "Original data matrix"
pretty data

… which produces the following output in FSI:

Original data matrix

2.00  0.00  0.00  4.00  4.00  0.00  0.00  0.00  0.00  0.00  0.00
0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  5.00
4.00  0.00  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00
3.00  3.00  4.00  0.00  3.00  0.00  0.00  2.00  2.00  0.00  0.00
5.00  5.00  5.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
0.00  0.00  0.00  0.00  0.00  0.00  5.00  0.00  0.00  5.00  0.00
4.00  0.00  4.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  5.00
0.00  0.00  0.00  0.00  0.00  4.00  0.00  0.00  0.00  0.00  4.00
0.00  0.00  0.00  0.00  0.00  0.00  5.00  0.00  0.00  0.00  0.00
0.00  0.00  0.00  3.00  0.00  0.00  0.00  0.00  4.00  5.00  0.00
1.00  1.00  2.00  1.00  1.00  2.00  1.00  0.00  4.00  5.00  0.00

We seem to be in business.
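Since the next installment will lean on the factorization itself, here is a quick sketch of what SVD produces, using Python's numpy for illustration (Math.NET exposes an equivalent Svd factorization; the small matrix below is made up): a matrix A is decomposed into U, a vector of singular values, and Vt, whose product reconstructs A.

```python
import numpy as np

# SVD factors a matrix into U * diag(s) * Vt; the singular values in s
# come out sorted in decreasing order, which is what makes truncating
# to the top few values useful for recommendations.
A = np.array([[2.0, 0.0, 4.0],
              [4.0, 0.0, 1.0],
              [3.0, 3.0, 2.0]])
U, s, Vt = np.linalg.svd(A)
reconstructed = U @ np.diag(s) @ Vt
```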


by Mathias 31. October 2012 13:43

This post is to be filed in the “useless but fun” category. A friend of mine was doing some Hadoopy stuff a few days ago, experimenting with rather large sparse matrices and their products. Long story short, we ended up wondering how sparse the product of 2 sparse matrices should be.

A sparse matrix is a matrix which is primarily filled with zeros. The product of two matrices involves computing the dot product of each row of the first one, with each column of the second one. There is clearly a relationship between how dense the two matrices are, and how dense the result should be. As an obvious illustration, if we consider 2 matrices populated only with zeroes – as sparse as it gets – their product is obviously 0. Conversely, two matrices populated only with ones – as dense as it gets – will result in a “completely dense” matrix. But… what happens in between?

Should I expect the product of 2 sparse matrices to be more, or less dense than the original matrices? And does it depend on how large the matrices are? What would be your guess?
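One way to build an intuition is to simulate it. The sketch below (Python, with made-up sizes and densities) fills two n x n matrices with ones at a given density, multiplies them, and measures the density of the product; if cells were independent, a product cell would be non-zero with probability 1 - (1 - d²)^n, which grows quickly with n.

```python
import numpy as np

def random_sparse(n, density, rng):
    """0/1 matrix where each cell is 1 with the given probability."""
    return (rng.random((n, n)) < density).astype(float)

rng = np.random.default_rng(42)
n, density = 100, 0.05
A, B = random_sparse(n, density, rng), random_sparse(n, density, rng)

product_density = np.count_nonzero(A @ B) / (n * n)
# With n = 100 and d = 0.05, the formula predicts roughly
# 1 - (1 - 0.05 ** 2) ** 100, about 0.22: the product is much denser.
```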


by Mathias 27. May 2012 10:20

Markov chains are a classic of probability modeling. They represent systems that evolve between states over time, following a random but stable process which is memoryless. Memorylessness is the defining characteristic of Markov processes, and is known as the Markov property. Roughly speaking, the idea is that if you know the state of the process at time T, you know all there is to know about it – knowing where it was at time T-1 would not give you additional information on where it may be at time T+1.

While Markov models come in multiple flavors, Markov chains with finite discrete states in discrete time are particularly interesting. They describe a system which changes between discrete states at fixed time intervals, following a transition pattern described by a transition matrix.

Let’s illustrate with a simplistic example. Imagine that you are running an Airline, AcmeAir, operating one plane. The plane goes from city to city, refueling and doing some maintenance (or whatever planes need) every time.

Each time the plane lands somewhere, it can be in three states: early, on-time, or delayed. It’s not unreasonable to think that if our plane landed late somewhere, it may be difficult to catch up with the required operations, and as a result, the likelihood of the plane landing late at its next stop is higher. We could represent this in the following transition matrix (numbers totally made up):

Current \ Next    Early    On-time    Delayed
Early             10%      85%        5%
On-Time           10%      75%        15%
Delayed           5%       60%        35%

Each row of the matrix represents the current state, and each column the next state. The first row tells us that if the plane landed Early, there is a 10% chance we’ll land early at our next stop, an 85% chance we’ll be on-time, and a 5% chance we’ll arrive late. Note that each row sums up to 100%: given the current state, we have to end up in one of the next states.

How could we simulate this system? Given the state at time T, we simply need to “roll” a random number generator for a percentage between 0% and 100%, and depending on the result, pick our next state – and repeat.

Using F#, we could model the transition matrix as an array (one element per state) of arrays (the probabilities to land in each state), which is pretty easy to define using Array comprehensions:

let P = 
    [| [| 0.10; 0.85; 0.05 |];
       [| 0.10; 0.75; 0.15 |];
       [| 0.05; 0.60; 0.35 |] |]

To simulate the behavior of the system, we need a function that given a state and a transition matrix, produces the next state according to the transition probabilities:

// given a roll between 0 and 1
// and a distribution D of 
// probabilities to end up in each state
// returns the index of the state
let state (D: float[]) roll =
   let rec index cumul current =
      let cumul = cumul + D.[current]
      match (roll <= cumul) with
      | true -> current
      | false -> index cumul (current + 1)
   index 0.0 0

// given the transition matrix P
// the index of the current state
// and a random generator,
// simulates what the next state is
let nextState (P: float[][]) current (rng: Random) =
   let dist = P.[current]
   let roll = rng.NextDouble()
   state dist roll

// given a transition matrix P
// the index i of the initial state
// and a random generator
// produces a sequence of states visited
let simulate (P: float[][]) i (rng: Random) =
   Seq.unfold (fun s -> Some(s, nextState P s rng)) i

The state function is a simple helper; given an array D which is assumed to contain the probabilities to transition to each of the states, and a “roll” between 0.0 and 1.0, it returns the corresponding state. nextState uses that function, first retrieving the transition probabilities for the current state, “rolling” the dice, and using state to compute the simulated next state. simulate uses nextState to create an infinite sequence of states, starting from an initial state i.
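The same rolling logic, sketched in Python for comparison (function and variable names are mine, not from the original): the roll is compared against cumulative probabilities until it falls inside a state's slice.

```python
import random

def state(dist, roll):
    """Return the index of the first state whose cumulative
    probability covers the roll (a float between 0 and 1)."""
    cumul = 0.0
    for i, p in enumerate(dist):
        cumul += p
        if roll <= cumul:
            return i
    return len(dist) - 1  # guard against rounding at the top end

def next_state(P, current, rng):
    """Roll the dice on the current state's transition probabilities."""
    return state(P[current], rng.random())

P = [[0.10, 0.85, 0.05],
     [0.10, 0.75, 0.15],
     [0.05, 0.60, 0.35]]
```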

We need to open System to use the System.Random class – and we can now use this in the F# interactive window:

> let flights = simulate P 1 (new Random());;

val flights : seq<int>

> Seq.take 50 flights |> Seq.toList;;

val it : int list =
  [1; 0; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 0; 1; 1; 2; 2; 2; 1; 1; 1; 1; 1; 1;
   1; 1; 2; 1; 1; 1; 1; 1; 1; 1; 1; 2; 1; 1; 1; 1; 1; 2; 1; 1; 1; 1; 1; 1; 1]

Our small sample shows us what we expect: mostly on-time (Fly AcmeAir!), with occasional delayed or early flights.

How many delays would we observe on a 1000-flights simulation? Let’s try:

> Seq.take 1000 flights |> Seq.filter (fun i -> i = 2) |> Seq.length;; 
val it : int = 174

We observe about 17% of delayed flights. This is relevant information, but a single simulation is just that – an isolated case. Fortunately, Markov chains have an interesting property: if it is possible to go from any state to any state, then the system will have a stationary distribution, which corresponds to its long term equilibrium. Essentially, regardless of the starting point, over long enough periods, each state will be observed with a stable frequency.

One way to understand better what is going on is to expand our frame. Instead of considering the exact state of the system, we can look at it in terms of probability: at any point in time, the system has a certain probability to be in each of its states.

For instance, imagine that given current information, we know that our plane will land at its next stop either early or on time, with a 50% chance of each. In that case, we can determine the probability that its next stop will be delayed by combining the transition probabilities:

p(delayed in T+1) = p(delayed in T) x P(delayed in T+1 | delayed in T) +  p(on-time in T) x P(delayed in T+1 | on-time in T) + p(early in T) x P(delayed in T+1 | early in T)

p(delayed in T+1) = 0.0 x 0.35 + 0.5 x 0.15 + 0.5 x 0.05 = 0.1

This can be expressed much more concisely using Vector notation. We can represent the state as a vector S, where each component of the vector is the probability to be in each state, in our case

S(T) = [ 0.50; 0.50; 0.0 ]

In that case, the state at time T+1 will be:

S(T+1) = S(T) x P

Let’s make that work with some F#. The product of a vector by a matrix is the dot-product of the vector with each column vector of the matrix:

// Vector dot product
let dot (V1: float[]) (V2: float[]) = 
   Array.zip V1 V2
   |> Array.map (fun (v1, v2) -> v1 * v2)
   |> Array.sum

// Extracts the jth column vector of matrix M 
let column (M: float[][]) (j: int) =
   M |> Array.map (fun v -> v.[j])

// Given a row-vector S describing the probability
// of each state and a transition matrix P, compute
// the next state distribution
let nextDist S P =
   S
   |> Array.mapi (fun j v -> column P j)
   |> Array.map (fun v -> dot v S)

We can now handle our previous example, creating a state s with a 50/50 chance of being in state 0 or 1:

> let s = [| 0.5; 0.5; 0.0 |];;

val s : float [] = [|0.5; 0.5; 0.0|]

> let s' = nextDist s P;;

val s' : float [] = [|0.1; 0.8; 0.1|]


We can also easily check what the state of the system should be after, say, 100 flights:

> let s100 = Seq.unfold (fun s -> Some(s, nextDist s P)) s |> Seq.nth 100;;

val s100 : float [] = [|0.09119496855; 0.7327044025; 0.1761006289|]

After 100 flights, starting from either early or on-time, we have about 17% of chance of being delayed. Note that this is consistent with what we observed in our initial simulation. Given that our Markov chain has a stationary distribution, this is to be expected: unless our simulation was pathologically unlikely, we should observe the same frequency of delayed flights in the long run, no matter what the initial starting state is.

Can we compute that stationary distribution? The typical way to achieve this is to bust out some algebra and solve V = P x V, where V is the stationary distribution vector and P the transition matrix.

Here we’ll go for a numeric approximation approach. Rather than solving the system of equations, we will start from a uniform distribution over the states, and apply the transition matrix until the distance between two consecutive states is under a threshold Epsilon:

// Squared distance between 2 vectors
let dist (V1: float[]) (V2: float[]) = 
   Array.zip V1 V2
   |> Array.map (fun (v1, v2) -> (v1 - v2) * (v1 - v2))
   |> Array.sum

// Evaluate stationary distribution
// by searching for a fixed point
// under tolerance epsilon
let stationary (P: float[][]) epsilon =
   let states = P.[0] |> Array.length
   [| for s in 1 .. states -> 1.0 / (float)states |] // initial
   |> Seq.unfold (fun s -> Some((s, (nextDist s P)), (nextDist s P)))
   |> Seq.map (fun (s, s') -> (s', dist s s'))
   |> Seq.find (fun (s, d) -> d < epsilon)

Running this on our example results in the following stationary distribution estimation:

> stationary P 0.0000001;;
val it : float [] * float =
  ([|0.09118958333; 0.7326858333; 0.1761245833|], 1.1590625e-08)

In short, in the long run, we should expect our plane to be early 9.1% of the time, on-time 73.2%, and delayed 17.6%.

Note: the fixed point approach above should work if a unique stationary distribution exists. If this is not the case, the function may never converge, or may converge to a fixed point that depends on the initial conditions. Use with caution!
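The same fixed-point search can be sketched in Python (names are mine): start from a uniform distribution and iterate until successive distributions are within epsilon of each other.

```python
import numpy as np

def stationary(P, epsilon):
    """Iterate s -> s P from a uniform start until the squared
    distance between consecutive distributions drops below epsilon."""
    P = np.asarray(P)
    s = np.full(P.shape[0], 1.0 / P.shape[0])
    while True:
        s_next = s @ P
        if np.sum((s_next - s) ** 2) < epsilon:
            return s_next
        s = s_next

P = [[0.10, 0.85, 0.05],
     [0.10, 0.75, 0.15],
     [0.05, 0.60, 0.35]]
pi = stationary(P, 1e-12)
# pi is approximately [0.091, 0.733, 0.176], matching the F# estimate
```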

Armed with this model, we could now ask interesting questions. Suppose for instance that we could improve the operations of AcmeAir, and reduce the chance that our next arrival is delayed given our current state. What should we focus on – should we reduce the probability to remain delayed after a delay (strategy 1), or should we prevent the risk of being delayed after an on-time landing (strategy 2)?

One way to look at this is to consider the impact of each strategy on the long-term distribution. Let’s compare the impact of a 1-point reduction of delays in each case, which we’ll assume gets transferred to on-time. We can then create the matrices for each strategy, and compare their respective stationary distributions:

> let strat1 = [|[|0.1; 0.85; 0.05|]; [|0.1; 0.75; 0.15|]; [|0.05; 0.61; 0.34|]|]
let strat2 = [|[|0.1; 0.85; 0.05|]; [|0.1; 0.76; 0.14|]; [|0.05; 0.60; 0.35|]|];;

val strat1 : float [] [] =
  [|[|0.1; 0.85; 0.05|]; [|0.1; 0.75; 0.15|]; [|0.05; 0.61; 0.34|]|]
val strat2 : float [] [] =
  [|[|0.1; 0.85; 0.05|]; [|0.1; 0.76; 0.14|]; [|0.05; 0.6; 0.35|]|]

> stationary strat1 0.0001;;
val it : float [] * float =
  ([|0.091; 0.7331333333; 0.1758666667|], 8.834666667e-05)
> stationary strat2 0.0001;;
val it : float [] * float = ([|0.091485; 0.740942; 0.167573|], 1.2698318e-05)

The numbers tell the following story: strategy 2 (reducing delays after on-time arrivals) is better: it results in roughly 16.8% delays, instead of 17.6% for strategy 1. Intuitively, this makes sense, because most of our flights are on-time, so an improvement there has a much larger impact on the overall results than a comparable improvement on delayed flights.

There is (much) more to Markov chains than this, and there are many ways the code presented could be improved upon – but I’ll leave it at that for today, hopefully you will have picked up something of interest along the path of this small exploration!


by Mathias 7. April 2012 10:50

For no clear reason, I got interested in Convex Hull algorithms, and decided to see how it would look in F#. First, if you wonder what a Convex Hull is, imagine that you have a set of points in a plane – say, a board – and that you planted a thumbtack on each point. Now take an elastic band, stretch it, and wrap it around the thumbtacks. The elastic band will cling to the outermost tacks, leaving some tacks untouched. The convex hull is the set of tacks that are in contact with the elastic band; it is convex, because if you take any pair of points from the original set, the segment connecting them remains inside the hull.

The picture below illustrates the idea - the blue thumbtacks define the Convex Hull; all the yellow tacks are included within the elastic band, without touching it.


There are a few algorithms around to identify the Convex Hull of a set of points in 2 dimensions; I decided to go with Andrew’s monotone chain, because of its simplicity.

The insight of the algorithm is to observe that if you start from the leftmost tack, and follow the elastic downwards, the elastic turns only clockwise, until it reaches the rightmost tack. Similarly, starting from the right upwards, only clockwise turns happen, until the rightmost tack is reached. Given that the left- and right-most tacks belong to the convex hull, the algorithm constructs the upper and lower part of the hull by progressively constructing sequences that contain only clockwise turns.
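For reference, here is a compact sketch of Andrew's monotone chain in Python (my own rendition, not the F# version this post builds toward): sort the points, then build the lower and upper chains by popping any point that would create a clockwise turn or a collinear triple.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive means a
    counter-clockwise turn, negative clockwise, zero collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # each chain ends where the other begins, so drop the duplicates
    return lower[:-1] + upper[:-1]
```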


by Mathias 11. March 2012 11:29

I am midway through the highly-recommended “Real-world functional programming: with examples in C# and F#”, which inspired me to play with graphics using F# and WinForms (hadn’t touched that one in a long, long time), and I figured it would be fun to try generating a Sierpinski Triangle.

The Sierpinski Triangle is generated starting from an initial triangle. 3 half-size copies of the triangle are created and placed outside of the original triangle, each of them having a corner “touching” the middle of one side of the triangle.


That procedure is then repeated for each of the 3 new triangles, creating more and more smaller triangles, which progressively fill in an enclosing triangular shape. The figure below uses the same starting point, stopped after 6 “generations”:



The Sierpinski Triangle is an example of a fractal figure, displaying self-similarity: if we were to run the procedure ad infinitum, each part of the Triangle would look like the whole “triangle” itself.

So how could we go about creating this in F#?

I figured this would be a good case for Seq.unfold: given a state (the triangles that have been produced for generation n), and an initial state to start from, provide a function which defines how the next generation of triangles should be produced, defining a “virtual” infinite sequence of triangles – and then use Seq.take to request the number of generations to be plotted.
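The unfold/take pattern translates to most languages with lazy sequences; as a rough illustration, here is the same idea as a Python generator (the triangle stand-ins are mine; each generation triples in size, so generation n holds 3^n triangles):

```python
from itertools import islice

def unfold(step, state):
    """Yield state, step(state), step(step(state)), ... forever."""
    while True:
        yield state
        state = step(state)

# stand-in for nextGeneration: every triangle produces 3 children
next_generation = lambda triangles: [c for t in triangles for c in (t, t, t)]

generations = unfold(next_generation, ["root"])
sizes = [len(g) for g in islice(generations, 4)]
# sizes = [1, 3, 9, 27]
```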

Here is the entire code I used; you’ll need to add a reference to System.Windows.Forms and System.Drawings for it to run:

open System
open System.Drawing
open System.Windows.Forms

type Point = { X:float32; Y:float32 }
type Triangle = { A:Point; B:Point; C:Point }

let transform (p1, p2, p3) =
   let x1 = p1.X + 0.5f * (p2.X - p1.X) + 0.5f * (p3.X - p1.X)
   let y1 = p1.Y + 0.5f * (p2.Y - p1.Y) + 0.5f * (p3.Y - p1.Y)
   let x2 = p1.X + 1.0f * (p2.X - p1.X) + 0.5f * (p3.X - p1.X)
   let y2 = p1.Y + 1.0f * (p2.Y - p1.Y) + 0.5f * (p3.Y - p1.Y)
   let x3 = p1.X + 0.5f * (p2.X - p1.X) + 1.0f * (p3.X - p1.X)
   let y3 = p1.Y + 0.5f * (p2.Y - p1.Y) + 1.0f * (p3.Y - p1.Y)
   { A = { X = x1; Y = y1 }; B = { X = x2; Y = y2 }; C= { X = x3; Y = y3 }}

let generateFrom triangle = seq {
      yield transform (triangle.A, triangle.B, triangle.C)
      yield transform (triangle.B, triangle.C, triangle.A)
      yield transform (triangle.C, triangle.A, triangle.B)
   }

let nextGeneration triangles =
   Seq.collect generateFrom triangles 

let render (target:Graphics) (brush:Brush) triangle =
   let p1 = new PointF(triangle.A.X, triangle.A.Y)
   let p2 = new PointF(triangle.B.X, triangle.B.Y)
   let p3 = new PointF(triangle.C.X, triangle.C.Y)
   let points = List.toArray <| [ p1; p2; p3 ]
   target.FillPolygon(brush, points)

let form = new Form(Width=500, Height=500)
let box = new PictureBox(BackColor=Color.White, Dock=DockStyle.Fill)
let image = new Bitmap(500, 500)
let graphics = Graphics.FromImage(image)
let brush = new SolidBrush(Color.FromArgb(0,0,0))
let renderTriangle = render graphics brush

let p1 = { X = 190.0f; Y = 170.0f }
let p2 = { X = 410.0f; Y = 210.0f}
let p3 = { X = 220.0f; Y = 360.0f}
let triangle = { A = p1; B = p2; C = p3 }

let root = seq { yield triangle }
let generations = 
   Seq.unfold (fun state -> Some(state, (nextGeneration state))) root
   |> Seq.take 7
Seq.iter (fun gen -> Seq.iter renderTriangle gen) generations

box.Image <- image
form.Controls.Add(box)
form.Show()


A few comments on the code. I first define 2 types, a Point with 2 float32 coordinates (float32 chosen because that’s what System.Drawing.PointF takes), and a Triangle defined by 3 Points. The transform function is pretty ugly, and can certainly be cleaned up / prettified. It takes a tuple of 3 Points, and returns the corresponding transformed triangle, shrunk by half and located at the middle of the p1, p2 edge. We can now build on this with the nextGeneration function, which takes in a Sequence of Triangles (generation n), transforms each of them into 3 new Triangles and uses collect to “flatten” the result into a new Sequence, generation n+1.

The rendering code has been mostly lifted with slight modifications from Chapter 4 of Real-world functional programming. The render function retrieves the 3 points of a Triangle and creates a filled Polygon, displayed on a Graphics object which we create in a form.

Running that particular example generates the following Sierpinski triangle; you can play with the coordinates of the root triangle, and the number of generations, to build your own!

As an aside, I was a bit disappointed by the quality of the graphics; beyond 7 or 8 generations, the result gets fairly blurry. I’ll probably give a shot at moving this to XAML, and see if it’s any better.



As usual, comments, questions or criticisms are welcome!

