Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
21. February 2015 12:23

A few weeks ago, I came across DiffSharp, an automatic differentiation library in F#. As someone whose calculus skills have always been rather mediocre (thanks Wolfram Alpha!), but who needs to deal with gradients and the like on a regular basis because they are quite useful in machine learning and numerical methods, the project looked pretty interesting: who wouldn’t want exact and efficient calculations of derivatives? So I figured I would take a couple of hours to experiment with the library. This post is by no means an in-depth evaluation, but rather intended as “notes from the road” from someone entirely new to DiffSharp.

## Basics

Suppose I want to compute the derivative of f(x) = √ x at, say, 42.0. Double-checking Wolfram Alpha confirms that f has derivative f’(x) = 1 / (2 x √ x) .

Once DiffSharp is installed via Nuget, we can automatically evaluate f’(x) :

#r @"..\packages\DiffSharp.0.5.7\lib\DiffSharp.dll"

let f x = sqrt x
diff f 42. |> printfn "Evaluated: %f"
1. / (2. * sqrt 42.)  |> printfn "Actual: %f"

Evaluated: 0.077152
Actual: 0.077152

val f : x:Dual -> Dual
val it : unit = ()

First off, obviously, it worked. Without any need for us to perform anything, DiffSharp took in our implementation of f, and computed the correct value. This is really nifty.

The piece which is interesting here is the inferred signature of f. If I were to remove the line that immediately follows the function declaration, f would have the following signature:

val f : x:float –> float

The moment you include the line diff f 42., the inferred type changes drastically, and becomes

val f : x:Dual –> Dual

This is pretty interesting. Because we call diff on f, which expects Duals (a type that is defined in DiffSharp), our function isn’t what we originally defined it to be – and calling f 42.0 at that point (for instance) will fail, because 42.0 is a float, and not a Dual. In other words, DiffSharp leverages type inference pretty aggressively, to convert functions into the form it needs to perform its magic.

Edit: Atilim Gunes Baydin suggested another way around that issue, which is inlining f. The following works perfectly well, and allows to both differentiate f, and use this against floats:

let inline f x = sqrt x
let f' = diff f
f 42.

Thanks for the input!

This has a couple of implications. First, if you work in a script, you need to be careful about how you send your code to the F# interactive for execution. If you process the sample code above line by line in FSI, the evaluation will fail, because f will be inferred to be float –> float. Then, you will potentially need to annotate your functions with type hints, to help inference. As an example, the following doesn’t work:

let g x = 3. * x
diff g 42.

As is, g is still inferred to be of type float –> float, because of the presence of the constant term, which is by default inferred as a float. That issue can be addressed at least two ways – by explicitly marking x or 3. as dual in g, like this:

let g x = (dual 3.) * x
let h (x:Dual) = 3. * x

That’s how far we will go on this – if you want to dive in deeper, the Type Inference page discusses the topic in much greater detail

## A tiny example

So why is this interesting? As I mentioned earlier, differentiation is used heavily in numeric algorithms to identify values that minimize a function, a prime example being the gradient descent algorithm. The simplest example possible would be finding a (local) minimum of a single-argument function: starting from an arbitrary value x, we can iteratively follow the direction of steepest descent, until no significant change is observed.

Here is a quick-and-dirty implementation, using DiffSharp:

let minimize f x0 alpha epsilon =
let rec search x =
let fx' = diff f x
if abs fx' < epsilon
then x
else
let x = x - alpha * fx'
search x
search x0

Edit, 3/4/2015: fixed issue in code, using abs fx’ instead of fx’

Because DiffSharp handles the differentiation part automatically for us, with only 10 lines of code, we can now pass in arbitrary functions we want to minimize, and (with a few caveats…), and get a local minimum, no calculus needed:

let epsilon = 0.000001

let g (x:Dual) = 3. * pown x 2 + 2. * x + 1.
minimize g 0. 0.1 epsilon |> printfn "Min of g at x = %f"

let h (x:Dual) = x + x * sin(x) + cos(x) * (3. * x  - 7.)
minimize h 0. 0.1 epsilon |> printfn "Min of h at x = %f"

>
Min of g at x = -0.333333
Min of h at x = -0.383727

Let’s make sure this is reasonable. g is a quadratic function, which has a minimum or maximum at –b/2*a, that is, –2 / 2 x 3 - this checks out. As for h, inspecting the function plot confirms that it has a minimum around the identified value:

Edit, 3/4/2015: changed function h to a simpler shape.

## Conclusion

I have barely started to scratch the surface of DiffSharp in this post, but so far, I really, really like its promise. While I limited my examples to single-variable functions, DiffSharp supports multivariate functions, and vector operations as well. The way it uses type inference is a bit challenging at first, but seems a reasonable price to pay for the resulting magic. My next step will probably be a less tiny example, perhaps a logistic regression against realistic data. I am very curious to try out the algebra bits – and also wondering in the back of my head how to best use the library in general. For instance, how easy is it to construct a function from external data, and turn it into the appropriate types for DiffSharp to work its magic? How well does this integrate with other libraries, say, Math.NET? We’ll see!

In the meanwhile, I’d recommend checking out the project page, which happens to also be beautifully documented! And, as always, you can ping me on twitter for comments or question.

15. February 2014 12:51

My favorite column in MSDN Magazine is Test Run; it was originally focused on testing, but the author, James McCaffrey, has been focusing lately on topics revolving around numeric optimization and machine learning, presenting a variety of methods and approaches. I quite enjoy his work, with one minor gripe –his examples are all coded in C#, which in my opinion is really too bad, because the algorithms would gain much clarity if written in F# instead.

Back in June 2013, he published a piece on Amoeba Method Optimization using C#. I hadn’t seen that approach before, and found it intriguing. I also found the C# code a bit too hairy for my feeble brain to follow, so I decided to rewrite it in F#.

In a nutshell, the Amoeba approach is a heuristic to find the minimum of a function. Its proper respectable name is the Nelder-Nead method. The reason it is also called the Amoeba method is because of the way the algorithm works: in its simple form, it starts from a triangle, the “Amoeba”; at each step, the Amoeba “probes” the value of 3 points in its neighborhood, and moves based on how much better the new points are. As a result, the triangle is iteratively updated, and behaves a bit like an Amoeba moving on a surface.

Before going into the actual details of the algorithm, here is how my final result looks like. You can find the entire code here on GitHub, with some usage examples in the Sample.fsx script file. Let’s demo the code in action: in a script file, we load the Amoeba code, and use the same function the article does, the Rosenbrock function. We transform the function a bit, so that it takes a Point (an alias for an Array of floats, essentially a vector) as an input, and pass it to the solve function, with the domain where we want to search, in that case, [ –10.0; 10.0 ] for both x and y:

#load "Amoeba.fs"

open Amoeba
open Amoeba.Solver

let g (x:float) y =
100. * pown (y - x * x) 2 + pown (1. - x) 2

let testFunction (x:Point) =
g x. x.

solve Default [| (-10.,10.); (-10.,10.) |] testFunction 1000

Running this in the F# interactive window should produce the following:

val it : Solution = (0.0, [|1.0; 1.0|])
>

The algorithm properly identified that the minimum is 0, for a value of x = 1.0 and y = 1.0. Note that results may vary: this is a heuristic, which starts with a random initial amoeba, so each run could produce slightly different results, and might at times epically fail.

More...

25. November 2012 09:10

I am still working my way through “Machine Learning in Action”, converting the samples from Python to F#. I am currently in the middle of chapter 6, dedicated to Support Vector Machines, which has given me more trouble than the previous ones. This post will be sharing my current progress: the code I have so far is a working translation of the naïve SVM implementation, presented in the first half of the chapter. We’ll get to kernels, and the full Platt SMO algorithm in a later post – today will be solely discussing the simple, un-optimized version.

Two factors slowed me down with this chapter: the math, and Python.

The math behind the algorithm is significantly more involved than the other algorithms, and I won’t even try to go into why it works. I recommend reading An Idiot’s guide to SVMs, which I found a pretty complete and accessible explanation of the theory behind SVMs. I will focus instead on the implementation, which was in itself a bit challenging.

First, the Python code uses algebra quite a bit, and I found that deciphering what was going on required a bit of patience. Take a line like the following:

fXi = (float)(multiply(alphas, labelMat).T*(dataMatrix*dataMatrix[i,:].T))+b

I am reasonably well versed in linear algebra, but figuring out what this is saying takes some attention. Granted, I have no experience with Python and NumPy, so my whining is probably a bit unfair. Still, I thought the code was not very readable, and it motivated me to see if that could be improved, and as a result I ended up moving away from heavy algebra notation.

Then, the algorithm is implemented as a Deep Arrow. A main loop performs computations and evaluates conditions at multiple points, using continue to exit / short-circuit the evaluation. The code I ended up with doesn’t use mutation, but is still heavily indented, which I am not happy about - I’ll work on that later.

## Simplified algorithm implementation

Note: as the title of the post indicates, this is work in progress. The current implementation works, but has some obvious flaws (see last paragraph), which I intend to fix in upcoming posts. My intent is to share my progression through the problem – please don’t take this as a good reference SVM implementation. Hopefully we’ll get there soon, but this is not it, not yet.

More...

27. June 2012 15:06

This months’ issue of MSDN Magazine has an interesting piece on evolutionary algorithms. The article applies a genetic algorithm to identify the minimum value of a “pathological” continuous function, the Schwefel function. The Schwefel function

For X and Y values between –500 and 500, the “correct” answer is X and Y = 420.9687.

This function is designed to give fits to optimization algorithms. The issue here is that  the function has numerous peaks and valleys. As a result, if the search strategy is to use some form of gradient approach, that is, from a starting point, follow the steepest descent until a minimum is reached, there is a big risk to end up in a place which is a local minimum: the algorithm gets stuck in a valley with no path downwards, but there are other, better solutions around, which can be reached only by “climbing out of the hole” and temporarily following a path which heads in the wrong direction.

Out of curiosity, I checked how the Excel Solver would fare on this problem: The result was an abject failure – not even close to the true solution.

I thought it would be interesting to see how Bumblebee, my Artificial Bee Colony framework, would perform. There are some general similarities between the underlying ideas behind the articles’ algorithm and Bumblebee, the main difference being that Bumblebee simply mutates individual solutions, and doesn’t create “crossover solutions”.

Let’s open Visual Studio, create an F# Console project, grab the NuGet package for Bumblebee – and start coding.

As usual, we need 4 elements to leverage Bumblebee – a Type of Solution, and 3 functions: a Generator, which returns a brand-new, random solution, a Mutator, which transforms a known solution into a new, similar solution, and an Evaluator, which evaluates a solution and returns a float, increasing with the quality of the solution.

In this case, the Solution type is fairly straightforward. We are looking for 2 floats x and y, so we’ll go for a Tuple. Similarly, the Evaluation is a given, it is the negative of the Schwefel function. The reason we go for the negative is that Bumblebee will try to maximize the Evaluation, so if we are looking for a minimum, we need to reverse the sign – because the Minimum of a function is the Maximum of its negative.

let schwefel x =
-x * Math.Sin(Math.Sqrt(Math.Abs(x)))

let evaluate (x, y) =
- schwefel (x) - schwefel (y)

The Generation function is also trivial – we’ll simply pick random pairs of floats in the [ –500.0; 500.0 ] interval:

let min = -500.0
let max = 500.0

let generate (rng: Random) = (
rng.NextDouble() * (max - min) + min,
rng.NextDouble() * (max - min) + min)

The Mutation function takes a tiny bit more of effort. The idea I followed is essentially the same as the one used in the article: given a solution defined by a pair of floats, randomly decide if any of the elements will be mutated, and if yes, add a random permutation to that element, scaled to the precision we want to achieve:

let precision = 0.00001
let rate = 0.5

let mutate (rng: Random) solution =
let (x, y) = solution
let x =
if rng.NextDouble() < rate
then x + precision * ((max - min) * rng.NextDouble() + min)
else x
let y =
if rng.NextDouble() < rate
then y + precision * ((max - min) * rng.NextDouble() + min)
else y
(x, y)

More...

1. April 2012 09:47

Last week’s StackOverflow newsletter contained a fun problem I had never seen before: Bipartite Matching. Here is the problem:

There are N starting points (purple) and N target points (green) in 2D. I want an algorithm that connects starting points to target points by a line segment (brown) without any of these segments intersecting (red) and while minimizing the cumulative length of all segments. Image from the original post on StackOverflow

I figured it would be fun to try out Bumblebee, my artificial bee colony library, on the problem. As the accepted answer points out, the constraint that no segment should intersect is redundant, and we only need to worry about minimizing the cumulative length, because reducing the length implies removing intersections.

As usual with Bumblebee, I’ll go first with the dumbest thing that could work. The solution involves matching points from two lists, so we’ll define a record type for Point and represent a Solution as two (ordered) lists of points, packed in a Tuple:

type Point = { X: float; Y: float }
let points = 100
let firstList = [ for i in 0 .. points -> { X = (float)i ; Y = float(i) } ]
let secondList =  [ for i in 0 .. points -> { X = (float)i ; Y = float(i) } ]

let root = firstList, secondList
We’ll start with a silly problem, where the 2 lists are identical: the trivial solution here is to match each point with itself, resulting in a zero-length, which will be convenient to see how well the algorithm is doing and how far it is from the optimum.

How can we Evaluate the quality of a solution? We need to pair up the points of each of the lists, compute the distance of each pair, and sum them up – fairly straightforward:

let distance pair =
((fst pair).X - (snd pair).X) ** 2.0 + ((fst pair).Y - (snd pair).Y) ** 2.0

let evaluate = fun (solution: Point list * Point list) ->
List.zip (fst solution) (snd solution)
|> List.sumBy (fun p -> – distance p)

More...