Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
6. May 2012 12:48

I am digging back into the Bumblebee code base, to clean it up before talking at the New England F# user group in Boston in June. As usual for me, it’s a humbling experience to face my own code, 6 months later, or, if you are an incorrigible optimist, it’s great to see that I am so much smarter today than a few months ago…

In any case, while toying with one of the samples, I noted that performance was degrading pretty steeply as the size of the problem was increasing. Most of the action revolved around producing random shuffles of a list, so I figured it would be interesting to look into it and see where I messed up how this could be improved upon.

Here is the original code, a quick-and-dirty implementation of the Fisher-Yates shuffle:

open System

let swap fst snd i =
if i = fst then snd else
if i = snd then fst else
i

let shuffle items (rng: Random) =
let rec shuffleTo items upTo =
match upTo with
| 0 -> items
| _ ->
let fst = rng.Next(upTo)
let shuffled = List.permute (swap fst (upTo - 1)) items
shuffleTo shuffled (upTo - 1)
let length = List.length items
shuffleTo items length

[<EntryPoint>]
let main argv =
let test = [1..10000]
let random = new Random()
let shuffled = shuffle test random
System.Console.WriteLine("done...")
0

Running the test case in fsi, using #time, produces the following:

Real: 00:00:42.735, CPU: 00:00:42.734, GC gen0: 2307, gen1: 6, gen2: 0

(Digression: #time is absolutely awesome – just typing #time;; in a fsi session will automatically display performance information, allowing to quickly tweak a function and fine-tune it “on the fly”. I wish I had known about it earlier.)

My initial assumption was that the problem revolved around performing multiple permutations of a List. However, I figured it would be interesting to take the opportunity and use the Performance Analysis tools provided in VS11 – and here is what I got:

Uh-oh. Looks like the shuffle is spending most of its time doing comparisons in the swap function – and what the hell is HashCompare.GenericEqualityIntrinsic doing in here? Something is off.

Looking into the swap function provides a hint:

F# has identified that the function could be made generic. It’s great, but in our case it comes with overhead, because we simply want to compare integers. Let’s mark the function as inline, to avoid that problem (we could also make the function non-generic, by marking one of the inputs as integer):

More...

7. April 2012 10:50

For no clear reason, I got interested in Convex Hull algorithms, and decided to see how it would look in F#. First, if you wonder what a Convex Hull is, imagine that you have a set of points in a plane – say, a board – and that you planted a thumbtack on each point. Now take an elastic band, stretch it, and wrap it around the thumbtacks. The elastic band will cling to the outermost tacks, leaving some tacks untouched. The convex hull is the set of tacks that are in contact with the elastic band; it is convex, because if you take any pair of points from the original set, the segment connecting them remains inside the hull.

The picture below illustrates the idea - the blue thumbtacks define the Convex Hull; all the yellow tacks are included within the elastic band, without touching it.

There are a few algorithms around to identify the Convex Hull of a set of points in 2 dimensions; I decided to go with Andrew’s monotone chain, because of its simplicity.

The insight of the algorithm is to observe that if you start from the leftmost tack, and follow the elastic downwards, the elastic turns only clockwise, until it reaches the rightmost tack. Similarly, starting from the right upwards, only clockwise turns happen, until the rightmost tack is reached. Given that the left- and right-most tacks belong to the convex hull, the algorithm constructs the upper and lower part of the hull by progressively constructing sequences that contain only clockwise turns.

More...

1. April 2012 09:47

Last week’s StackOverflow newsletter contained a fun problem I had never seen before: Bipartite Matching. Here is the problem:

There are N starting points (purple) and N target points (green) in 2D. I want an algorithm that connects starting points to target points by a line segment (brown) without any of these segments intersecting (red) and while minimizing the cumulative length of all segments.

Image from the original post on StackOverflow

I figured it would be fun to try out Bumblebee, my artificial bee colony library, on the problem. As the accepted answer points out, the constraint that no segment should intersect is redundant, and we only need to worry about minimizing the cumulative length, because reducing the length implies removing intersections.

As usual with Bumblebee, I’ll go first with the dumbest thing that could work. The solution involves matching points from two lists, so we’ll define a record type for Point and represent a Solution as two (ordered) lists of points, packed in a Tuple:

type Point = { X: float; Y: float }
let points = 100
let firstList = [ for i in 0 .. points -> { X = (float)i ; Y = float(i) } ]
let secondList =  [ for i in 0 .. points -> { X = (float)i ; Y = float(i) } ]

let root = firstList, secondList
We’ll start with a silly problem, where the 2 lists are identical: the trivial solution here is to match each point with itself, resulting in a zero-length, which will be convenient to see how well the algorithm is doing and how far it is from the optimum.

How can we Evaluate the quality of a solution? We need to pair up the points of each of the lists, compute the distance of each pair, and sum them up – fairly straightforward:

let distance pair =
((fst pair).X - (snd pair).X) ** 2.0 + ((fst pair).Y - (snd pair).Y) ** 2.0

let evaluate = fun (solution: Point list * Point list) ->
List.zip (fst solution) (snd solution)
|> List.sumBy (fun p -> – distance p)

More...

6. January 2012 07:08

I just came across this interesting homework problem on StackOverflow:

Given a group of n items, each with a distinct value V(i), what is the best way to divide the items into 3 groups so the group with the highest value is minimized? Give the value of this largest group.

This looks like one of these combinatorial problems where a deterministic algorithm exists, and will be guaranteed to identify the optimal solution, with the small caveat that larger problem may take an extremely long time to resolve.

Note that as ccoakley points out in his comment to the StackOverflow question, there is no reason for a greedy approach to produce the optimal answer.

I figured this would be a good test for Bumblebee, my Artificial Bee Colony algorithm, which can produce a good solution in reasonable time, at the cost of not being guaranteed to find the actual optimum solution.

While we are at it, I figured we could also relax the constraints, and break the group into an arbitrary number of groups, and have possible duplicates rather than unique values.

More...

26. December 2011 11:04

Last week I posted some screen captures of Bumblebee in action on the Traveling Salesman Problem. One thing was bugging me, though – the algorithm didn’t seem to handle crossings very well. I ended up adding a de-crossing routing, which ended up speeding up the process quite a bit. The full C# demo, illustrating how to use Bumblebee from C#, has been pushed as version 0.2 to CodePlex.

Below is an illustration of the algorithm in action, in a nice seasonal color scheme. Santa needs to travel through 200 random cities: starting from a terrible initial route, the algorithm progressively disentangles them and ends up with a pretty good-looking solution under 2 minutes, significantly reducing the carbon footprint of those reindeers:

TSP with 200 cities running for 2 minutes

So what was the issue with line crossings? Consider the following route, in perfect order:

Now imagine that while searching, the algorithm discovers the following path:

More...