Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
11. December 2011 11:06

Yesterday, I made the first version of Bumblebee public on CodePlex. Version 0.1 is an alpha release, meaning that it’s usable, but still rough around the edges.

What is Bumblebee? It is an Artificial Bee Colony (ABC) algorithm, a randomized search method which mimics the behavior of bee hives. Given a search problem, the algorithm will dispatch Scout bees to look for new solutions, and Active bees to explore around known solutions. Once their search is complete, bees return to the Hive and share information with Inactive bees, and new searches are allocated based on the information available so far.

I have multiple goals with Bumblebee. I came across the algorithm for the first time in this article, which illustrates it on the Traveling Salesman Problem with a C# implementation. I enjoyed the article, but wondered if I could

• parallelize the algorithm to use multiple cores,
• provide a general API to run arbitrary problems.

… and I figured it would be a good sample project to sharpen my F# skills.

For the parallelization part, I decided to use the Task Parallel Library: the Hive creates a Task for each bee beginning a search; when a search completes, the bee returns to a Queue, where it is processed and shares its search result with the inactive bees in the Hive (see outline here).

Deciding on a design for the API took some back and forth, and will likely evolve in the next release. The API should:

• accommodate any problem that can be solved by that approach,
• be reasonably simple to use by default,
• limit parallelization problems, in particular around random number generation,
• be palatable for F# and C# users.

Regarding the first point, 3 things are needed to solve a problem using ABC: Scouts need to know how to find new random solutions, Active bees need to be able to find a random neighbor of a solution, and solutions need to be comparable. I figured this could be expressed via 3 functions, held together in a Problem class:

• a generator, which given an RNG returns a new random solution,
• a mutator, which given an RNG + solution tuple, returns a random neighbor of the solution,
• an evaluator, which given a solution returns a float, measuring the quality of the solution.
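As a sketch of how these three functions might fit together (the names and signatures below are my guesses, not the actual Bumblebee API), here is a toy problem: find a number close to 42.

```csharp
using System;

// Hypothetical sketch of the three functions bundled in a Problem class
// (names and signatures are assumptions, not the actual Bumblebee API).
public class Problem<T>
{
    public Func<Random, T> Generator { get; set; }   // RNG -> new random solution
    public Func<Random, T, T> Mutator { get; set; }  // RNG + solution -> random neighbor
    public Func<T, double> Evaluator { get; set; }   // solution -> quality (higher is better)
}

public static class ProblemDemo
{
    public static Problem<double> Create()
    {
        return new Problem<double>
        {
            Generator = rng => rng.NextDouble() * 100.0,           // anywhere in [0, 100)
            Mutator = (rng, x) => x + (rng.NextDouble() - 0.5),    // small random step
            Evaluator = x => -Math.Abs(x - 42.0)                   // 0 is best, more negative is worse
        };
    }
}
```

With this shape, the algorithm itself never needs to know anything about the solution type: Scouts call the generator, Active bees call the mutator, and the Hive ranks whatever comes back with the evaluator.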

You can see the algorithm in action in the TspDemo console application included in the source code.

I opted to have the Random Number Generator as an argument because the algorithm is responsible for spawning Tasks, and is therefore in a good position to provide a RNG that is safe to use, relieving the client of the responsibility of creating RNGs. I’ll probably rework this a bit, though, because I don’t like the direct dependency on Random; it is not very safe, and I would like to provide the ability to use whatever RNG users may happen to prefer.
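To illustrate the concern (my own sketch, not Bumblebee’s internals): System.Random is not thread-safe, and instances created at nearly the same time get nearly identical seeds. One workable pattern is to draw a seed per Task from a single master generator, before any Task starts:

```csharp
using System;
using System.Threading.Tasks;

// Illustrative pattern: a master Random hands each parallel search its own
// seeded Random, so no Random instance is ever shared across threads.
public static class RngPerTask
{
    public static int[] RunSearches(int searches, int masterSeed)
    {
        var master = new Random(masterSeed);
        // Draw all seeds sequentially, up front.
        var seeds = new int[searches];
        for (var i = 0; i < searches; i++) seeds[i] = master.Next();

        var results = new int[searches];
        Parallel.For(0, searches, i =>
        {
            var rng = new Random(seeds[i]); // each Task owns its RNG
            results[i] = rng.Next(0, 100);  // stand-in for an actual search
        });
        return results;
    }
}
```

A nice side effect: given the same master seed, the whole run is reproducible regardless of how the Tasks are scheduled, since each search’s stream depends only on its own seed.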

The reason I chose to have the evaluator return a float (instead of using IComparable) is because I figured it might be interesting to have a measure which allowed the computation of rates of improvements in solution quality.

As for simplicity, I ended up with a main class Solver with 2 main methods. Search(Problem) initiates a search as a Task, and goes on forever until Stop() is called. The Solver exposes an event, FoundSolution, which fires every time an improvement is found and returns a SolutionMessage containing the solution, its quality, and the time of discovery. It is the responsibility of the user to decide when to stop the search, based on the information returned via the events.
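To make the event-driven flow concrete, here is a minimal, self-contained stand-in that mimics the Solver surface described above; it is a sketch of the usage pattern only, all names and signatures are assumptions, and the “search” is just a random number generator.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Toy stand-in mimicking the Solver surface described in the post
// (Search, Stop, FoundSolution, SolutionMessage); not the real Bumblebee types.
public class SolutionMessage : EventArgs
{
    public double Quality { get; set; }
    public DateTime DateTime { get; set; }
}

public class MockSolver
{
    public event EventHandler<SolutionMessage> FoundSolution;
    private volatile bool running;

    public Task Search()
    {
        running = true;
        return Task.Factory.StartNew(() =>
        {
            var best = double.MinValue;
            var rng = new Random(0);
            while (running)
            {
                var candidate = rng.NextDouble(); // stand-in for a real search
                if (candidate > best)             // improvement found: notify subscribers
                {
                    best = candidate;
                    var handler = FoundSolution;
                    if (handler != null)
                        handler(this, new SolutionMessage { Quality = best, DateTime = DateTime.Now });
                }
            }
        });
    }

    public void Stop() { running = false; }
}
```

The caller subscribes to FoundSolution, kicks off Search(), and calls Stop() whenever the stream of improvements no longer justifies the wait, which matches the “user decides when to stop” contract above.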

By default, the Solver is configured with “reasonable” parameters, but if needed, they are exposed via properties, which can be modified before the search is initiated.

No effort has gone into making this C#-friendly for the first release – this, and abstracting the RNG, are the goals of the next release.

I would love to get feedback, both on the overall design and the code itself! You can download the project from here.

5. September 2011 14:20

Back in April, inspired by an MSDN article, I began looking into converting a Simulated Bee Colony algorithm from C# to F#; I thought this would be an interesting exercise on my slow path to learning F#, and I got a rough implementation going. Attention deficit disorder and life in general got me side-tracked, but, better late than never, I am now ready to get back to it.

One aspect I am interested in is figuring out how the original algorithm could be modified for parallelization. In its original form, the algorithm followed a turn-based approach: given the current state of the hive, process sequentially every bee (active, inactive, and scout bees), and repeat until the pre-set number of iterations is reached.

How could we approach parallelization? At a high level, two operations are taking place:

• some bees are searching for new solutions to the problem, either by creating brand-new solutions (Scout bees), or by finding a solution close to an initial Solution (Active bees),
• bees that have completed a search come back to the Hive, and share the Solution they found with the Inactive bees via a Waggle Dance; Inactive bees can update their state based on that new information.

The first part, the Search, is easily parallelizable: while searching, a Bee shares no data with other bees. Therefore, multiple bees could be searching for new solutions, each on its own thread. On the other hand, the information sharing part is trickier: if multiple bees were to share their new solution with inactive bees at the same time, concurrency problems would likely arise.

One approach to resolve the issue is to avoid the concurrency problem, and make sure that by design, the information-sharing part is taking place sequentially, one bee at a time. Here is how we will achieve this:

A queue will hold bees returning from a Search with a new Solution; bees are processed from the queue one by one, each passing its information to the current inactive bees. Once it has shared its information with the inactive bees, the bee (or one of the inactive bees) is sent to Search again in parallel – and when its search completes, it returns to the queue, where it goes back in line and waits until it can be processed.
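The design above maps naturally onto a producer/consumer pattern; here is a sketch using the TPL and BlockingCollection (the types and the “solution” are illustrative stand-ins, not the actual implementation):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch of the queue-based design: searches run in parallel, but returning
// bees are drained one at a time, so information sharing stays sequential.
public static class HiveLoop
{
    public static double Run(int totalSearches)
    {
        var returning = new BlockingCollection<double>(); // bees back from a search
        var best = double.MinValue;                       // shared hive knowledge

        // Parallel part: each search runs independently, no shared state.
        var searches = Task.Factory.StartNew(() =>
        {
            Parallel.For(0, totalSearches, i =>
            {
                var rng = new Random(i);         // per-search RNG
                returning.Add(rng.NextDouble()); // the "solution" found
            });
            returning.CompleteAdding();          // no more bees will return
        });

        // Sequential part: one bee at a time shares with the hive.
        foreach (var solution in returning.GetConsumingEnumerable())
        {
            if (solution > best) best = solution; // waggle dance: update shared state
        }
        searches.Wait();
        return best;
    }
}
```

Because only the consuming loop ever touches the hive’s shared state, no locking is needed around the information-sharing step, which is exactly the concurrency problem the queue was introduced to avoid.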


13. June 2010 12:30

In my last post I explored how ExcelDNA can be used to write high-performance UDFs for Excel, calling .Net code without the overhead of VSTO. Using .Net instead of VBA for intensive computations already yields a nice improvement. Still, I regretted that ExcelDNA supports .Net only up to 3.5, which puts the Task Parallel Library off limits – too bad, because the TPL is just totally awesome for leveraging the power of multi-cores.

As it turned out, this isn’t totally correct. Govert Van Drimmelen (the man behind ExcelDNA) and Jon Skeet (the Chuck Norris of .Net) pointed out that while the Task Parallel Library is a .Net 4.0 library, the Reactive Extensions for .Net 3.5 contain an unsupported 3.5 version of the TPL – which means that it should be possible to get parallelism to work with ExcelDNA.

This isn’t a pressing need of mine, so I thought I would leave that alone, and wait for the 4.0 version of ExcelDNA. Yeah right. Between my natural curiosity, Ross McLean’s comment (have fun at the Excel UK Dev Conference!), and the fact that I really want to know if I could get the Walkenbach test to run under 1 second, without too much of an effort, I had to check. And the good news is, yep, it works.

Last time we saw how to turn an average PC into a top-notch performer; let’s see how we can inject some parallelism to get a smoking hot calculation engine.
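For a taste of what injecting parallelism might look like (a sketch under my own assumptions, not the code from the post), the random walk can be split into chunks with Parallel.For, each chunk owning its own Random and contributing a partial sum:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// One plausible way to parallelize the random-walk benchmark with the TPL:
// split the steps across chunks, give each chunk its own seeded Random,
// and combine the partial positions at the end.
public static class ParallelWalk
{
    public static int Walk(int totalSteps, int chunks)
    {
        var stepsPerChunk = totalSteps / chunks;
        var position = 0;
        Parallel.For(0, chunks, chunk =>
        {
            var rng = new Random(chunk);   // per-chunk RNG: nothing shared
            var local = 0;
            for (var i = 0; i < stepsPerChunk; i++)
                local += rng.Next(0, 2) == 0 ? 1 : -1;
            Interlocked.Add(ref position, local); // thread-safe combine
        });
        return position;
    }
}
```

Note that this changes the random stream (several seeded generators instead of one), which is fine for a benchmark but worth keeping in mind for simulations where the statistical properties of the sequence matter.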


7. June 2010 10:23

Some time ago, I came across ExcelDNA, an open-source library designed to integrate .Net into Excel, via a post by the Grumpy One, who described it as an interesting way to get Excel to talk to a compiled library. Sounds right up my alley, but I still managed to let 6 months pass until I finally tried it.

This reminded me of another post, by J-Walk this time, where he uses a random walk simulation in VBA to benchmark system performance. Back then, I ran the VBA code, and also the equivalent C# in a console app, out of curiosity: 11.38 seconds, vs. 2.73 seconds. Why not try the same experiment, and see if we can get the best of both worlds and bring some of the C# power into Excel via ExcelDNA?

So I created a Class Library, with the following method, a close equivalent to the VBA benchmark code:

using System;
using System.Diagnostics;

public class Experiment
{
    public static string RandomWalk()
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // 100 million steps: move up or down with equal probability.
        var position = 0;
        var random = new Random();
        for (var run = 0; run < 100000000; run++)
        {
            if (random.Next(0, 2) == 0)
            {
                position++;
            }
            else
            {
                position--;
            }
        }

        stopwatch.Stop();
        var elapsed = (double)stopwatch.ElapsedMilliseconds / 1000d;
        return "Position: " + position.ToString() + ", Time: " + elapsed.ToString();
    }
}
