Mathias Brandewinder on .NET, VSTO and Excel development, and quantitative analysis.
by Mathias 30. January 2012 01:32

As happy as I am with how Bumblebee came out so far, there was one sore spot bugging me. I tried my best to write it in a functional style, but failed in one place. The inner part of the Search algorithm, which processes bees coming back in the queue with a new solution, was written as a while loop, using two mutable references to maintain the best solution found so far, and the list of solutions stored by the inactive bees.

And then it hit me yesterday, as I was reading some material on F# Azure worker roles.

Novice:

   Master, I fail to find the path to the immutable way.

Master:

   Immutable way?

   Nothing changes, just call the

   Immutable way.

This is an execrable Haiku, and a reminder to myself. When I get stuck with my old mutable ways in F#, usually the answer is recursion.

In this case, all it took was rewriting the inner loop as a recursive function. The original loop looked along these lines:

while cancel.IsCancellationRequested = false do
   match returnQueue.IsEmpty with
   // the returning bees queue is not empty
   | false -> 
      let success, bee = returnQueue.TryDequeue()
      match success with
      // a returning bee has been found in the queue
      | true -> 
         inactives := waggle !inactives bee
         let candidate = Hive.solutionOf bee
         if candidate.Quality > best.Value.Quality then
            best.Value <- candidate
            foundSolution.Trigger(new SolutionMessage<'a>(candidate))
         else ignore ()
      // more code, irrelevant to the point

The recursive version simply includes in its arguments list all the previously mutable variables, and calls itself, passing along the result of the updates, along these lines:

let rec loop (queue: ConcurrentQueue<Bee<'a>>) best inactives (cancel: CancellationTokenSource) =
   if cancel.IsCancellationRequested then ignore ()
   else
      let success, bee = queue.TryDequeue()
      if success then
      // a returning bee has been found in the queue
         let updatedInactives = waggle inactives bee
         let candidate = Hive.solutionOf bee
         let updatedBest = bestOf candidate best
         if candidate.Quality > best.Quality 
         then foundSolution.Trigger(new SolutionMessage<'a>(candidate))
         
         dispatchBee updatedInactives pickRandom waggle bee

         loop queue updatedBest updatedInactives cancel

      else loop queue best inactives cancel

The structure is essentially the same, but all the mutable variables are now gone; instead, the recursive function passes forward new “updated” values.

I think what tripped me up is the fact that I never used recursion to run an “infinite loop” before – I always saw them done using while true statements. Conversely, I always used recursion to produce an actual result, computing intermediary results until a termination condition was met. This one is a bit different, because the recursion simply passes along some state information, but doesn’t return anything (its return is unit) or terminate until a cancellation happens.

by Mathias 11. December 2011 11:06

Yesterday, I made the first version of Bumblee public on Codeplex. Version 0.1 is an Alpha release, meaning that it’s usable, but still rough around the edges.

What is Bumblebee? It is an Artificial Bee Colony (ABC) algorithm, a randomized search method which mimics the behavior of bee hives. Given a search problem, the algorithm will dispatch Scout bees to look for new solutions, and Active bees to explore around known solutions. Once their search is complete, bees return to the Hive and share information with Inactive bees, and new searches are allocated based on the information available so far.

I have multiple goals with Bumblebee. I came across the algorithm for the first time in this article, which illustrates it on the Traveling Salesman Problem with a C# implementation. I enjoyed the article, but wondered if I could

  • parallelize the algorithm to use multiple cores,
  • provide a general API to run arbitrary problems.

… and I figured it would be a good sample project to sharpen my F# skills.

For the parallelization part, I decided to use the Task Parallel Library: the Hive creates Tasks for each bee beginning a search, which returns to a Queue to be processed and share the search result with the inactive bees in the Hive (see outline here).

Deciding on a design for the API took some back and forth, and will likely evolve in the next release. The API should:

  • accommodate any problem that can be solved by that approach,
  • be reasonably simple to use by default,
  • limit parallelization problems, in particular around random numbers generation,
  • be palatable for F# and C# users.

Regarding the first point, 3 things are needed to solve a problem using ABC: Scouts need to know how to find new random solutions, Active bees need to be able to find a random neighbor of a solution, and solutions need to be comparable. I figured this could be expressed via 3 functions, held together in a Problem class:

  • a generator, which given a RNG returns a new random solution,
  • a mutator, which given a RNG + solution tuple, returns a random neighbor of the solution,
  • an evaluator, which given a solution returns a float, measuring the quality of the solution.

You can see the algorithm in action in the TspDemo console application included in the source code.

I opted to have the Random Number Generator as an argument because the algorithm is responsible for spawning Tasks, and is therefore in a good position to provide a RNG that is safe to use, relieving the client of the responsibility of creating RNGs. I’ll probably rework this a bit, though, because I don’t like the direct dependency on Random; it is not very safe, and I would like to provide the ability to use whatever RNG users may happen to prefer.

The reason I chose to have the evaluator return a float (instead of using IComparable) is because I figured it might be interesting to have a measure which allowed the computation of rates of improvements in solution quality.

As for simplicity, I ended up with a main class Solver with 2 main methods. Search(Problem) initiates a search as a Task, and goes on forever until Stop( ) is called. The Solver exposes an event, FoundSolution, which fires every time an improvement is found, and returns a SolutionMessage, containing the solution, its quality, and the time of discovery. It is the responsibility of the user to decide when to stop the search, based on the information returned via the events.

By default, the Solver is configured with “reasonable” parameters, but if needed, they are exposed via properties, which can be modified before the search is initiated.

No effort has gone for the first release to make this C# friendly – this, and abstracting the RNG, are the goals of the next release.

I would love to get feedback, both on the overall design and the code itself! You can download the project from here.

by Mathias 4. December 2011 15:47

I am in the middle of “Working Effectively with Legacy Code”, and found it every bit as great as it was said to be. In the book, Feathers introduces the concept of Seams and Enabling Points:

a Seam is a place where you can alter behavior in your program without editing it in that place

every seam has an enabling point, a place where you can make the decision to use one behavior or another.

The idea - as I understand it - is that an enabling point is a hook for testability, a place where you can replace the behavior of a piece of code with your own controlled behavior, and validate that the results are as expected.

The reason I am bringing this up is that I have been writing lots of F# lately, and it made me realize that a functional style provides lots of enabling points, and can be much easier to test than object-oriented code.

Here is a simplified, but representative, example of the problem I was looking at: I needed to pick a random item in a list. In C#, a method along these lines would do the job:

public T PickFrom(IList<T> list)
{
   var random = new Random();
   return list[random.Next(list.Count())];
}

However, this code is utterly untestable; it’s also probably a terrible idea to instantiate a new Random every time this is called, so we modify it this way:

public T PickFrom(IList<T> list, Random random)
{
   return list[random.Next(list.Count())];
}

This is much better: now we have a decent Enabling Point, because the list of arguments of the method contains everything that is used inside the method. However, this is still untestable, but for a different reason: by definition, Random.Next() will return different values every time PickFrom is called, and expecting a repeatable result from PickFrom is a bit of a desperate enterprise.

More...

by Mathias 13. November 2011 11:03

We were doing some pair-programming with Petar recently, Test-Driven Development style, and started talking about how figuring out where to begin with the tests is often the hardest part. Petar noticed that when writing a test, I was typically starting at the end, first writing an Assert, and then coding my way backwards in the test – and that it helped getting things started.

I hadn’t realized I was doing it, and suspected it was coming from Kent Beck’s “Test-Driven Development, by Example”. Sure enough, the Patterns section of the book lists the following:

Assert First. When should you write the asserts? Try writing them first.

So why would this be a good idea?

I think the reason it works well, is that it helps focus the effort on one single goal at a time, and requires clarifying what that goal is. Starting with the Assert forces you to imagine one single fact that should be true once you have implemented the feature, and to think about how you are going to verify that the feature is indeed working.

Once the Assert is in place, you can now write the story backwards: what is the method that was called to get the result being checked, and  where does it belong? What classes and setup is required to make that method call? And, now that the story is written, what is it really saying, and what should the test method name be?

In other words, begin with the Assert, figure out the Act part, Arrange the actors, and (re)name the test method.

I think what trips some people is that while a good test will look like a little story, progressing from a beginning to a logical end, the process leading to it follows a completely opposite direction. Kent Beck points the Self-Similarity in the entire process: write stories which describe what the application will do once done, write tests which describe what the feature does once the code is implemented, and write asserts which will pass once the test is complete. Always start with the end in mind, and do exactly what it takes to achieve your goal.

dyn002_original_472_480_pjpeg_2665126_e9a998bfcc0456c55ca52c2c29484609[1]

by Mathias 5. September 2011 14:20

Back in April, inspired by an MSDN article, I began looking into converting a Simulated Bee Colony algorithms from C# to F#; I thought this would be an interesting exercise on my slow path to learning F#, and I got a rough implementation going. Attention-disorder deficit and life in general got me side-tracked, but, better late than never, I am now ready to get back to it.

One aspect I am interested in, is to figure out how the original algorithm could be modified for parallelization. In its original form, the algorithm followed a turn-based approach: given the current state of the hive, process sequentially every bee (active, inactive, and scout bees), and repeat until the pre-set number of iterations is reached.

How could we approach parallelization? At a high level, two operations are taking place:

  • some bees are searching from new solutions to the problem, either by creating brand-new solutions (Scout bees), or by finding a solution close to an initial Solution (Active bees),
  • bees that have completed a search come back to the Hive, and share the Solution they found to the Inactive bees via a Waggle Dance; Inactive bees can update their state based on that new information.

The first part, the Search, is easily parallelizable: while searching, a Bee shares no data with other bees. Therefore, multiple bees could be searching for new solutions, each on its own thread. On the other hand, the information sharing part is trickier: if multiple bees were to share their new solution with inactive bees at the same time, concurrency problems would likely arise.

One approach to resolve the issue is to avoid the concurrency problem, and make sure that by design, the information-sharing part is taking place sequentially, one bee at a time. Here is how we will achieve this:

ParallelBees

A queue will hold bees returning from a Search with a new Solution, and process bees from the queue one by one, passing their information to the current inactive bees. Once it has shared its information with the inactive bees, the bee (or one of the inactive bees) is sent to Search again in parallel – and when its search completes, it returns to the queue, where it goes back in line and wait until it can be processed.

More...

Comments

Comment RSS