Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 30. January 2012 01:32

As happy as I am with how Bumblebee came out so far, there was one sore spot bugging me. I tried my best to write it in a functional style, but failed in one place. The inner part of the Search algorithm, which processes bees coming back in the queue with a new solution, was written as a while loop, using two mutable references to maintain the best solution found so far, and the list of solutions stored by the inactive bees.

And then it hit me yesterday, as I was reading some material on F# Azure worker roles.

Novice:

   Master, I fail to find the path to the immutable way.

Master:

   Immutable way?

   Nothing changes, just call the

   Immutable way.

This is an execrable Haiku, and a reminder to myself. When I get stuck with my old mutable ways in F#, usually the answer is recursion.

In this case, all it took was rewriting the inner loop as a recursive function. The original loop looked along these lines:

while cancel.IsCancellationRequested = false do
   match returnQueue.IsEmpty with
   // the returning bees queue is not empty
   | false -> 
      let success, bee = returnQueue.TryDequeue()
      match success with
      // a returning bee has been found in the queue
      | true -> 
         inactives := waggle !inactives bee
         let candidate = Hive.solutionOf bee
         if candidate.Quality > best.Value.Quality then
            best.Value <- candidate
            foundSolution.Trigger(new SolutionMessage<'a>(candidate))
         else ignore ()
      // more code, irrelevant to the point

The recursive version simply includes in its arguments list all the previously mutable variables, and calls itself, passing along the result of the updates, along these lines:

let rec loop (queue: ConcurrentQueue<Bee<'a>>) best inactives (cancel: CancellationTokenSource) =
   if cancel.IsCancellationRequested then ignore ()
   else
      let success, bee = queue.TryDequeue()
      if success then
      // a returning bee has been found in the queue
         let updatedInactives = waggle inactives bee
         let candidate = Hive.solutionOf bee
         let updatedBest = bestOf candidate best
         if candidate.Quality > best.Quality 
         then foundSolution.Trigger(new SolutionMessage<'a>(candidate))
         
         dispatchBee updatedInactives pickRandom waggle bee

         loop queue updatedBest updatedInactives cancel

      else loop queue best inactives cancel

The structure is essentially the same, but all the mutable variables are now gone; instead, the recursive function passes forward new “updated” values.

I think what tripped me up is the fact that I never used recursion to run an “infinite loop” before – I always saw them done using while true statements. Conversely, I always used recursion to produce an actual result, computing intermediary results until a termination condition was met. This one is a bit different, because the recursion simply passes along some state information, but doesn’t return anything (its return is unit) or terminate until a cancellation happens.

by Mathias 14. January 2012 14:16

I am putting together a demo VSTO add-in for my talk at the Excel Developer Conference. I wanted to play with charts a bit, and given that I am working off a .NET model, I figured it would be interesting to produce charts directly from the data, bypassing the step of putting data in a worksheet altogether.

In order to do that, we simply need to create a Chart in a workbook, add a Series to the SeriesCollection of the chart, and directly set the Series Values and XValues as an array, along these lines:

var excel = this.Application;
var workbook = excel.ActiveWorkbook;
var charts = workbook.Charts;
var chart = (Chart)charts.Add();

chart.ChartType = Excel.XlChartType.xlLine;
chart.Location(XlChartLocation.xlLocationAsNewSheet, "Tab Name");

var seriesCollection = (SeriesCollection)chart.SeriesCollection();
var series = seriesCollection.NewSeries();

series.Values = new double[] {1d, 3d, 2d, 5d};
series.XValues = new string[] {"A", "B", "C", "D"};
series.Name = "Series Name";

This will create a simple Line chart in its own sheet – without any reference to a worksheet data range.

Now why would I be interested in this approach, when it’s so convenient to create a chart from data that is already in Excel?

Suppose for a moment that you are dealing with the market activity on a stock, which you can retrieve from an external data source as a collection of StockActivity .NET objects:

public class StockActivity
{
   public DateTime Day { get; set; }
   public decimal Open { get; set; }
   public decimal Close { get; set; }
}

In this case, extracting the array for the X and Y values would be a trivial matter, making it very easy to produce a chart of, say, the Close values over time:

// Create a few fake datapoints
var day1 = new StockActivity()
              {
                 Day = new DateTime(2010, 1, 1), 
                 Open = 100m, 
                 Close = 110m
              };
var day2 = new StockActivity()
              {
                 Day = new DateTime(2010, 1, 2), 
                 Open = 110m, 
                 Close = 130m
              };
var day3 = new StockActivity()
              {
                 Day = new DateTime(2010, 1, 3), 
                 Open = 130m, 
                 Close = 105m
              };
var history = new List<StockActivity>() { day1, day2, day3 };

var excel = this.Application;
var workbook = excel.ActiveWorkbook;
var charts = workbook.Charts;
var chart = (Chart)charts.Add();

chart.ChartType = Excel.XlChartType.xlLine;
chart.Location(XlChartLocation.xlLocationAsNewSheet, "Stock Chart);

var seriesCollection = (SeriesCollection)chart.SeriesCollection();
var series = seriesCollection.NewSeries();

series.Values = history.Select(it => (double)it.Close).ToArray();
series.XValues = history.Select(it => it.Day).ToArray();
series.Name = "Stock";

Using LINQ, we Select from the list the values we are interested in, and pass them into an array, ready for consumption into a chart, and boom! We are done.

If what you need to do is explore data and produce charts to figure out potentially interesting relationships, this type of approach isn’t very useful. On the other hand, if your problem is to produce on a regular basis the same set of charts, using data coming from an external data source, this is a very interesting option!

by Mathias 6. January 2012 07:08

I just came across this interesting homework problem on StackOverflow:

Given a group of n items, each with a distinct value V(i), what is the best way to divide the items into 3 groups so the group with the highest value is minimized? Give the value of this largest group.

This looks like one of these combinatorial problems where a deterministic algorithm exists, and will be guaranteed to identify the optimal solution, with the small caveat that larger problem may take an extremely long time to resolve.

Note that as ccoakley points out in his comment to the StackOverflow question, there is no reason for a greedy approach to produce the optimal answer.

I figured this would be a good test for Bumblebee, my Artificial Bee Colony algorithm, which can produce a good solution in reasonable time, at the cost of not being guaranteed to find the actual optimum solution.

While we are at it, I figured we could also relax the constraints, and break the group into an arbitrary number of groups, and have possible duplicates rather than unique values.

More...

Comments

Comment RSS