Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 6. February 2013 15:06

In spite of being color blind, I am a visual guy – I like to see things. Nothing beats a chart to identify problems in your data. I also spend lots of time manipulating data in FSI, the F# REPL, and while solutions like FSharpChart makes it possible to produce nice graphs fairly easily, I still find it introduces a bit of friction, and wondered how complicated it would be to use Excel as a charting engine.

Turns out, it’s not very complicated. The typical use case for generating charts in Excel is to first put data in a spreadsheet, and use the corresponding range as a source for a chart. However, it’s also perfectly possible to directly create a Chart object, and manipulate its SeriesCollection, adding and editing Series, which are arrays of XValues and Values.

As a starting point, I decided to focus on 2 problems:

  • plotting functions, in 2 and 3 dimensions,
  • producing scatterplots.

Both are rather painful to do in Excel itself – and scatterplots are the one chart I really care about when analyzing data, because it helps figuring out whether or not some variables are related.

What I wanted was a smooth experience from FSI – start typing code, and ship data to Excel, without having to worry about the joys of the Excel interop and its syntax. The video below shows what I ended up with, in action.

Note: watching me type is about as exciting as watching paint dry, so I sped up the video from its original 5 minutes down to 2 - otherwise there is no trick or editing.

This year’s blockbuster: plotting functions from F# to Excel

More...

by Mathias 20. January 2013 13:17

I am still toying with the idea of using FSI from within Excel – wouldn’t it be nice if, instead of having to resort to VBA or C# via VSTO, I could leverage F#, with unfettered access to .NET and a nice scripting language, while having at my disposal things like the charting abilities of Excel?

Judging from the discussion on Twitter this morning, it seems I am not the only one to like the idea of F# in Excel:

I am still far from this perfect world, and wouldn’t mind some input from the F# community, because I am having a hard time figuring out where the sweet spot is. At that point, what I have is a pretty rudimentary WPF FSI editor, written in C#.

Note: yes, I should have written it in F#, shame on me! I am still more comfortable with the WPF/C# combo at the moment, but getting increasingly uncomfortable with XAML and the amount of magic string involved in data binding. Jon Harrop presented some very stimulating ideas on this topic at the last San Francisco F# user group, I intend to try the NoXAML route at a later point.

Anyways, you can find my rudimentary editor here on GitHub, with a crude WPF demo. Running it should produce something like this:

Editor

I quite liked how FsNotebook organized the code into blocks and separated the inputs and outputs, so I followed the same idea: you can add new sections at the top and evaluate each one separately, and see the result at the bottom. There is obviously plenty of work to do still, but at least this is a working prototype.

Note: the code is still pretty ugly, and totally not ready for prime time. Specifically, resources aren’t disposed properly at all – use at your own peril!

More...

by Mathias 27. June 2012 15:06

This months’ issue of MSDN Magazine has an interesting piece on evolutionary algorithms. The article applies a genetic algorithm to identify the minimum value of a “pathological” continuous function, the Schwefel function.

SchefelFunction

The Schwefel function

For X and Y values between –500 and 500, the “correct” answer is X and Y = 420.9687.

This function is designed to give fits to optimization algorithms. The issue here is that  the function has numerous peaks and valleys. As a result, if the search strategy is to use some form of gradient approach, that is, from a starting point, follow the steepest descent until a minimum is reached, there is a big risk to end up in a place which is a local minimum: the algorithm gets stuck in a valley with no path downwards, but there are other, better solutions around, which can be reached only by “climbing out of the hole” and temporarily following a path which heads in the wrong direction.

Out of curiosity, I checked how the Excel Solver would fare on this problem:

ExcelSchwefel

The result was an abject failure – not even close to the true solution.

I thought it would be interesting to see how Bumblebee, my Artificial Bee Colony framework, would perform. There are some general similarities between the underlying ideas behind the articles’ algorithm and Bumblebee, the main difference being that Bumblebee simply mutates individual solutions, and doesn’t create “crossover solutions”.

Let’s open Visual Studio, create an F# Console project, grab the NuGet package for Bumblebee – and start coding.

As usual, we need 4 elements to leverage Bumblebee – a Type of Solution, and 3 functions: a Generator, which returns a brand-new, random solution, a Mutator, which transforms a known solution into a new, similar solution, and an Evaluator, which evaluates a solution and returns a float, increasing with the quality of the solution.

In this case, the Solution type is fairly straightforward. We are looking for 2 floats x and y, so we’ll go for a Tuple. Similarly, the Evaluation is a given, it is the negative of the Schwefel function. The reason we go for the negative is that Bumblebee will try to maximize the Evaluation, so if we are looking for a minimum, we need to reverse the sign – because the Minimum of a function is the Maximum of its negative.

let schwefel x = 
   -x * Math.Sin(Math.Sqrt(Math.Abs(x)))

let evaluate (x, y) = 
   - schwefel (x) - schwefel (y)

The Generation function is also trivial – we’ll simply pick random pairs of floats in the [ –500.0; 500.0 ] interval:

let min = -500.0
let max = 500.0

let generate (rng: Random) = (
   rng.NextDouble() * (max - min) + min,
   rng.NextDouble() * (max - min) + min)

The Mutation function takes a tiny bit more of effort. The idea I followed is essentially the same as the one used in the article: given a solution defined by a pair of floats, randomly decide if any of the elements will be mutated, and if yes, add a random permutation to that element, scaled to the precision we want to achieve:

let precision = 0.00001
let rate = 0.5

let mutate (rng: Random) solution =
   let (x, y) = solution
   let x =
      if rng.NextDouble() < rate 
      then x + precision * ((max - min) * rng.NextDouble() + min)
      else x
   let y =
      if rng.NextDouble() < rate 
      then y + precision * ((max - min) * rng.NextDouble() + min)
      else y
   (x, y)

More...

by Mathias 19. February 2012 15:37

This post is part of a series providing commentary on the VSTO Stocks project. I initially developed it for theExcel Developers Conference in London, to illustrate some of the benefits or interesting features of VSTO add-ins compared to traditional VBA automation. The add-in is a work in progress, and is by no means production ready, but it is functional; I will update the code and add comments over time. Feel free to ask questions in the comments!

Level: intermediate.

Code version: 39134382823e 

The straightforward way to organize an Add-In solution is to just create a single project of type Excel 2007 Add-In (or whichever Office application / version you may target). So why would I go through the pain of structuring my Solution into four distinct Projects in the VSTO Stocks add-in?

Having a single project is a perfectly valid way to proceed, with things kept simple and tight. However, a drawback of going that route is that it makes it easy to write untestable code, with poor separation of concerns.

Note: in the past, I also ran into testing issues with one single project, because I couldn’t reference the Add-In project itself in my automated tests suite, which was problematic. This problem seems to be gone now.

Specifically, one source of such problems is to directly reference either ThisAddIn or Globals classes to access the Excel object model. They behave more or less as static classes would, which is a testing nightmare. For instance, in order to access the Active worksheet in a method, I could write code like this:

public void DoSomeStuff()
{
   var addIn = Globals.ThisAddIn;
   var excel = addIn.Application;         
   var workbook = excel.ActiveWorkbook;
   var activeSheet = workbook.ActiveSheet;
   // do stuff with the sheet
}

As is, this method is virtually untestable: we do not control the instantiation of Globals, and cannot replace it with fakes to emulate various situations that could arise in the application. What if, for instance, we wanted to test the behavior of that function when no worksheet is active?

That problem would be avoided altogether if instead of accessing Excel through the Add-In, we injected a reference to Excel in the method:

public void DoSomeStuff(Application excel)
{       
   var workbook = excel.ActiveWorkbook;
   var activeSheet = workbook.ActiveSheet;
   // do stuff with the sheet
}

Now we are free to substitute Application with our own version, set it up to emulate whatever scenario we see fit, and test what the behavior of the method should be in that case. I’ll revisit that question later when discussing automated tests, and leave it at that for now. For that matter, note also that we could perfectly well re-use that method anywhere, as there is no dependency on the add-in.

By separating into multiple projects, we can enforce that rule. In our Solution, the add-in acts purely as a bootstrapper: its only role is to kick things off when Excel starts, and initialize the UserInterface project, where the core of the application logic resides.

Note: I may rename the UserInterface project to ExcelApplication later on, the name would be more fitting.

The UserInterface project has a reference to Microsoft.Office.Interop.Excel, but not the Add-In, and therefore cannot use Globals or ThisAddin. In the ThisAddIn_Startup method, the running instance of Excel is retrieved and passed to the TaskPaneBuilder in the UserInterface project. From that point on, the Add-In itself doesn’t do anything: the ExcelController in the UserInterface project holds a reference to Excel, and handles all the interactions with Excel going forward.

So what is the Domain project about, then?

This one may or may not be a luxury item, depending on your project. The intent of the Domain project is to represent the business model you are dealing with; it should have no knowledge of Excel (hence the lack of reference to the Excel interop). For instance, StockHistory or MovingAverage are concepts which exist and can be modeled independently of Excel. Our add-in is currently using Excel as a client, but nothing would preclude developing a web-based version later, for instance. In that case, having a separate library which contains the Domain would prove very valuable, because it could be used as-is: the “only” effort required would be to write a different client / user interface to interact with that Domain model.

The name of the fourth project, UnitTests, should give away its purpose: it hosts the automated tests that verify certain behavior of the application. We’ll revisit it later on.

Of course, you don’t have to follow that organization - I also write add-ins which live in one single project! The right organization for your solution depends on each individual project, their complexity, and how they may grow over time. The structure/ideas I describe here are on the heavy side, but have served me well over the years - I hope they will help you think about your projects as well!

by Mathias 14. January 2012 14:16

I am putting together a demo VSTO add-in for my talk at the Excel Developer Conference. I wanted to play with charts a bit, and given that I am working off a .NET model, I figured it would be interesting to produce charts directly from the data, bypassing the step of putting data in a worksheet altogether.

In order to do that, we simply need to create a Chart in a workbook, add a Series to the SeriesCollection of the chart, and directly set the Series Values and XValues as an array, along these lines:

var excel = this.Application;
var workbook = excel.ActiveWorkbook;
var charts = workbook.Charts;
var chart = (Chart)charts.Add();

chart.ChartType = Excel.XlChartType.xlLine;
chart.Location(XlChartLocation.xlLocationAsNewSheet, "Tab Name");

var seriesCollection = (SeriesCollection)chart.SeriesCollection();
var series = seriesCollection.NewSeries();

series.Values = new double[] {1d, 3d, 2d, 5d};
series.XValues = new string[] {"A", "B", "C", "D"};
series.Name = "Series Name";

This will create a simple Line chart in its own sheet – without any reference to a worksheet data range.

Now why would I be interested in this approach, when it’s so convenient to create a chart from data that is already in Excel?

Suppose for a moment that you are dealing with the market activity on a stock, which you can retrieve from an external data source as a collection of StockActivity .NET objects:

public class StockActivity
{
   public DateTime Day { get; set; }
   public decimal Open { get; set; }
   public decimal Close { get; set; }
}

In this case, extracting the array for the X and Y values would be a trivial matter, making it very easy to produce a chart of, say, the Close values over time:

// Create a few fake datapoints
var day1 = new StockActivity()
              {
                 Day = new DateTime(2010, 1, 1), 
                 Open = 100m, 
                 Close = 110m
              };
var day2 = new StockActivity()
              {
                 Day = new DateTime(2010, 1, 2), 
                 Open = 110m, 
                 Close = 130m
              };
var day3 = new StockActivity()
              {
                 Day = new DateTime(2010, 1, 3), 
                 Open = 130m, 
                 Close = 105m
              };
var history = new List<StockActivity>() { day1, day2, day3 };

var excel = this.Application;
var workbook = excel.ActiveWorkbook;
var charts = workbook.Charts;
var chart = (Chart)charts.Add();

chart.ChartType = Excel.XlChartType.xlLine;
chart.Location(XlChartLocation.xlLocationAsNewSheet, "Stock Chart);

var seriesCollection = (SeriesCollection)chart.SeriesCollection();
var series = seriesCollection.NewSeries();

series.Values = history.Select(it => (double)it.Close).ToArray();
series.XValues = history.Select(it => it.Day).ToArray();
series.Name = "Stock";

Using LINQ, we Select from the list the values we are interested in, and pass them into an array, ready for consumption into a chart, and boom! We are done.

If what you need to do is explore data and produce charts to figure out potentially interesting relationships, this type of approach isn’t very useful. On the other hand, if your problem is to produce on a regular basis the same set of charts, using data coming from an external data source, this is a very interesting option!

Comments

Comment RSS