Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
6. February 2013 15:06

In spite of being color blind, I am a visual guy – I like to see things. Nothing beats a chart to identify problems in your data. I also spend lots of time manipulating data in FSI, the F# REPL, and while solutions like FSharpChart makes it possible to produce nice graphs fairly easily, I still find it introduces a bit of friction, and wondered how complicated it would be to use Excel as a charting engine.

Turns out, it’s not very complicated. The typical use case for generating charts in Excel is to first put data in a spreadsheet, and use the corresponding range as a source for a chart. However, it’s also perfectly possible to directly create a Chart object, and manipulate its SeriesCollection, adding and editing Series, which are arrays of XValues and Values.

As a starting point, I decided to focus on 2 problems:

• plotting functions, in 2 and 3 dimensions,
• producing scatterplots.

Both are rather painful to do in Excel itself – and scatterplots are the one chart I really care about when analyzing data, because it helps figuring out whether or not some variables are related.

What I wanted was a smooth experience from FSI – start typing code, and ship data to Excel, without having to worry about the joys of the Excel interop and its syntax. The video below shows what I ended up with, in action.

Note: watching me type is about as exciting as watching paint dry, so I sped up the video from its original 5 minutes down to 2 - otherwise there is no trick or editing.

This year’s blockbuster: plotting functions from F# to Excel

More...

14. January 2012 14:16

I am putting together a demo VSTO add-in for my talk at the Excel Developer Conference. I wanted to play with charts a bit, and given that I am working off a .NET model, I figured it would be interesting to produce charts directly from the data, bypassing the step of putting data in a worksheet altogether.

In order to do that, we simply need to create a Chart in a workbook, add a Series to the SeriesCollection of the chart, and directly set the Series Values and XValues as an array, along these lines:

var excel = this.Application;
var workbook = excel.ActiveWorkbook;
var charts = workbook.Charts;
var chart = (Chart)charts.Add();

chart.ChartType = Excel.XlChartType.xlLine;
chart.Location(XlChartLocation.xlLocationAsNewSheet, "Tab Name");

var seriesCollection = (SeriesCollection)chart.SeriesCollection();
var series = seriesCollection.NewSeries();

series.Values = new double[] {1d, 3d, 2d, 5d};
series.XValues = new string[] {"A", "B", "C", "D"};
series.Name = "Series Name";

This will create a simple Line chart in its own sheet – without any reference to a worksheet data range.

Now why would I be interested in this approach, when it’s so convenient to create a chart from data that is already in Excel?

Suppose for a moment that you are dealing with the market activity on a stock, which you can retrieve from an external data source as a collection of StockActivity .NET objects:

public class StockActivity
{
public DateTime Day { get; set; }
public decimal Open { get; set; }
public decimal Close { get; set; }
}

In this case, extracting the array for the X and Y values would be a trivial matter, making it very easy to produce a chart of, say, the Close values over time:

// Create a few fake datapoints
var day1 = new StockActivity()
{
Day = new DateTime(2010, 1, 1),
Open = 100m,
Close = 110m
};
var day2 = new StockActivity()
{
Day = new DateTime(2010, 1, 2),
Open = 110m,
Close = 130m
};
var day3 = new StockActivity()
{
Day = new DateTime(2010, 1, 3),
Open = 130m,
Close = 105m
};
var history = new List<StockActivity>() { day1, day2, day3 };

var excel = this.Application;
var workbook = excel.ActiveWorkbook;
var charts = workbook.Charts;
var chart = (Chart)charts.Add();

chart.ChartType = Excel.XlChartType.xlLine;
chart.Location(XlChartLocation.xlLocationAsNewSheet, "Stock Chart);

var seriesCollection = (SeriesCollection)chart.SeriesCollection();
var series = seriesCollection.NewSeries();

series.Values = history.Select(it => (double)it.Close).ToArray();
series.XValues = history.Select(it => it.Day).ToArray();
series.Name = "Stock";

Using LINQ, we Select from the list the values we are interested in, and pass them into an array, ready for consumption into a chart, and boom! We are done.

If what you need to do is explore data and produce charts to figure out potentially interesting relationships, this type of approach isn’t very useful. On the other hand, if your problem is to produce on a regular basis the same set of charts, using data coming from an external data source, this is a very interesting option!

11. September 2011 12:39

The project I am currently working on involves developing a forecasting model. Starting from an initial estimate, the model will progressively update its forecast as time goes by and real data becomes available.

The process of developing such a model is iterative by nature: you design the mechanics of a forecasting algorithm, look at how it would have performed on historical data, fine-tune the design and parameters based on the results, and rinse & repeat.

The problem I started running into is the “look at how it would have performed on historical data”. There is plenty of data available, which is both a blessing and a curse. A blessing, because more data means better validation. A curse, because as the amount of data increases, it becomes difficult to navigate through it, and focus on individual cases.

So far, my approach has been to create metrics of fit between a model and a set of data, and to run a model against large sets of data, measuring how well the model is doing against the data set. However, I still don’t have a good solution for digging into why a particular case is not working so well. What I would like to achieve is to identify a problematic case, and explore what is going on, ideally by generating charts on the fly to visualize what is happening. Unfortunately, the tools I am using currently do not accommodate that scenario well. Excel is great at producing charts in a flexible manner, but my model is .NET code, and I don’t have a convenient, lightweight way to use C# code in Excel. Conversely, creating exploratory charts from C# is somewhat expensive, and requires a lengthy cycle: write code for the chart, compile (and lose whatever is loaded in memory), observe – and repeat.

I am currently exploring an alternative, via F# and FSharpChart. F# offers a very interesting possibility over C#, F# Interactive (fsi). Fsi is a REPL (Read, Evaluate, Print Loop), which allows you to type in code interactively in the console and execute it as you go. The beauty of it is that you can experiment with code live, without having to go through the code change / recompile cycle. Add to the mix FSharpChart, a library/script which wraps .NET DataVisualization.Charting and makes it conveniently usable from F#, and you get a nice way to write .NET code and generate charts, on the fly.

Let’s illustrate on a simple example. Suppose I have a model that simulates sales following a Poisson process, and want to check whether this “looks right”. First, let’s download FSharpChart, create a folder called “Explore” on the Desktop, and copy the FSharpChart.fsx script file into that folder. Then, let’s create an empty text file called Explore.fsx in the same folder, which we will use to experiment with code and charts, and save whatever snippets come in handy at the time.

Next, let’s double-click on the Explore.fsx file, which will then be opened in Visual Studio, and type in the following:

#load @"C:\Users\Mathias\Desktop\Explore\fsharpchart.fsx"

open System
open System.Drawing
open MSDN.FSharp.Charting

let random = new Random()

// Simulate a Poisson distribution with parameter lambda
let poisson lambda =
let L = Math.Exp(-lambda)
let rec simulate (k,p) =
if p > L then simulate (k + 1, p * random.NextDouble())
else k - 1
simulate (0, 1.0)

let sales lambda periods = [
for i in 1.0 .. periods -> (i, poisson lambda) ]

More...

16. April 2010 11:03

I am pretty excited that the new Silverlight Toolkit now supports Stacked Series. If you develop business applications, at some point or another, you will have to produce charts. A while back I looked into the WPF / Silverlight toolkit, which offers charting capabilities, and I really liked it, because its design supports mvvm-style databinding pretty well. There was one major drawback, though: it didn’t include any stacked series. The point of reference for most business users is Excel charts, and this left a large chunk of the standard charts out. Problem solved with the last release: the stacked histogram and some of his friends are in! The Excel “Outdoors Bar” type is still not included, but I don’t think anyone will complain. I can’t wait to play with this release.

29. October 2009 10:05

Recently, a client asked me if it was possible to create an Excel scatter plot of his products, adding a label on each data point, and using different colors and symbols for different types of products. You could think of this as plotting 5 dimensions at once, instead of the usual two.

I quickly coded a VBA macro to do that, with a sample workbook to illustrate the usage. The macro is pretty rough, but was sufficient for my needs as is, so I haven’t put extra efforts in: feel free to improve upon it…

Here is a sample of the output:

More...