Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 14. April 2013 12:20

Last Thursday, I gave a talk at the Bay.NET user group in Berkeley, introducing F# to C# developers. First off, I have to thank everybody who came – you guys were great, lots of good questions, nice energy, I had a fantastic time!

My goal was to highlight why I think F# is awesome, and of course this had to include a Type Provider demo, one of the most amazing features of F# 3.0. So I went ahead, and demoed Tomas Petricek’s World Bank Type Provider, and Howard Mansell’s R Type Provider – together. The promise of Type Providers is to enable information-rich programming; in this case, we get immediate access to a wealth of data over the internet, in one line of code, entirely discoverable by IntelliSense in Visual Studio - and we can use all the visualization arsenal of R to see what’s going on. Pretty rad.

Rather than just dump the code, I thought it would be fun to turn that demo into a video. The result is a 7 minutes clip, with only minor editing (a few cuts, and I sped up the video x3 because the main point here isn’t how terrible my typing skills are). I think it’s largely self-explanatory, the only points that are worth commenting upon are:

  • I am using a NuGet package for the R Type Provider that doesn’t officially exist yet. I figured a NuGet package would make that Type Provider more usable, and spent my week-end creating it, but haven’t published it yet. Stay tuned!
  • The most complex part of the demo is probably R’s syntax from hell. For those of you who don’t know R, it’s a free, open-source statistical package which does amazingly cool things. What you need to know to understand this video is that R is very vector-centric. You can create a vector in R using the syntax myData <- c(1,2,3,4), and combine vectors into what’s called a data frame, essentially a collection of features. The R type provider exposes all R packages and functions through a single static type, aptly named R – so for instance, one can create a R vector from F# by typing let myData = R.c( [|1; 2; 3; 4 |]).

That’s it! Let me know what you think, and if you have comments or questions.

by Mathias 6. February 2013 15:06

In spite of being color blind, I am a visual guy – I like to see things. Nothing beats a chart to identify problems in your data. I also spend lots of time manipulating data in FSI, the F# REPL, and while solutions like FSharpChart makes it possible to produce nice graphs fairly easily, I still find it introduces a bit of friction, and wondered how complicated it would be to use Excel as a charting engine.

Turns out, it’s not very complicated. The typical use case for generating charts in Excel is to first put data in a spreadsheet, and use the corresponding range as a source for a chart. However, it’s also perfectly possible to directly create a Chart object, and manipulate its SeriesCollection, adding and editing Series, which are arrays of XValues and Values.

As a starting point, I decided to focus on 2 problems:

  • plotting functions, in 2 and 3 dimensions,
  • producing scatterplots.

Both are rather painful to do in Excel itself – and scatterplots are the one chart I really care about when analyzing data, because it helps figuring out whether or not some variables are related.

What I wanted was a smooth experience from FSI – start typing code, and ship data to Excel, without having to worry about the joys of the Excel interop and its syntax. The video below shows what I ended up with, in action.

Note: watching me type is about as exciting as watching paint dry, so I sped up the video from its original 5 minutes down to 2 - otherwise there is no trick or editing.

This year’s blockbuster: plotting functions from F# to Excel

More...

by Mathias 22. August 2012 13:35

In my recent post on Decision Tree Classifiers, I mentioned that I was too lazy to figure out how to visualize the Decision Tree “supporting” the classifier. Well, at times, the Internet can be an awesome place. Cesar Mendoza has forked the Machine Learning in Action GitHub project, and done a very fine job resolving that problem using the Microsoft Automatic Graph Layout library, and running it on the Lenses Dataset from the University of California, Irvine Machine Learning dataset repository.

Here is the result of the visualization, you can find his code here:

 

image

Unfortunately, as far as I can tell, the library is not open source, and requires a MSDN license. The amount of great stuff produced at Microsoft Research is amazing, it’s just too bad that at times licensing seems to get in the way of getting the word out…

by Mathias 11. September 2011 12:39

The project I am currently working on involves developing a forecasting model. Starting from an initial estimate, the model will progressively update its forecast as time goes by and real data becomes available.

The process of developing such a model is iterative by nature: you design the mechanics of a forecasting algorithm, look at how it would have performed on historical data, fine-tune the design and parameters based on the results, and rinse & repeat.

The problem I started running into is the “look at how it would have performed on historical data”. There is plenty of data available, which is both a blessing and a curse. A blessing, because more data means better validation. A curse, because as the amount of data increases, it becomes difficult to navigate through it, and focus on individual cases.

So far, my approach has been to create metrics of fit between a model and a set of data, and to run a model against large sets of data, measuring how well the model is doing against the data set. However, I still don’t have a good solution for digging into why a particular case is not working so well. What I would like to achieve is to identify a problematic case, and explore what is going on, ideally by generating charts on the fly to visualize what is happening. Unfortunately, the tools I am using currently do not accommodate that scenario well. Excel is great at producing charts in a flexible manner, but my model is .NET code, and I don’t have a convenient, lightweight way to use C# code in Excel. Conversely, creating exploratory charts from C# is somewhat expensive, and requires a lengthy cycle: write code for the chart, compile (and lose whatever is loaded in memory), observe – and repeat.

I am currently exploring an alternative, via F# and FSharpChart. F# offers a very interesting possibility over C#, F# Interactive (fsi). Fsi is a REPL (Read, Evaluate, Print Loop), which allows you to type in code interactively in the console and execute it as you go. The beauty of it is that you can experiment with code live, without having to go through the code change / recompile cycle. Add to the mix FSharpChart, a library/script which wraps .NET DataVisualization.Charting and makes it conveniently usable from F#, and you get a nice way to write .NET code and generate charts, on the fly.

Let’s illustrate on a simple example. Suppose I have a model that simulates sales following a Poisson process, and want to check whether this “looks right”. First, let’s download FSharpChart, create a folder called “Explore” on the Desktop, and copy the FSharpChart.fsx script file into that folder. Then, let’s create an empty text file called Explore.fsx in the same folder, which we will use to experiment with code and charts, and save whatever snippets come in handy at the time.

Setup

Next, let’s double-click on the Explore.fsx file, which will then be opened in Visual Studio, and type in the following:

#load @"C:\Users\Mathias\Desktop\Explore\fsharpchart.fsx"

open System
open System.Drawing
open MSDN.FSharp.Charting

let random = new Random()

// Simulate a Poisson distribution with parameter lambda
let poisson lambda =
   let L = Math.Exp(-lambda)
   let rec simulate (k,p) =
      if p > L then simulate (k + 1, p * random.NextDouble())
      else k - 1
   simulate (0, 1.0)

let sales lambda periods = [ 
   for i in 1.0 .. periods -> (i, poisson lambda) ]

More...

by Mathias 16. April 2010 11:03

I am pretty excited that the new Silverlight Toolkit now supports Stacked Series. If you develop business applications, at some point or another, you will have to produce charts. A while back I looked into the WPF / Silverlight toolkit, which offers charting capabilities, and I really liked it, because its design supports mvvm-style databinding pretty well. There was one major drawback, though: it didn’t include any stacked series. The point of reference for most business users is Excel charts, and this left a large chunk of the standard charts out. Problem solved with the last release: the stacked histogram and some of his friends are in! The Excel “Outdoors Bar” type is still not included, but I don’t think anyone will complain. I can’t wait to play with this release.

Comments

Comment RSS