Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
28. June 2010 13:14

A client asked me recently a fun probability question, which revolved around figuring out the probability of success of a research program. In a simplified form, here is the problem: imagine that you have multiple labs, each developing products which have independent probabilities of succeeding – what is the probability of more than a certain number of products being eventually successful?

Let’s illustrate on a simple example. Product A has a 30% probability of success, and product B a 60% probability of success. Combining these into a probability tree, we work out that there is an 18% chance of having 2 products successful, 18% + 12 % + 42% = 72% chance of having 1 or more products succeed, and 28% chances of a total failure.

It’s not a very complicated theoretical problem. Practically, however, when the number of products increases, the number of outcomes becomes large, fairly fast – and working out every single combination by hand is extremely tedious.

Fortunately, using a simple trick, we can generate these combinations with minimal effort. The representation of integers in base 2 is a decomposition in powers of 2, resulting in a unique sequence of 0 and 1. In our simplified example, if we consider the numbers 0, 1, 2 and 3, their decomposition is

0 = 0 x 2^2 + 0 x 2^1 –> 00

1 = 0 x 2^2 + 1 ^ 2^1 –> 01

2 = 1 x 2^2 + 0 x 2^1 –> 10

3 = 1 x 2^2 + 1 x 2^2 –> 11

As a result, if if consider a 1 to encode the success of a product, and a 0 its failure, the binary representation of integers from 0 to 3 gives us all possible outcomes for our two-products scenario.

More...

21. June 2010 13:12

Lately I have spent time on a pet project, which requires access to historical financial data. Mads Kristensen has a nice post where he shows how to read  stock quotes from Yahoo finance using C#, which was very helpful to get started. I figured it would be interesting to try out a conversion to F# and see what the result looked like.

Mads focus is on getting quasi real-time updates of a quote; my interest is in an easier problem: retrieving historical data. Fortunately, Yahoo provides a free service for that, too. Given a quote symbol and two dates, it returns a comma-separated file list of all the values for the quote between these 2 dates.

So what do we need to do? Given a valid symbol and 2 dates, we want to create the WebRequest to send to Yahoo, retrieve the response, break it into lines, and parse each line into a quote, which will be added to a list. The core of the resulting program will be the ReadQuotes function, which will look like this:

let ReadQuotes symbol date1 date2 =
CreateRequest symbol date1 date2
|> GetResponse
|> BreakIntoLines
|> CreateQuotes symbol

## Creating the WebRequest

The web request required to obtain historical data from Yahoo follows this pattern:

http://ichart.finance.yahoo.com/table.csv?s=S&a=A&b=B&c=C&d=D&e=E&f=F&g=d&ignore=.csv

where:

• S is the symbol (ex: MSFT)
• A, B, C are the start month, day and year, the month being coded in base 0 (i.e. January is 0)
• D, E, F are the end month, day and year, the month being coded in base 0 (i.e. January is 0)

For instance, replacing S with MSFT, A with 0, B with 1, C with 2010, D with 1, E with 15, F with 2010, will return all the available quotes for Microsoft between January 1 and February 15, 2010.

Let’s start by creating a Console application, by selecting new F# project > F# Application, and typing in the following code:

open System;
open System.Net
open System.IO
open System.Text

let RetrieveDateInfo (date:DateTime) =
(date.Day, date.Month-1, date.Year)

let CreateRequest symbol startDate endDate =

let startDay, startMonth, startYear = RetrieveDateInfo startDate
let endDay, endMonth, endYear = RetrieveDateInfo endDate

let query = String.Format("&a={0}&b={1}&c={2}&d={3}&e={4}&f={5}&g=d&ignore=.csv", startMonth, startDay, startYear, endMonth, endDay, endYear)
let url = "http://ichart.finance.yahoo.com/table.csv?s=" + symbol + query
WebRequest.Create(url)

More...

13. June 2010 12:30

In my last post I explored how ExcelDNA can be used to write high-performance UDFs for Excel, calling .Net code without the overhead of VSTO. Using .Net instead of VBA for intensive computations already yields a nice improvement. Still, I regretted that ExcelDNA supports .Net up to 3.5 only, which puts the Task Parallel Library off limits – and is too bad  because the TPL is just totally awesome to leverage the power of multi-cores.

As it turned out, this isn’t totally correct. Govert  Van Drimmelen (the man behind ExcelDNA) and Jon Skeet (the Chuck Norris of .Net) pointed that while the Task Parallel Library is a .Net 4.0 library, the Reactive Extensions for .Net 3.5 contains an unsupported 3.5 version of the TPL – which means that it should be possible to get parallelism to work with ExcelDNA.

This isn’t a pressing need of mine, so I thought I would leave that alone, and wait for the 4.0 version of ExcelDNA. Yeah right. Between my natural curiosity, Ross McLean’s comment (have fun at the Excel UK Dev Conference!), and the fact that I really want to know if I could get the Walkenbach test to run under 1 second, without too much of an effort, I had to check. And the good news is, yep, it works.

Last time we saw how to turn an average PC into a top-notch performer; let’s see how we can inject some parallelism to get a smoking hot calculation engine.

More...

7. June 2010 10:23

Some time ago, I came across ExcelDNA, an open-source library designed to integrate .Net into Excel, via a  post by the Grumpy One, who described it as an interesting way to get Excel to talk to a compiled library. Sounds right down my alley, but I still managed to let 6 months pass until I finally tried it.

This reminded me of another post, by J-Walk this time, where he uses a random walk simulation in VBA to benchmark system performance. Back then, I ran the VBA code, and also the equivalent C# in a console app, out of curiosity: 11.38 seconds, vs. 2.73 seconds. Why not try the same experiment, and see if we can get the best of both worlds and bring some of the C# power into Excel via ExcelDNA?

So I created a Class Library, with the following method, a close equivalent to the VBA benchmark code:

public class Experiment
{
public static string RandomWalk()
{
var stopwatch = new Stopwatch();
stopwatch.Start();
var position = 0;
var random = new Random();
for (var run = 0; run < 100000000; run++)
{
if (random.Next(0, 2) == 0)
{
position++;
}
else
{
position--;
}
}
stopwatch.Stop();
var elapsed = (double)stopwatch.ElapsedMilliseconds / 1000d;
return "Position: " + position.ToString() + ", Time: " + elapsed.ToString();
}
}

More...

4. June 2010 12:19

I am very honored that the East Bay chapter of the Bay Area .Net user group will have me as a speaker this upcoming Wednesday. I’ll be talking “For Those About to Mock”, about Mocks and Stubs in .Net, after a presentation of the unit testing features of Visual Studio by Deborah Kurata – an apt opening topic, if you ask me.