Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
22. March 2014 13:11

A couple of days ago, I got into the following Twitter exchange:

So why do I think FsCheck + XUnit = The Bomb?

I have a long history with Test-Driven Development; to this day, I consider Kent Beck’s “Test-Driven Development by Example” one of the biggest influences in the way I write code (any terrible code I might have written is, of course, to be blamed entirely on me, and not on the book).

In classic TDD style, you typically proceed by writing incremental test cases which match your requirements, and progressively write the code that will satisfy the requirements. Let’s illustrate on an example, a password strength validator. Suppose that my requirements are “a password must be at least 8 characters long to be valid”. Using XUnit, I would probably write something along these lines:

namespace FSharpTests

open Xunit
open CSharpCode

module Password validator tests =

[<Fact>]
let length above 8 should be valid () =
let validator = Validator ()


… and in the CSharpCode project, I would then write the dumbest minimal implementation that could passes that requirement, that is:

public class Validator
{
{
return true;
}
}


Next, I would write a second test, to verify the obvious negative:

namespace FSharpTests

open Xunit
open CSharpCode

module Password validator tests =

[<Fact>]
let length above 8 should be valid () =
let validator = Validator ()

[<Fact>]
let length under 8 should not be valid () =
let validator = Validator ()


This fails, producing the following output in Visual Studio:

… which forces me to fix my implementation, for instance like this:

public class Validator
{
{
{
return false;
}

return true;
}
}


Let’s pause here for a couple of remarks. First, note that while my tests are written in F#, the code base I am testing against is in C#. Mixing the two languages in one solution is a non-issue. Then, after years of writing C# test cases with names like Length_Above_8 _Should_Be_Valid, and arguing whether this was better or worse than LengthAbove8 ShouldBeValid, I find that having the ability to simply write “length above 8 should be valid”, in plain old English (and seeing my tests show that way in the test runner as well), is pleasantly refreshing. For that reason alone, I would encourage F#-curious C# developers to try out writing tests in F#; it’s a nice way to get your toes in the water, and has neat advantages.

But that’s not the main point I am interested here. While this process works, it is not without issues. From a single requirement, “a password must be at least 8 characters long to be valid”, we ended up writing 2 test cases. First, the cases we ended up are somewhat arbitrary, and don’t fully reflect what they say. I only tested two instances, one 7 characters long, one 8 characters long. This is really relying on my ability as a developer to identify “interesting cases” in a vast universe of possible passwords, hoping that I happened to cover sufficient ground.

This is where FsCheck comes in. FsCheck is a port of Haskell’s QuickCheck, a property-based testing framework. The term “property” is somewhat overloaded, so let’s clarify: what “Property” means in that context is a property of our program that should be true, in the same sense as mathematically, a property of any number x is “x * x is positive”. It should always be true, for any input x.

Install FsCheck via Nuget, as well as the FsCheck XUnit extension; you can now write tests that verify properties by marking them with the attribute [<Property>], instead of [<Fact>], and the XUnit test runner will pick them up as normal tests. For instance, taking our example from right above, we can write:

namespace FSharpTests

open Xunit
open FsCheck
open FsCheck.Xunit
open CSharpCode

module Specification =

[<Property>]
let square should be positive (x:float) =
x * x > 0.


Let’s run that – fail. If you click on the test results, here is what you’ll see:

FsCheck found a counter-example, 0.0. Ooops! Our specification is incorrect here, the square value doesn’t have to be strictly positive, and could be zero. This is an obvious mistake, let’s fix the test, and get on with our lives:

[<Property>]
let square should be positive (x:float) =
x * x >= 0.


More...

6. February 2012 04:24

This post is part of a series providing commentary on the VSTO Stocks project. I initially developed it for the Excel Developers Conference in London, to illustrate some of the benefits or interesting features of VSTO add-ins compared to traditional VBA automation. The add-in is a work in progress, and is by no means production ready, but it is functional; I will update the code and add comments over time. Feel free to ask questions in the comments!

Level: beginner.

Code version: 346c1bd9394e

One of the key benefits I find in using VSTO for Office automation instead of VBA is that it enables using Source Control tools.

During a development effort, regardless of the technology used, lots of things can go wrong. A code change which initially looked like a great idea progressively degenerates into chaos, something goes awry with a file which becomes irrecoverably corrupt, a hard drive suddenly decides it is time to call it quits – all these happen. When they do, it’s nice to have a safety net, and know that somewhere, safe and warm, a snapshot of the code taken in happier times is waiting and can be restored, giving you a safe point to restart from, with only a few hours of work lost.

What I have often seen done with Excel development goes along these lines: on a regular basis, the developer saves the workbook somewhere “safe” with a time-stamp convention, like “MyWorkbook-2010-12-24.xlsx”.

On the plus side, this is a very lightweight process, which addresses some of the issues. At the same time, it is cumbersome: the developer needs to be diligent, the process is manual and error-prone (messing up the timestamp, or accidentally over-writing archives is very possible), and recovering the right version from a folder that contains multiple versions only identified by a timestamp is impractical.

Developers working in other ecosystems have been facing the same issue, and address it with specialized tools: source control systems. In a nutshell, the idea of source control is to operate like a library: the source code is stored in a “vault” (known as the Repository), developers check out a local copy of the current version on their machine, edit it, and check in/commit the modified code back into the vault if they are happy with the result.

Put differently, whatever code is currently being modified on the developer’s machine is “scratch paper”; it become “real” only once it is committed.

There are a few obvious benefits. The entire history of the project is saved for posterity, and its state at any point in time can be instantly restored. The system generates timestamps automatically, and each commit has comments attached to it, which helps navigation between versions. This encourages experimenting with code ideas: check out the code, spike something – if it works, great, if not, discard it and revert to the previous repository version.

[pic of repo on CodePlex]

Overview of the project history with Mercurial + Tortoise on Windows

More interestingly, version control systems typically store the difference between the current version and the previous one, and not the file itself. Besides keeping the size of the repository minimal, it also allows to produce “diffs”, i.e. code differences: the source control system can easily produce a view that highlights all the differences between two versions of the code, which is invaluable.

The “Diff” highlights what has been added or removed between versions.

Diff view of the changes to a specific file on CodePlex

## Why hasn’t the traditional Excel developer community embraced source control?

The main reason, in my opinion, is that Source Control systems are at their best when dealing with text files. While this is the case for most development platforms, Excel is peculiar in that aspect: the code is embedded in the Workbook, in multiple forms (Excel formulas, code-behind worksheets, macro modules…), and the overall project isn’t a collection of text files containing code. Up to version 2003, workbooks were saved as a proprietary binary file, which couldn’t be used to produce meaningful "differences” – and the Open XML format adopted since version 2007 still isn’t very practical for differentiation purposes.

By contrast, a VSTO add-in like VSTO Stocks consists entirely of .NET code, which is ultimately a collection of text files – there is nothing attached to a specific Workbook. As a result, it is a perfect fit for Source Control, with automatic archival of successive versions, and highly detailed “audit” of changes between versions.

Note that nothing prevents using Source Control tools for “classic” Excel development – I do it all the time, even when working with Excel 2003 workbooks. You won’t get the full benefits of source control (no diffs), but you will still get a history of all the code changes in the project. Also, if you tend to re-use VBA utilities like UDFs, .bas files are perfect candidates for source control: store the utilities .bas files in a repository, and import them in workbooks when you need them.

## How to get started with Souce Control?

A nice thing about Source Control tools is that some of the best and most widely used systems are open source and totally free. The two systems I use are Subversion (for the past 7 years or so) and Mercurial since a few months – they are both great. The other name that comes up a lot is Git, which as far as I know is very similar to Mercurial.

The main difference is that Subversion has a centralized model (there is a central “source of truth” where the official code resides, and where all changes get committed), whereas Mercurial is a distributed system (developers can work with the full  benefits of version control even disconnected from the “central”, and can merge their work with any other clone of the repository). Both are worth looking into, and offer different advantages. There are plenty of discussions comparing the two approaches, so I won’t go further into it.

On the other hand, regardless of what system you pick, I recommend installing Tortoise (TortoiseHg for Mercurial, TortoiseSVN for Subversion); it’s an extension which integrates source control with Windows, so that you can manage your repositories directly via the graphical user interface. It’s a great way to start without having to struggle with arcane command-line tools.

Tortoise integrates your Version Control system right into Windows.

12. September 2011 14:39

Just saw that version 3.1 of Microsoft Solver Foundation has been released. I haven’t had time to try it out yet, but the list of improvements looks promising. A better non linear solver, better MIP and MIQP – sounds like Santa has come early! And I am curious about the faster bindings with LINQ…

13. February 2011 13:46

I have been a fan of Test-Driven Development for a long time; it has helped me write better code and keep my sanity more than once. However, until now, I haven’t really looked into Behavior-Driven Development, even though I have often heard it described as a natural next step from TDD. A recent piece in MSDN Magazine re-ignited my interest, and helped me figure out one point I had misunderstood, namely how BDD and TDD fit together, so I started looking into existing frameworks.

Most of them follow a similar approach: write in a plain-text file a description of the feature in Gherkin, a “feature description” language that is human readable, let the framework generate test stubs which map to the story, and progressively fill the stubs as the feature gets implemented.

I am probably (too) used to writing tests as code, but something about the idea of starting from text files just doesn’t feel right to me. I understand the appeal of Gherkin as a platform-independent specification language, and of letting the product owner write specifications – but the thought of having to maintain two sets of files (the features and the actual tests) worries me. I may warm up to it in due time, but in the meanwhile I came across StoryQ, a framework which felt much easier to adopt for me.

StoryQ is a tiny dll, which permits to write stories as tests in C#, using a fluent interface, with all the comfort and safety of strong typing and intellisense; Gherkin stories can be produced as an output of the tests, and a separate utility allows you to create code templates from Gherkin.

Rather than talk about it, let’s see a quick code example. I have a regular NUnit TestFixture with one Test, which represents a Story I am interested in: when I pay the check at the restaurant, I need to add a tip to the check. There are 2 scenarios I am interested in: when I am happy, I’ll tip a nice 20%, but when I am not, there will be zero tip. This is how it could look like in StoryQ:

using NUnit.Framework;
using StoryQ;

[TestFixture]
public class CalculateTip
{
[Test]
public void CalculatingTheTip()
{
new Story("Calculating the Tip")
.InOrderTo("Pay the check")
.AsA("Customer")

.WithScenario("Unhappy with service")
.Given(CheckTotalIs, 100d)
.When(IAmHappyWithService, false)
.Then(TipShouldBe, 0d)

.WithScenario("Happy with service")
.Given(CheckTotalIs, 100d)
.When(IAmHappyWithService, true)
.Then(TipShouldBe, 20d)

.Execute();
}

public double CheckTotal { get; set; }

public bool IsHappy { get; set; }

public void CheckTotalIs(double total)
{
this.CheckTotal = total;
}

public void IAmHappyWithService(bool isHappy)
{
this.IsHappy = isHappy;
}

public void TipShouldBe(double expectedTip)
{
var tip = TipCalculator.Tip(CheckTotal, IsHappy);
Assert.AreEqual(expectedTip, tip);
}
}

(The TipCalculator class is a simple class I implemented on the side).

This test can now be run just like any other NUnit test; when I ran this with ReSharper within Visual Studio, I immediately saw the output below. Pretty nice, I say.

## What I liked so far

• Painless transition for someone used to TDD. For someone like me, who is used to write unit tests within Visual Studio, this is completely straightforward. No new language to learn, a process pretty similar to what I am used to – a breeze.
• Completely smooth integration with NUnit and ReSharper: no plugin to install, no tweaks, it just worked.
• Fluent interface: the fluent interface provides guidance as you write the story, and hints at what steps are expected next.
• Passing arguments: I like the API for expressing the Given/When/Then steps. Passing arguments feels very natural.
• xml report: I have not played much with it yet, but there is an option to produce an xml file with the results of the tests, which should work well with a continuous integration server.

## What I didn’t like that much

• Execute: at some point I inadvertently deleted the .Execute() call at the end of the Story, and it took me a while to figure out why all my tests were passing, but no output was produced. More generally, I would have preferred something like Verify(), which seems clearer, but that’s nitpicking.
• Multiple scenarios in one test: once I figured out that I could chain multiple scenarios in one story, I was a happy camper, but all the examples I saw on the project page have one story / one scenario per test method. It’s only when I used the WPF story converter that I realized I could do this.
• Crash of the WPF converter: the converter is awesome – but the first time I ran it it crashed.

So where do I go from there? So far, I really enjoyed playing with StoryQ – enough that I want to give it a go on a real project. I expect that the path to getting comfortable with BDD will be similar to TDD: writing lots of tests, some of them fairly bad, until over time a certain feeling for what’s right or very wrong develops… In spite of my reservations, I am skeptical but curious (after all, I have been known to be wrong sometimes…), so I also plan to give SpecFlow a try.

13. June 2010 12:30

In my last post I explored how ExcelDNA can be used to write high-performance UDFs for Excel, calling .Net code without the overhead of VSTO. Using .Net instead of VBA for intensive computations already yields a nice improvement. Still, I regretted that ExcelDNA supports .Net up to 3.5 only, which puts the Task Parallel Library off limits – and is too bad  because the TPL is just totally awesome to leverage the power of multi-cores.

As it turned out, this isn’t totally correct. Govert  Van Drimmelen (the man behind ExcelDNA) and Jon Skeet (the Chuck Norris of .Net) pointed that while the Task Parallel Library is a .Net 4.0 library, the Reactive Extensions for .Net 3.5 contains an unsupported 3.5 version of the TPL – which means that it should be possible to get parallelism to work with ExcelDNA.

This isn’t a pressing need of mine, so I thought I would leave that alone, and wait for the 4.0 version of ExcelDNA. Yeah right. Between my natural curiosity, Ross McLean’s comment (have fun at the Excel UK Dev Conference!), and the fact that I really want to know if I could get the Walkenbach test to run under 1 second, without too much of an effort, I had to check. And the good news is, yep, it works.

Last time we saw how to turn an average PC into a top-notch performer; let’s see how we can inject some parallelism to get a smoking hot calculation engine.

More...