Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
12. May 2015 13:50

As much as we people who write code like to talk about code, the biggest challenge in a software project is not code. A project rarely fails because of technology – it usually fails because of miscommunications: the code that is delivered solves a problem (sometimes), but not the right one. One of the reasons we often deliver the wrong solution is that coding involves translating the world of the original problem into a different language. Translating one way is hard enough as it is, but then, rarely are users comfortable with reading and interpreting code – and as a result, confirming whether the code “does the right thing” is hard, and errors go un-noticed.

This is why the idea of Ubiquitous Language, coined by Eric Evans in his Domain Driven Design book, always appealed to me. The further apart the languages of the domain expert and the code are, the more likely it is that something will be lost in translation.

However, achieving this perfect situation, with “a language structured around the domain model and used by all team members to connect all the activities of the team with the software” [source], is hard. I have tried this in the past, mainly through tests. My idea at the time was that tests, especially BDD style, could perhaps provide domain experts with scenarios similar enough to their worldview that they could serve as a basis for an active dialogue. The experience wasn’t particularly successful: it helped some, but in the end, I never got to the point where tests would become a shared, common ground (which doesn’t mean it’s not possible – I just didn’t manage to do it).

Fast forward a bit to today – I just completed a project, and it’s the closest I have ever been to seeing Ubiquitous Language in action. It was one of the most satisfying experiences I had, and F# had a lot to do with why it worked.

The project involved some pretty complex modeling, and only two people – me and the client. The client is definitely a domain expert, and on the very high end of the “computer power user” spectrum: he is very comfortable with SQL, doesn’t write software applications, but has a license to Visual Studio and is not afraid of code.

The fact that F# worked well for me isn’t a surprise – I am the developer in that equation, and I love it, for all the usual technical reasons. It just makes my life writing code easier. The part that was interesting here is that F# worked well for the client, too, and became our basis for communication.

What ended up happening was the following: I created a GitHub private repository, and started coding in a script file, fleshing out a domain model with small, runnable pieces of code illustrating what it was doing. We would have regular Skype meetings, with a screen share so that I could walk him through the code in Visual Studio, and explain the changes I made - and we would discuss. Soon after, he started to run the code himself, and even making small changes here and there, not necessarily the most complicated bits, but more domain-specific parts, such as adjusting parameters and seeing how the results would differ. And soon, I began receiving emails containing specific scenarios he had experimented with, using actual production data, and pointing at possible flaws in my approach, or questions that required clarifications.

So how did F# make a difference? I think it’s a combination of at least 2 things: succinctness, and static typing + scripts. Succinctness, because you can define a domain with very little code, without loosing expressiveness. As a result, the core entities of the domain end up taking a couple of lines at the top of a single file, and it’s easy to get a full picture, without having to navigate around between files and folders, and keep information in your head. As an illustration, here is a snippet of code from the project:

type Window = { Early:DateTime; Target:DateTime; Late:DateTime }

type Trip = {
ID:TripID
Origin:Location
Destination:Location
Pickup:Window
Dropoff:Window
Dwell:TimeSpan }

type Action =
| Pickup of Trip
| Dropoff of Trip
| CompleteRoute of Location

This is concise, and pretty straightforward – no functional programming guru credentials needed. This is readable code, which we can talk about without getting bogged down in extraneous details.

The second ingredient is static typing + scripts. What this creates is a safe environment for experimentation.  You can just change a couple of lines here and there, run the code, and see what happens. And when you break something, the compiler immediately barks at you – just undo or fix it. Give someone a running script, and they can start playing with it, and exploring ideas.

In over 10 years writing code professionally, I never had such a collaborative, fruitful, and productive interaction cycle with a client. Never. This was the best of both worlds – I could focus on the code and the algorithms, and he could immediately use it, try it out, and send me invaluable feedback, based on his domain knowledge. No noise, no UML diagrams, no slides, no ceremony – just write code, and directly communicate around it, making sure nothing was amiss. Which triggered this happy tweet a few weeks back:

We were looking at the code together, and my client spotted a domain modeling mistake, right there. This is priceless.

As a side-note, another thing that is priceless is “F# for Fun and Profit”. Scott Wlaschin has been doing an incredible work with this website. It’s literally a gold mine, and I picked up a lot of ideas there. If you haven’t visited it yet, you probably should.

3. May 2015 16:34

I got curious the other day about how to measure the F# community growth, and thought it could be interesting to take a look at this through StackOverflow. As it turns out, it’s not too hard to get some data, because StackExchange exposes a nice API, which allows you to make all sorts of queries and get a JSON response back.

As a starting point, I figured I would just try to get the number of questions asked per month. The API allows you to retrieve questions on any site, by tag, between arbitrary dates. Responses are paged: you can get up to 100 items per page, and keep asking for next pages until there is nothing left to receive. That sounds like a perfect job for the FSharp.Data JSON Type Provider.

First things first, we create a type, Questions, by pointing the JSON Type Provider to a url that returns questions; based on the structure of the JSON document it receives, the Type Provider creates a type, which we will then be able to use to make queries:

#I"../packages"
#r @"FSharp.Data.2.2.0\lib\net40\FSharp.Data.dll"
open FSharp.Data
open System

[<Literal>]
let sampleUrl = "https://api.stackexchange.com/2.2/questions?site=stackoverflow"
type Questions = JsonProvider<sampleUrl>

Next, we’ll need to grab all the questions tagged F# between 2 given dates. As an example, the following would return the second page (questions 101 to 200) from all F# questions asked between January 1, 2014 and January 31, 2015:

https://api.stackexchange.com/2.2/questions?page=2&pagesize=100&fromdate=1420070400&todate=1422662400&tagged=F%23&site=stackoverflow

There are a couple of quirks here. First, the dates are in UNIX standard, that is, the number of seconds elapsed from January 1, 1970. Then, we need to keep pulling pages, until the response indicates that there are no more questions to receive, which is indicated by the HasMore property. That’s not too hard: let’s create a couple of functions, first to convert a .NET date to a UNIX date, and then to build up a proper query, appending the page and dates we are interested in to our base query – and finally, let’s build a request that recursively calls the API and appends results, until there is nothing left:

let fsharpQuery = "https://api.stackexchange.com/2.2/questions?site=stackoverflow&tagged=F%23&pagesize=100"

let unixEpoch = DateTime(1970,1,1)
let unixTime (date:DateTime) =
(date - unixEpoch).TotalSeconds |> int64

let page (page:int) (query:string) =
sprintf "%s&page=%i" query page
let between (from:DateTime) (to:DateTime) (query:string) =
sprintf "%s&&fromdate=%i&todate=%i" query (unixTime from) (unixTime to)

let questionsBetween (from:DateTime) (to:DateTime) =
let baseQuery = fsharpQuery |> between from to
let rec pull results p =
let nextPage = Questions.Load (baseQuery |> page p)
let results = results |> Array.append nextPage.Items
if (nextPage.HasMore)
then pull results (p+1)
else results
pull Array.empty 1

And we are pretty much done. At that point, we can for instance ask for all the questions asked in January 2015, and check what percentage were answered:

let january2015 = questionsBetween (DateTime(2015,1,1)) (DateTime(2015,1,31))

january2015
|> Seq.averageBy (fun x -> if x.IsAnswered then 1. else 0.)
|> printfn "Average answer rate: %.3f"

… which produces a fairly solid 78%.

If you play a bit more with this, and perhaps try to pull down more data, you might experience (as I did) the Big StackExchange BanHammer. As it turns out, the API has usage limits (which is totally fair). In particular, if you ask for too much data, too fast, you will get banned from making requests, for a dozen hours or so.

This is not pleasant. However, in their great kindness, the API designers have provided a way to avoid it. When you are making too many requests, the response you receive will include a field named “backoff”, which indicates for how many seconds you should back off until you make your next call.

This got me stumped for a bit, because that field doesn’t show up by default on the response – only when you are hitting the limit. As a result, I wasn’t sure how to pass that information to the JSON Type Provider, until Max Malook helped me out (thanks so much, Max!). The trick here is to supply not one sample response to the type provider, but a list of samples, in that case, one without the backoff field, and one with it.

I carved out an artisanal, hand-crafted sample for the occasion, along these lines:

[<Literal>]
let sample = """
[{"items":[
{"tags":["f#","units-of-measurement"],//SNIPPED FOR BREVITY}],
"has_more":false,
"quota_max":300,
"quota_remaining":294},
{"items":[
{"tags":["f#","units-of-measurement"],//SNIPPED FOR BREVITY}],
"has_more":false,
"quota_max":300,
"quota_remaining":294,
"backoff":10}]"""

type Questions = JsonProvider<sample,SampleIsList=true>

… and everything is back in order – we can now modify the recursive request, causing it to sleep for a bit when it encounters a backoff. Not the cleanest solution ever, but hey, I just want to get data here:

let questionsBetween (from:DateTime) (to:DateTime) =
let baseQuery = fsharpQuery |> between from to
let rec pull results p =
let nextPage = Questions.Load (baseQuery |> page p)
let results = results |> Array.append nextPage.Items
if (nextPage.HasMore)
then
match nextPage.Backoff with
| None -> ignore ()
pull results (p+1)
else results
pull Array.empty 1

So what were the results? I decided, quite arbitrarily, to count questions month by month since January 2010. Here is how the results looks like:

Clearly, the trend is up – it doesn’t take an advanced degree in statistics to see that. It’s interesting also to see the slump around 2012-2013; I can see a similar pattern in the Meetup registration numbers in San Francisco. My sense is that after a spike in interest in 2010, when F# launched with Visual Studio, there hasn’t been much marketing push for the language, and interest eroded a bit, until serious community-driven efforts took place. However, I don’t really have data to back that up – this is speculation.

How this correlates to overall F# adoption is another question: while I think this curves indicates growth, the number of questions on StackOverflow is clearly a very indirect measurement of how many people actually use it, and StackOverflow itself is a distorted sample of the overall population. Would be interesting to take a similar look at GitHub, perhaps…

30. March 2015 16:34

In our previous post, we looked at James McCaffrey’s code, “Gradient Descent Training Using C#” from MSDN magazine, and took a stab at rewriting the first part in F#, to clarify a bit the way the dataset was created. Today, we’ll dive in the second block, which implements the logistic regression using gradient descent. Again, we won’t discuss why the algorithm works – the article does a pretty good job at that – and focus instead purely on the F# / C# conversion part.

Let’s begin by taking a look at the core of the C# code, which lives in the LogisticClassifier class. I took the liberty to do some minor cleanup, and remove some parts which were un-necessary, so as to make it a bit easier to see what is going on:

public class LogisticClassifier
{
private int numFeatures; // number of x variables aka features
private double[] weights; // b0 = constant
private Random rnd;

public LogisticClassifier(int numFeatures)
{
this.numFeatures = numFeatures;
this.weights = new double[numFeatures + 1]; // [0] = b0 constant
this.rnd = new Random(0);
}

public double[] Train(double[][] trainData, int maxEpochs, double alpha)
{
// alpha is the learning rate
int epoch = 0;
int[] sequence = new int[trainData.Length]; // random order
for (int i = 0; i < sequence.Length; ++i)
sequence[i] = i;

while (epoch < maxEpochs)
{
++epoch;

if (epoch % 100 == 0 && epoch != maxEpochs)
{
double mse = Error(trainData, weights);
Console.Write("epoch = " + epoch);
Console.WriteLine("  error = " + mse.ToString("F4"));
}

Shuffle(sequence); // process data in random order

// stochastic/online/incremental approach
for (int ti = 0; ti < trainData.Length; ++ti)
{
int i = sequence[ti];
double computed = ComputeOutput(trainData[i], weights);
int targetIndex = trainData[i].Length - 1;
double target = trainData[i][targetIndex];

weights[0] += alpha * (target - computed) * 1; // the b0 weight has a dummy 1 input
for (int j = 1; j < weights.Length; ++j)
weights[j] += alpha * (target - computed) * trainData[i][j - 1];
}
} // while
return this.weights; // by ref is somewhat risky
} // Train

private void Shuffle(int[] sequence)
{
for (int i = 0; i < sequence.Length; ++i)
{
int r = rnd.Next(i, sequence.Length);
int tmp = sequence[r];
sequence[r] = sequence[i];
sequence[i] = tmp;
}
}

private double Error(double[][] trainData, double[] weights)
{
// mean squared error using supplied weights
int yIndex = trainData[0].Length - 1; // y-value (0/1) is last column
double sumSquaredError = 0.0;
for (int i = 0; i < trainData.Length; ++i) // each data
{
double computed = ComputeOutput(trainData[i], weights);
double desired = trainData[i][yIndex]; // ex: 0.0 or 1.0
sumSquaredError += (computed - desired) * (computed - desired);
}
return sumSquaredError / trainData.Length;
}

private double ComputeOutput(double[] dataItem, double[] weights)
{
double z = 0.0;
z += weights[0]; // the b0 constant
for (int i = 0; i < weights.Length - 1; ++i) // data might include Y
z += (weights[i + 1] * dataItem[i]); // skip first weight
return 1.0 / (1.0 + Math.Exp(-z));
}
} // LogisticClassifier

Just from the length of it, you can tell that most of the action is taking place in the Train method, so let’s start there. What we have here is two nested loops. The outer one runs maxEpoch times, a user defined parameter. Inside that loop, we randomly shuffle the input dataset, and then loop over each training example, computing the predicted output of the logistic function for that example, comparing it to a target, the actual  label of the example, which can be 0 or 1, and adjusting the weights so as to reduce the error. We also have a bit of logging going on, displaying the prediction error every hundred outer iteration. Once the two loops are over, we return the weights.

Two things strike me here. First, a ton of indexes are involved, and this tends to obfuscate what is going on; as a symptom, a few comments are needed, to clarify how the indexes work, and what piece of the data is organized. Then, there is a lot of mutation going on. It’s not necessarily a bad thing, but I tend to avoid it as much as possible, simply because it requires keeping more moving parts in my head when I try to follow the code, and also, as McCaffrey himself points out in a comment, because “by ref is somewhat risky”.

As a warm up, let’s begin with the error computation, which is displayed every 100 iterations.  Rather than having to remember in what column the actual expected value is stored, let’s make our life easier, and use a type alias, Example, so that the features are neatly tucked in an array, and the value is clearly separated. We need to compute the average square difference between the expected value, and the output of the logistic function for each example. As it turns out, we have already implemented the logistic function in the first part in the code, so re-implementing it as in ComputeOutput seems like un-necessary work – we can get rid of that part entirely, and simply map every example to the square error, and compute the average, using pattern matching on the examples to separate clearly the features and the expected value:

type Example = float [] * float

let Error (trainData:Example[], weights:float[]) =
// mean squared error using supplied weights
trainData
|> Array.map (fun (features,value) ->
let computed = logistic weights features
let desired = value
(computed - desired) * (computed - desired))
|> Array.average

Some of you might argue that this could be made tighter – I can think of at least two possibilities. First, using a Tuple might not be the most expressive approach; replacing it with a Record instead could improve readability. Then, we could also skip the map + average part, and directly ask F# to compute the average on the fly:

type Example = { Features:float[]; Label:float }

let Error (trainData:Example[], weights:float[]) =
trainData
|> Array.averageBy (fun example ->
let computed = logistic weights example.Features
let desired = example.Label
(computed - desired) * (computed - desired))

I will keep my original version the way it is, mostly because we created a dataset based on tuples last times.

We are now ready to hit the center piece of the algorithm. Just like we would probably try to extract a method in C#, we will start extracting some of the gnarly code that lies in the middle:

for (int ti = 0; ti < trainData.Length; ++ti)
{
int i = sequence[ti];
double computed = ComputeOutput(trainData[i], weights);
int targetIndex = trainData[i].Length - 1;
double target = trainData[i][targetIndex];

weights[0] += alpha * (target - computed) * 1; // the b0 weight has a dummy 1 input
for (int j = 1; j < weights.Length; ++j)
weights[j] += alpha * (target - computed) * trainData[i][j - 1];
}

Rather than modify the weights values, it seems safer to compute new weights. And because we opted last week to insert a column with ones for the constant feature, we won’t have to deal with the index misalignment, which requires separate handling for b0 and the rest. Instead, we can write an update operation that takes in an example and weights, and returns new weights:

let update (example:Example) (weights:float[]) =
let features,target = example
let computed = logistic weights features
weights
|> Array.mapi (fun i w ->
w + alpha * (target - computed) * features.[i])

Array.mapi allows us to iterate over the weights, while maintaining the index we are currently at, which we use to grab the feature value at the corresponding index. Alternatively, you could go all verbose and zip the arrays together – or all fancy with a double-pipe and map2 to map the two arrays in one go. Your pick:

Array.zip weights features
|> Array.map (fun (weight,feat) ->
weight + alpha * (target - computed) * feat)

(weights,features)
||> Array.map2 (fun weight feat ->
weight + alpha * (target - computed) * feat)

We are now in a very good place; the only thing left to do is to plug that into the two loops. The inner loop is a perfect case for a fold (the Aggregate method in LINQ): given a starting value for weights, we want to go over every example in our training set, and, for each of them, run the update function to compute new weights. For the while loop, we’ll take a different approach, and use recursion: when the epoch reaches maxEpoch, you are done, return the weights, otherwise, keep shuffling the data and updating weights. Let’s put that all together:

let Train (trainData:Example[], numFeatures, maxEpochs, alpha, seed) =

let rng = Random(seed)
let epoch = 0

let update (example:Example) (weights:float[]) =
let features,target = example
let computed = logistic weights features
weights
|> Array.mapi (fun i w ->
w + alpha * (target - computed) * features.[i])

let rec updateWeights (data:Example[]) epoch weights =

if epoch % 100 = 0
then printfn "Epoch: %i, Error: %.2f" epoch (Error (data,weights))

if epoch = maxEpochs then weights
else
let data = shuffle rng data
let weights =
data
|> Array.fold (fun w example -> update example w) weights
updateWeights data (epoch + 1) weights
// initialize the weights and start the recursive update
let initialWeights = [| for _ in 1 .. numFeatures + 1 -> 0. |]
updateWeights trainData 0 initialWeights

And that’s pretty much it. We replaced the whole class by a couple of functions, and all the indexes are gone. This is probably a matter of taste and comfort with functional concepts, but in my opinion, this is much easier to follow.

Before trying it out, to make sure it works, I’ll take a small liberty, and modify the Train function. As it stands right now, it returns the final weights, but really, we don’t care about the weights, what we want is a classifier, which is a function that, given an array, will predict a one or a zero. That’s easy enough, let’s return a function at the end instead of weights:

// initialize the weights and start the recursive update
let initialWeights = [| for _ in 1 .. numFeatures + 1 -> 0. |]
let finalWeights = updateWeights trainData 0 initialWeights
let classifier (features:float[]) =
if logistic finalWeights features > 0.5 then 1. else 0.
classifier

We can now wrap it up, and see our code in action:

printfn "Begin Logistic Regression (binary) Classification demo"
printfn "Goal is to demonstrate training using gradient descent"

let numFeatures = 8 // synthetic data
let numRows = 10000
let seed = 1

printfn "Generating %i artificial data items with %i features" numRows numFeatures
let trueWeights, allData = makeAllData(numFeatures, numRows, seed)

printfn "Data generation weights:"
trueWeights |> Array.iter (printf "%.2f ")
printfn ""

printfn "Creating train (80%%) and test (20%%) matrices"

let trainData, testData = makeTrainTest(allData, 0)
printfn "Done"

let maxEpochs = 1000
let alpha = 0.01

let classifier = Train (trainData,numFeatures,maxEpochs,alpha,0)

let accuracy (examples:Example[]) =
examples
|> Array.averageBy (fun (feat,value) ->
if classifier feat = value then 1. else 0.)

accuracy trainData |> printfn "Prediction accuracy on train data: %.4f"
accuracy testData |> printfn "Prediction accuracy on test data: %.4f"

We used a small trick to compute the accuracy – we mark every correct call as a one, every incorrect one as a zero, which, when we compute the average, gives us directly the proportion of cases that were called correctly. On my machine, I get the following output:

>
Prediction accuracy on train data: 0.9988
Prediction accuracy on test data: 0.9980

Looks good enough to me, the implementation seems to be working. The whole code presented here is available as a gist here. I’ll leave it at that for now (I might revisit it later, and try to make this work with DiffSharp at some point, if anyone is interested) – feel free to ask questions, or drop me a comment on Twitter!

22. March 2015 17:27

I will admit it, I got a bit upset by James McCaffrey’s column in MSDN magazine this month, “Gradient Descent Training Using C#”. While the algorithm explanations are quite good, I was disappointed by the C# sample code, and kept thinking to myself “why oh why isn’t this written in F#”. This is by no means intended as a criticism of C#; it’s a great language, but some problems are just better suited for different languages, and in this case, I couldn’t fathom why F# wasn’t used.

Long story short, I just couldn’t let it go, and thought it would be interesting to take that C# code, and do a commented rewrite in F#. I won’t even go into why the code does what it does – the article explains it quite well – but will instead purely focus on the implementation, and will try to keep it reasonably close to the original, at the expense of some additional nifty things that could be done.

The general outline of the code follows two parts:

• Create a synthetic dataset, creating random input examples, and computing the expected result using a known function,
• Use gradient descent to learn the model parameters, and compare them to the true value to check whether the method is working.

You can download the original C# code here. Today we’ll focus only on the first part, which is mainly contained in two methods, MakeAllData and MakeTrainTest:

static double[][] MakeAllData(int numFeatures, int numRows, int seed)
{
Random rnd = new Random(seed);
double[] weights = new double[numFeatures + 1]; // inc. b0
for (int i = 0; i < weights.Length; ++i)
weights[i] = 20.0 * rnd.NextDouble() - 10.0; // [-10.0 to +10.0]

double[][] result = new double[numRows][]; // allocate matrix
for (int i = 0; i < numRows; ++i)
result[i] = new double[numFeatures + 1]; // Y in last column

for (int i = 0; i < numRows; ++i) // for each row
{
double z = weights[0]; // the b0
for (int j = 0; j < numFeatures; ++j) // each feature / column except last
{
double x = 20.0 * rnd.NextDouble() - 10.0; // random X in [10.0, +10.0]
result[i][j] = x; // store x
double wx = x * weights[j + 1]; // weight * x
z += wx; // accumulate to get Y
}
double y = 1.0 / (1.0 + Math.Exp(-z));
if (y > 0.55)  // slight bias towards 0
result[i][numFeatures] = 1.0; // store y in last column
else
result[i][numFeatures] = 0.0;
}
Console.WriteLine("Data generation weights:");
ShowVector(weights, 4, true);

return result;
}

MakeAllData takes a number of features and rows, and a seed for the random number generator so that we can replicate the same dataset repeatedly. The dataset is represented as an array of array of doubles. The first columns, from 0 to numFeatures – 1, contain random numbers between –10 and 10. The last column contains a 0 or a 1. What we are after here is a classification model: each row can take two states (1 or 0), and we are trying to predict them from observing the features. In our case, that value is computed using a logistic model: we have a set of weights (which we also generate randomly), corresponding to each feature, and the output is

logistic [ x1; x2; … xn ] = 1.0 / (1.0 + exp ( - (w0 * 1.0 + w1 * x1 + w2 * x2 + … + wn * xn))

Note that w0 plays the role of a constant term in the equation, and is multiplied by 1.0 all the time. This is adding some complications to a code where indices are already flying left and right, because now elements in the weights array are mis-aligned by one element with the elements in the features array. Personally, I also don’t like adding another column to contain the predicted value, because that’s another implicit piece of information we have to remember.

In that frame, I will make two minor changes here, just to keep my sanity. First, as is often done, we will insert a column containing just 1.0 in each observation, so that the weights and features are now aligned. Then, we will move the 0s and 1s outside of the features array, to avoid any ambiguity.

Good. Instead of creating a Console application, I’ll simply go for a script. That way, I can just edit my code and check live whether it does what I want, rather than recompile and run every time.

Let’s start with the weights. What we are doing here is simply creating an array of numFeatures + 1 elements, populated by random values between –10.0 and 10.0. We’ll go a bit fancy here: given that we are also generating random numbers the same way a bit further down, let’s extract a function that generates numbers uniformly between a low and high value:

let rnd = Random(seed)
let generate (low,high) = low + (high-low) * rnd.NextDouble()
let weights = Array.init (numFeatures + 1) (fun _ -> generate(-10.0,10.0))

The next section is where things get a bit thornier. The C# code creates an array, then populates it row by row, first filling in the columns with random numbers, and then applying the logistic function to compute the value that goes in the last column. We can make that much clearer, by extracting that function out. The logistic function is really doing 2 things:

• first, the sumproduct of 2 arrays,
• and then, 1.0/(1.0 + exp ( – z ).

That is easy enough to implement:

let sumprod (v1:float[]) (v2:float[]) =
Seq.zip v1 v2 |> Seq.sumBy (fun (x,y) -> x * y)

let sigmoid z = 1.0 / (1.0 + exp (- z))

let logistic (weights:float[]) (features:float[]) =
sumprod weights features |> sigmoid

We can now use all this, and generate a dataset by simply first creating rows of random values (with a 1.0 in the first column for the constant term), applying the logistic function to compute the value for that row, and return them as a tuple:

open System

let sumprod (v1:float[]) (v2:float[]) =
Seq.zip v1 v2 |> Seq.sumBy (fun (x,y) -> x * y)

let sigmoid z = 1.0 / (1.0 + exp (- z))

let logistic (weights:float[]) (features:float[]) =
sumprod weights features |> sigmoid

let makeAllData (numFeatures, numRows, seed) =

let rnd = Random(seed)
let generate (low,high) = low + (high-low) * rnd.NextDouble()
let weights = Array.init (numFeatures + 1) (fun _ -> generate(-10.0,10.0))

let dataset =
[| for row in 1 .. numRows ->
let features =
[|
yield 1.0
for feat in 1 .. numFeatures -> generate(-10.0,10.0)
|]
let value =
if logistic weights features > 0.55
then 1.0
else 0.0
(features, value)
|]

weights, dataset

Done. Let’s move to the second part of the data generation, with the MakeTrainTest method. Basically, what this does is take a dataset, shuffle it, and split it in two parts, 80% which we will use for training, and 20% we leave out for validation.

static void MakeTrainTest(double[][] allData, int seed,
out double[][] trainData, out double[][] testData)
{
Random rnd = new Random(seed);
int totRows = allData.Length;
int numTrainRows = (int)(totRows * 0.80); // 80% hard-coded
int numTestRows = totRows - numTrainRows;
trainData = new double[numTrainRows][];
testData = new double[numTestRows][];

double[][] copy = new double[allData.Length][]; // ref copy of all data
for (int i = 0; i < copy.Length; ++i)
copy[i] = allData[i];

for (int i = 0; i < copy.Length; ++i) // scramble order
{
int r = rnd.Next(i, copy.Length); // use Fisher-Yates
double[] tmp = copy[r];
copy[r] = copy[i];
copy[i] = tmp;
}
for (int i = 0; i < numTrainRows; ++i)
trainData[i] = copy[i];

for (int i = 0; i < numTestRows; ++i)
testData[i] = copy[i + numTrainRows];
}

Again, there is a ton of indexing going on, which in my old age I find very hard to follow. Upon closer inspection, really, the only thing complicated here is the Fischer-Yates shuffle, which takes an array and randomly shuffles the order. The rest is pretty simply – we just want to shuffle, and then split into two arrays. Let’s extract the shuffle code (which happens to also be used and re-implemented later on):

let shuffle (rng:Random) (data:_[]) =
let copy = Array.copy data
for i in 0 .. (copy.Length - 1) do
let r = rng.Next(i, copy.Length)
let tmp = copy.[r]
copy.[r] <- copy.[i]
copy.[i] <- tmp
copy

We went a tiny bit fancy again here, and made the shuffle work on generic arrays; we also pass in the Random instance we want to use, so that we can control / repeat shuffles if we want, by passing a seeded Random. Does this work? Let’s check in FSI:

> [| 1 .. 10 |] |> shuffle (Random ());;
val it : int [] = [|6; 7; 2; 10; 8; 5; 4; 9; 3; 1|]

Looks reasonable. Let’s move on – we can now implement the makeTrainTest function.

let makeTrainTest (allData:_[], seed) =

let rnd = Random(seed)
let totRows = allData.Length
let numTrainRows = int (float totRows * 0.80) // 80% hard-coded

let copy = shuffle rnd allData
copy.[.. numTrainRows-1], copy.[numTrainRows ..]

Done. A couple of remarks here. First, F# is a bit less lenient than C# around types, so we have to be explicit when converting the number of rows to 80%, first to float, then back to int. As an aside, this used to annoy me a bit in the beginning, but I have come to really like having F# as this slightly psycho-rigid friend who nags me when I am taking a dangerous path (for instance, dividing two integers and hoping for a percentage).

Besides that, I think the code is markedly clearer. The complexity of the shuffle has been nicely contained, and we just have to slice the array to get a training and test sets. As an added bonus, we got rid of the out parameters, and that always feels nice and fuzzy.

I’ll leave it at for today; next time we’ll look at the second part, the learning algorithm itself. Before closing shop, let me make a couple of comments. First, the code is a tad shorter, but not by much. I haven’t really tried, and deliberately made only the changes I thought were needed. What I like about it, though, is that all the indexes are gone, except for the shuffle. In my opinion, this is a good thing. I find it difficult to keep it all in my head when more than one index is involved; when I need to also remember what columns contain special values, I get worried – and just find it hard to figure out what is going on. By contrast, I think makeTrainTest, for instance, conveys pretty directly what it does. makeAllData, in spite of some complexity, also maps closely the way I think about my goal: “I want to generate rows of inputs” – this is precisely what the code does. There is probably an element of culture to it, though; looping over arrays has a long history, and is familiar to every developer, and what looks readable to me might look entirely weird to some.

Easier, or more complicated than before? Anything you like or don’t like – or find unclear? Always interested to hear your opinion! Ping me on Twitter if you have comments.

21. February 2015 12:23

A few weeks ago, I came across DiffSharp, an automatic differentiation library in F#. As someone whose calculus skills have always been rather mediocre (thanks Wolfram Alpha!), but who needs to deal with gradients and the like on a regular basis because they are quite useful in machine learning and numerical methods, the project looked pretty interesting: who wouldn’t want exact and efficient calculations of derivatives? So I figured I would take a couple of hours to experiment with the library. This post is by no means an in-depth evaluation, but rather intended as “notes from the road” from someone entirely new to DiffSharp.

## Basics

Suppose I want to compute the derivative of f(x) = √ x at, say, 42.0. Double-checking Wolfram Alpha confirms that f has derivative f’(x) = 1 / (2 x √ x) .

Once DiffSharp is installed via Nuget, we can automatically evaluate f’(x) :

#r @"..\packages\DiffSharp.0.5.7\lib\DiffSharp.dll"

let f x = sqrt x
diff f 42. |> printfn "Evaluated: %f"
1. / (2. * sqrt 42.)  |> printfn "Actual: %f"

Evaluated: 0.077152
Actual: 0.077152

val f : x:Dual -> Dual
val it : unit = ()

First off, obviously, it worked. Without any need for us to perform anything, DiffSharp took in our implementation of f, and computed the correct value. This is really nifty.

The piece which is interesting here is the inferred signature of f. If I were to remove the line that immediately follows the function declaration, f would have the following signature:

val f : x:float –> float

The moment you include the line diff f 42., the inferred type changes drastically, and becomes

val f : x:Dual –> Dual

This is pretty interesting. Because we call diff on f, which expects Duals (a type that is defined in DiffSharp), our function isn’t what we originally defined it to be – and calling f 42.0 at that point (for instance) will fail, because 42.0 is a float, and not a Dual. In other words, DiffSharp leverages type inference pretty aggressively, to convert functions into the form it needs to perform its magic.

Edit: Atilim Gunes Baydin suggested another way around that issue, which is inlining f. The following works perfectly well, and allows to both differentiate f, and use this against floats:

let inline f x = sqrt x
let f' = diff f
f 42.

Thanks for the input!

This has a couple of implications. First, if you work in a script, you need to be careful about how you send your code to the F# interactive for execution. If you process the sample code above line by line in FSI, the evaluation will fail, because f will be inferred to be float –> float. Then, you will potentially need to annotate your functions with type hints, to help inference. As an example, the following doesn’t work:

let g x = 3. * x
diff g 42.

As is, g is still inferred to be of type float –> float, because of the presence of the constant term, which is by default inferred as a float. That issue can be addressed at least two ways – by explicitly marking x or 3. as dual in g, like this:

let g x = (dual 3.) * x
let h (x:Dual) = 3. * x

That’s how far we will go on this – if you want to dive in deeper, the Type Inference page discusses the topic in much greater detail

## A tiny example

So why is this interesting? As I mentioned earlier, differentiation is used heavily in numeric algorithms to identify values that minimize a function, a prime example being the gradient descent algorithm. The simplest example possible would be finding a (local) minimum of a single-argument function: starting from an arbitrary value x, we can iteratively follow the direction of steepest descent, until no significant change is observed.

Here is a quick-and-dirty implementation, using DiffSharp:

let minimize f x0 alpha epsilon =
let rec search x =
let fx' = diff f x
if abs fx' < epsilon
then x
else
let x = x - alpha * fx'
search x
search x0

Edit, 3/4/2015: fixed issue in code, using abs fx’ instead of fx’

Because DiffSharp handles the differentiation part automatically for us, with only 10 lines of code, we can now pass in arbitrary functions we want to minimize, and (with a few caveats…), and get a local minimum, no calculus needed:

let epsilon = 0.000001

let g (x:Dual) = 3. * pown x 2 + 2. * x + 1.
minimize g 0. 0.1 epsilon |> printfn "Min of g at x = %f"

let h (x:Dual) = x + x * sin(x) + cos(x) * (3. * x  - 7.)
minimize h 0. 0.1 epsilon |> printfn "Min of h at x = %f"

>
Min of g at x = -0.333333
Min of h at x = -0.383727

Let’s make sure this is reasonable. g is a quadratic function, which has a minimum or maximum at –b/2*a, that is, –2 / 2 x 3 - this checks out. As for h, inspecting the function plot confirms that it has a minimum around the identified value:

Edit, 3/4/2015: changed function h to a simpler shape.

## Conclusion

I have barely started to scratch the surface of DiffSharp in this post, but so far, I really, really like its promise. While I limited my examples to single-variable functions, DiffSharp supports multivariate functions, and vector operations as well. The way it uses type inference is a bit challenging at first, but seems a reasonable price to pay for the resulting magic. My next step will probably be a less tiny example, perhaps a logistic regression against realistic data. I am very curious to try out the algebra bits – and also wondering in the back of my head how to best use the library in general. For instance, how easy is it to construct a function from external data, and turn it into the appropriate types for DiffSharp to work its magic? How well does this integrate with other libraries, say, Math.NET? We’ll see!

In the meanwhile, I’d recommend checking out the project page, which happens to also be beautifully documented! And, as always, you can ping me on twitter for comments or question.

#### Need help with F#?

The premier team for
F# training & consulting.