Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 13. September 2014 17:18

Let’s face it, @fsibot in its initial release came with a couple of flaws – pardon, undocumented features. One aspect that was particularly annoying was the mild Tourette’s syndrome that affected the bot; on a fairly regular basis, it would pick up the same message, and send the same answer over and over again to the brave soul who had tried to engage in a constructive discussion.

I wasn’t too happy about that (nobody likes spam), and, being all about the enterprise and stuff, I thought it was time to inject a couple more buzzwords. In this post, I’ll briefly discuss how I ended up using the Azure Service Bus to address the problem, with a sprinkle of Azure Storage for good measure, and ended up liking it quite a bit.

So what was the problem?

The issue came from a combination of factors. Fundamentally, @fsibot does two things: it pulls recent mentions from Twitter on a regular basis, and passes them to the F# Compiler Services to produce a message with the result of the evaluation.

Mentions are pulled via the Twitter API, which offers two options: grab the latest 20, or grab all mentions since a given message ID. If you have no persistent storage, this implies that when the service starts, you pull the 20 most recent ones, and once you have retrieved some messages, you can start pulling only from the last one seen.
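
To make this a bit more concrete, here is roughly what that pull could look like with LinqToTwitter. This is a sketch, not the actual @fsibot code – it assumes an already-authenticated TwitterContext, and the exact property names may differ between LinqToTwitter versions:

open LinqToTwitter

// Sketch only: assumes an authenticated LinqToTwitter TwitterContext named `context`,
// and a last-seen status ID (uint64 option) read from persistent storage.
let fetchMentions (context:TwitterContext) (lastSeenId:uint64 option) =
    match lastSeenId with
    | None ->
        // no known starting point: grab the 20 most recent mentions
        query {
            for tweet in context.Status do
            where (tweet.Type = StatusType.Mentions)
            select tweet }
        |> Seq.toList
    | Some sinceId ->
        // restart from the last mention seen
        query {
            for tweet in context.Status do
            where (tweet.Type = StatusType.Mentions && tweet.SinceID = sinceId)
            select tweet }
        |> Seq.toList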

This is a great strategy, if your service runs like a champ and never goes down (it’s also very easy to implement – a coincidence, surely). Things start to look much less appealing when the service goes down. In that scenario, the service reboots, and starts re-processing the 20 most recent mentions. In a scenario where, say, a couple of enthusiastic F# community members decide to thoroughly test the bot’s utter lack of security, and send messages that cause it to have fits and go down in flames multiple times in a short time span, this becomes a real problem.

So what can we do to fix this?

A first obvious problem is that a failure in one part of the service should not bring down the entire house. Running unsafe code in the F# Compiler Service should not impact the retrieval of Twitter mentions. To decouple the problem, we can separate these into two services, and connect them via a queue. This is much better: if the compiler service fails, messages keep being read and pushed to the queue, and when it comes back online, they can be processed as if nothing had happened. At that point, the only things that can disrupt the retrieval of mentions are a problem in that code specifically, or a reboot of the machine itself.

So how did I go about implementing that? In the laziest way possible, of course. In this case, I used an Azure Service Bus queue. I won’t go into all the details of using the Service Bus; this tutorial does a pretty good job of covering the basic scenario, from creating a queue to sending and receiving messages. I really liked how it ended up looking from F#, though. In the first service, which reads recent mentions from Twitter, the code simply looks like this:

let queueMention (status:Status) =
    let msg = new BrokeredMessage ()
    msg.MessageId <- status.StatusID |> string
    msg.Properties.["StatusID"] <- status.StatusID
    msg.Properties.["Text"] <- status.Text
    msg.Properties.["Author"] <- status.User.ScreenNameResponse
    mentionsQueue.Send msg

From the Status (a LinqToTwitter class) I retrieve, I extract the three fields I care about, create a BrokeredMessage (the class used to communicate via the Azure Service Bus), add key-value pairs to its Properties, and send it to the queue.
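
For completeness, the mentionsQueue client itself can be created from a connection string, along these lines (a sketch – the connection string and queue name are placeholders, not the actual @fsibot configuration):

open Microsoft.ServiceBus.Messaging

// Sketch: how the mentionsQueue client used above could be created.
// The connection string and queue name are placeholders.
let connectionString = "Endpoint=sb://your-namespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."
let queueName = "mentions"

let mentionsQueue = QueueClient.CreateFromConnectionString (connectionString, queueName)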

On the processing side, here is the code I ended up with:

let (|Mention|_|) (msg:BrokeredMessage) =
    match msg with
    | null -> None
    | msg ->
        try
            let statusId = msg.Properties.["StatusID"] |> Convert.ToUInt64
            let text = msg.Properties.["Text"] |> string
            let user = msg.Properties.["Author"] |> string
            Some { StatusId = statusId; Body = text; User = user; }
        with 
        | _ -> None

let rec pullMentions( ) =
    let mention = mentionsQueue.Receive ()
    match mention with
    | Mention tweet -> 
        tweet.Body
        |> processMention
        |> composeResponse tweet
        |> respond
        mention.Complete ()
    | _ -> ()

    Thread.Sleep pingInterval
    pullMentions ()

I declare a partial Active Pattern (the (|Mention|_|) “banana clip” bit), which allows me to use pattern matching against a BrokeredMessage, a class which by itself knows nothing about F# and discriminated unions. That piece of code itself is not beautiful (it’s just a try-catch block, trying to extract data from the BrokeredMessage into my own record type), but the part I really like is the pullMentions () function: I can now directly grab messages from the queue, match against a Mention, and here we go – a nice and clean pipeline all the way through.

So now that the two services are decoupled, one has a fighting chance to survive when the other goes down. However, it is still possible for the Twitter-reading service to fail, too, and in that case mentions will again get processed multiple times.

One obvious way to resolve this is to actually persist the last ID seen somewhere, so that when the Service starts, it can read that ID and restart from there. This is what I ended up doing, storing that ID in a blob (probably the smallest blob in all of Azure); the code to write and read that ID to a blob is pretty simple, and probably doesn’t warrant much comment:

let updateLastID (ID:uint64) =
    let lastmention = container.GetBlockBlobReference blobName
    ID |> string |> lastmention.UploadText

let readLastID () =
    let lastmention = container.GetBlockBlobReference blobName
    if lastmention.Exists ()
    then 
        lastmention.DownloadText () 
        |> System.Convert.ToUInt64
        |> Some
    else None
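
For context, container and blobName in that snippet are simply a blob container reference and a blob name; wiring them up with the Azure Storage client library looks roughly like this (again a sketch, with placeholder names):

open Microsoft.WindowsAzure.Storage

// Sketch: setting up the `container` and `blobName` used above.
// The connection string, container name and blob name are placeholders.
let storageConnection = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."
let blobName = "lastmention"

let storageAccount = CloudStorageAccount.Parse storageConnection
let blobClient = storageAccount.CreateCloudBlobClient ()
let container = blobClient.GetContainerReference "fsibot"
container.CreateIfNotExists () |> ignore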

However, even before doing this, I went an even lazier road. One of the niceties about using the Service Bus is that the queue behavior is configurable in multiple ways. One of the properties available (thanks @petarvucetin for pointing it out!) is Duplicate Detection. As the name cleverly suggests, it allows you to specify a time window during which the Queue will detect and discard duplicate BrokeredMessages, a duplicate being defined as “a message with the same MessageID”.

So I simply set a window of 24 hours for Duplicate Detection, and made the BrokeredMessage.MessageID equal to the tweet’s Status ID. If the Queue sees a message, and the same message shows up within 24 hours, no repeat processing. Nice!
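
Enabling that behavior happens when the queue is created; with the NamespaceManager API, it looks more or less like this (a sketch – the connection string and queue name are placeholders):

open System
open Microsoft.ServiceBus
open Microsoft.ServiceBus.Messaging

// Sketch: create the queue with Duplicate Detection enabled over a 24 hour window.
// The connection string and queue name are placeholders.
let namespaceManager = NamespaceManager.CreateFromConnectionString "Endpoint=sb://..."

let queueDescription = QueueDescription "mentions"
queueDescription.RequiresDuplicateDetection <- true
queueDescription.DuplicateDetectionHistoryTimeWindow <- TimeSpan.FromHours 24.

if not (namespaceManager.QueueExists "mentions")
then namespaceManager.CreateQueue queueDescription |> ignore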

Why did I add the blob then, you might ask? Well, the Duplicate Detection only takes care of most problem cases, not all of them. Imagine that a mention comes in, fewer than 20 mentions arrive over the following 24 hours, and then the service crashes – in that case, the message WILL get re-processed, because it is still among the 20 most recent mentions, and the Duplicate Detection window has expired. I could have increased that window to more than a day, but it already smelled like a rather hacky way to solve the problem, so I just added the blob, and called it a day.

So what’s the point here? Nothing earth-shattering, really – I just wanted to share my experience using some of the options Azure offers, in the context of solving simple but real problems on @fsibot. I got two things out of it. First, Azure Service Bus and Azure Storage were way easier to use than I expected. Reading the tutorials took me about half an hour, implementing the code took another half an hour, and it just worked. Then (and I will readily acknowledge some F# bias here), my feeling is that Azure and F# just play very nicely together. In this particular case, I find that Active Patterns provide a very clean way to parse out BrokeredMessages, extracting that logic into code which can then simply be plugged in with a pattern match; combined with classic pipelines, this ends up creating very readable workflows.

by Mathias 24. August 2014 15:35

My recollection of how this all started is somewhat fuzzy at this point. I remember talking to @tomaspetricek about the recent “A pleasant round of golf” event with @relentlessdev in London. The idea of Code Golf is to write code that fits in as few characters as possible – a terrible idea in most cases, but an interesting one if you want to force your brain into unknown territory. Also, a very fun idea, with lots of possibilities. If I recall correctly, the discussion soon drifted to the conclusion that if you do it right (so to speak), your code should fit in a tweet. Tweet, or GTFO, as the kids would say (or so I hear).

Of course, I began obsessing about the idea – that’s what I do. The discussion kept going at LambdaJam, with @rickasaurus, @pblasucci and @bbqfrito (beers, too). So I thought I had to try it out: what if you set up a Twitter bot, which would respond to your F# inquiries, and send back an evaluation of whatever F# expression you sent it?

As it turns out, it’s not that difficult to do, thanks to the F# Compiler Services, which let you, among many other things, host an FSI session. So without further ado, I give you @fsibot. Tweet a valid expression to @fsibot, and it will run it in an F# Interactive session, and reply with the result:

Note that you need to send an expression, as opposed to an interaction. As an example, printfn “Hello, world” won’t do anything, but sprintf “Hello, world” (which evaluates to a string) will.
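
For the curious, hosting FSI via the F# Compiler Services is surprisingly compact. The snippet below is a minimal sketch (not the actual @fsibot code), following the pattern from the FSharp.Compiler.Service documentation; EvalExpression hands back the value of an expression, which is exactly why the bot expects expressions rather than interactions:

open System.IO
open System.Text
open FSharp.Compiler.Interactive.Shell

// Minimal sketch: host an F# Interactive session in-process and evaluate an expression.
let evaluate (code:string) =
    let sbOut = StringBuilder ()
    let sbErr = StringBuilder ()
    use inStream = new StringReader ("")
    use outStream = new StringWriter (sbOut)
    use errStream = new StringWriter (sbErr)

    let fsiConfig = FsiEvaluationSession.GetDefaultConfiguration ()
    let fsiSession =
        FsiEvaluationSession.Create (fsiConfig, [| "fsi.exe"; "--noninteractive" |], inStream, outStream, errStream)

    // EvalExpression returns Some value when the expression evaluates successfully.
    match fsiSession.EvalExpression (code) with
    | Some value -> sprintf "%A" value.ReflectionValue
    | None -> "evaluation produced no result"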

What else is there to say?

A couple of things. First, my initial plan was to run this on an Azure worker role, which seemed to make a lot of sense. Turns out, after spending countless hours trying to figure out why it was working just great on my machine, using the Azure emulator, but exploding left and right the moment I deployed it in production, I just gave up, and changed track, rewriting it as a Windows Service hosted in an Azure virtual machine (it’s still a cloud-based architecture!), using the awesome TopShelf to simplify my life (thank you @phatboyg for saving my weekend, and @ReedCopsey for pointing me in the right direction).

You can find the whole code here on GitHub. As you might notice, the whole TopShelf part is in C# – nothing wrong with it, but I plan on moving this over to F# as soon as I can, using existing work by @henrikfeldt, who discreetly produces a lot of awesome code made in Sweden.

Another lesson learnt, which came by way of @panesofglass, was that if your code doesn’t do anything asynchronous, using async everywhere is probably not such a hot idea. Duh – but I recently got enamored with mailbox processors and async workflows, and started initially building a gigantic pipe factory, until Ryan aptly pointed out that this was rather counter-productive. So I simplified everything. Thanks for the input, Ryan!
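
To illustrate Ryan’s point with a contrived sketch: wrapping code that does no actual asynchronous work in async blocks just adds ceremony, without buying anything:

// Contrived sketch: `work` is purely CPU-bound and synchronous.
let work (text:string) = text.Length

// No real asynchrony here: this just wraps and immediately unwraps the computation.
let overEngineered text =
    async { return work text }
    |> Async.RunSynchronously

// Simpler, and equivalent: just call the function directly.
let simpler text = work text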

That’s it! I am not entirely sure the bot will handle non-terminating expressions gracefully, but in traditional San Francisco fashion, I’ll call this a Minimum Viable Product, and just ship it – we can pivot later. Now have fun with it :) And if you have comments, questions or suggestions, feel free to ping me on Twitter as @brandewinder.

Source code on GitHub

by Mathias 16. June 2014 22:15

Like many a good man, I too got caught in the 2048 trap, which explains in part why I have been rather quiet on this blog lately (there are a couple of other reasons, too).

In case you don't know what 2048 is yet, first, consider yourself lucky - and, fair warning, you might want to back away now, while you still have a chance. 2048 is a very simple and fun game, and one of the greatest time sinks since Tetris. You can play it here, and the source code is here on GitHub.

I managed to dodge the bullet for a while, until @PrestonGuillot, a good friend of mine, decided to write a 2048 bot as a fun weekend project to sharpen his F# skills, and dragged me down with him in the process. This has been a ton of fun, and this post is a moderately organized collection of notes from my diary as a recovering 2048 addict.

Let's begin with the end result. The video below shows an F# bot, written by my friend @Blaise_V, masterfully playing the game. I recorded it a couple of weeks ago, accelerating time "for dramatic purposes":

One of the problems Preston and I ran into early on was how to handle interactions with the game. A recent post by @shanselman was praising Canopy as a great library for web UI testing, which gave me the idea to try it for that purpose. In spite of my deep incompetence in all things web-related, I found the Canopy F# DSL super easy to pick up, and got something crude working in a jiffy. With a bit of extra help from the awesome @lefthandedgoat, the creator of Canopy (thanks Chris!), it went from crude to pretty OK, and I was ready to focus on the interesting bits: the game AI.
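
To give a flavor of what driving the game through Canopy can look like, here is a rough sketch – not Preston’s or Blaise’s actual code; the URL and CSS selectors come from the public 2048 page and might have changed since:

open canopy
open OpenQA.Selenium

// Rough sketch: open the 2048 page and send a few arrow-key presses to the game board.
start chrome
url "http://gabrielecirulli.github.io/2048/"

let play (moves:string list) =
    let body = element "body"
    for move in moves do
        body.SendKeys move
        System.Threading.Thread.Sleep 100

play [ Keys.ArrowUp; Keys.ArrowRight; Keys.ArrowDown; Keys.ArrowLeft ]

// Read the current score back from the page.
let score = read ".score-container"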

I had so much fun in the process, I figured others might too, and turned this into another Community for F# Dojo, which you can find here.

More...

by Mathias 12. April 2014 11:05

A lightweight post this week. One of my favorite F# type providers is the World Bank type provider, which enables ridiculously easy access to a boatload of socio-economic data for every country in the world. However, numbers are cold – wouldn’t it be nice to visualize them using a map? Turns out it’s pretty easy to do, using another of my favorites, the R type provider. The rworldmap R package, as its name suggests, is all about world maps, and is a perfect fit with the World Bank data.

The video below shows you the results in action; I also added the code below, for good measure. The only caveat relates to the integration between the Deedle data frame library and R. I had to manually copy the Deedle.dll and Deedle.RProvider.Plugin.dll into packages\RProvider.1.0.5\lib for the R Provider to properly convert Deedle data frames into R data frames. Enjoy!

Here is the script I used:

#I @"..\packages\"
#r @"R.NET.1.5.5\lib\net40\RDotNet.dll"
#r @"RProvider.1.0.5\lib\RProvider.dll"
#r @"FSharp.Data.2.0.5\lib\net40\FSharp.Data.dll"
#r @"Deedle.0.9.12\lib\net40\Deedle.dll"
#r @"Deedle.RPlugin.0.9.12\lib\net40\Deedle.RProvider.Plugin.dll"

open FSharp.Data
open RProvider
open RProvider.``base``
open Deedle
open Deedle.RPlugin
open RProviderConverters

let wb = WorldBankData.GetDataContext()
wb.Countries.France.CapitalCity
wb.Countries.France.Indicators.``Population (Total)``.[2000]

let countries = wb.Countries

let pop2000 = series [ for c in countries -> c.Code => c.Indicators.``Population (Total)``.[2000]]
let pop2010 = series [ for c in countries -> c.Code => c.Indicators.``Population (Total)``.[2010]]
let surface = series [ for c in countries -> c.Code => c.Indicators.``Surface area (sq. km)``.[2010]]

let df = frame [ "Pop2000" => pop2000; "Pop2010" => pop2010; "Surface" => surface ]
df?Codes <- df.RowKeys

open RProvider.rworldmap

let map = R.joinCountryData2Map(df,"ISO3","Codes")
R.mapCountryData(map,"Pop2000")

df?Density <- df?Pop2010 / df?Surface
df?Growth <- (df?Pop2010 - df?Pop2000) / df?Pop2000

let map2 = R.joinCountryData2Map(df,"ISO3","Codes")
R.mapCountryData(map2,"Density")
R.mapCountryData(map2,"Growth")

Have a great week-end, everybody! And big thanks to Tomas for helping me figure out a couple of things about Deedle.

by Mathias 22. March 2014 13:11

A couple of days ago, I got into the following Twitter exchange:


So why do I think FsCheck + XUnit = The Bomb?

I have a long history with Test-Driven Development; to this day, I consider Kent Beck’s “Test-Driven Development by Example” one of the biggest influences in the way I write code (any terrible code I might have written is, of course, to be blamed entirely on me, and not on the book).

In classic TDD style, you typically proceed by writing incremental test cases which match your requirements, and progressively write the code that satisfies those requirements. Let’s illustrate with an example: a password strength validator. Suppose that my requirement is “a password must be at least 8 characters long to be valid”. Using XUnit, I would probably write something along these lines:

namespace FSharpTests

open Xunit
open CSharpCode

module ``Password validator tests`` =

    [<Fact>]
    let ``length above 8 should be valid`` () =
        let password = "12345678"
        let validator = Validator ()
        Assert.True(validator.IsValid(password))

… and in the CSharpCode project, I would then write the dumbest minimal implementation that could pass that requirement, that is:

public class Validator
{
    public bool IsValid(string password)
    {
        return true;
    }
}

Next, I would write a second test, to verify the obvious negative:

namespace FSharpTests

open Xunit
open CSharpCode

module ``Password validator tests`` =

    [<Fact>]
    let ``length above 8 should be valid`` () =
        let password = "12345678"
        let validator = Validator ()
        Assert.True(validator.IsValid(password))

    [<Fact>]
    let ``length under 8 should not be valid`` () =
        let password = "1234567"
        let validator = Validator ()
        Assert.False(validator.IsValid(password))

This fails, producing the following output in Visual Studio:

[Screenshot: Classic-Test-Result – the failing test in the Visual Studio test runner]

… which forces me to fix my implementation, for instance like this:

public class Validator
{
    public bool IsValid(string password)
    {
        if (password.Length < 8)
        {
            return false;
        }

        return true;
    }
}

Let’s pause here for a couple of remarks. First, note that while my tests are written in F#, the code base I am testing against is in C#. Mixing the two languages in one solution is a non-issue. Then, after years of writing C# test cases with names like Length_Above_8_Should_Be_Valid, and arguing whether this was better or worse than LengthAbove8ShouldBeValid, I find that having the ability to simply write “length above 8 should be valid”, in plain old English (and seeing my tests show up that way in the test runner as well), is pleasantly refreshing. For that reason alone, I would encourage F#-curious C# developers to try out writing tests in F#; it’s a nice way to get your toes in the water, and has neat advantages.

But that’s not the main point I am interested in here. While this process works, it is not without issues. From a single requirement, “a password must be at least 8 characters long to be valid”, we ended up writing 2 test cases. First, the cases we ended up with are somewhat arbitrary, and don’t fully reflect what their names promise: I only tested two instances, one 7 characters long, one 8 characters long. This is really relying on my ability as a developer to identify “interesting cases” in a vast universe of possible passwords, hoping that I happened to cover sufficient ground.

This is where FsCheck comes in. FsCheck is a port of Haskell’s QuickCheck, a property-based testing framework. The term “property” is somewhat overloaded, so let’s clarify: what “Property” means in that context is a property of our program that should be true, in the same sense as mathematically, a property of any number x is “x * x is positive”. It should always be true, for any input x.

Install FsCheck via NuGet, as well as the FsCheck.Xunit extension; you can now write tests that verify properties by marking them with the attribute [<Property>] instead of [<Fact>], and the XUnit test runner will pick them up as normal tests. For instance, taking the example from right above, we can write:

namespace FSharpTests

open Xunit
open FsCheck
open FsCheck.Xunit
open CSharpCode

module Specification =

    [<Property>]
    let ``square should be positive`` (x:float) = 
        x * x > 0.

Let’s run that – fail. If you click on the test results, here is what you’ll see:

[Screenshot: Square-Test – FsCheck reports a counter-example for the square property]

FsCheck found a counter-example, 0.0. Oops! Our specification is incorrect here: the square value doesn’t have to be strictly positive, it could be zero. This is an obvious mistake; let’s fix the test, and get on with our lives:

[<Property>]
let ``square should be positive`` (x:float) = 
    x * x >= 0.

More...
