Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
by Mathias 22. December 2015 14:43

This is my modest contribution to the F# Advent Calendar 2015. Thanks to @sergey_tihon for organizing it! Check out the epic stuff others have produced so far on his website or under the #fsAdvent hashtag on Twitter. Also, don’t miss the Japan Edition of #fsAdvent for more epicness…

Sometime last year, in a moment of beer-fueled inspiration, I ended up putting together @fsibot, the ultimate mobile F# IDE for the nomad developer with a taste for functional-first programming. This was fun, some people created awesome things with it, other people, not so much, and I learnt a ton.

People also had feature requests (of course they did), some obviously crucial (Quines! We need quines!), some less so. Among others came the suggestion to support querying the World Bank for data, and returning results as a chart.

So... Let's do it! After a bit of thought, I decided I would not extend @fsibot to support this, but rather build a separate bot, with its own external DSL. My thinking here was that adding this as a feature to @fsibot would clutter the code; also, this is a specialized task, and it might make sense to create a dedicated language for it, to make it accessible to the broader public who might not be familiar with F# and its syntax.

You can find the code for this thing here.

The World Bank Type Provider

Let's start with the easy part - accessing the World Bank data and turning it into a chart. So what I want to do is something along the lines of 'give me the total population for France between 2000 and 2005', and make a nice columns chart out of this. The first step is trivial using the World Bank type provider, which can be found in the FSharp.Data library:

open FSharp.Data

let wb = WorldBankData.GetDataContext ()
let france = wb.Countries.France
let population = france.Indicators.``Population, total``
let series = [ for year in 2000 .. 2005 -> year, population.[year]]

Creating a chart isn't much harder, using FSharp.Charting:

open FSharp.Charting

let title = sprintf "%s, %s" (france.Name) (population.Name)
let filename = __SOURCE_DIRECTORY__ + "/chart.png"

Chart.Line(series, Title=title)
|> Chart.Save(filename)

Wrapping up calls to the Type Provider

Next, we need to take in whatever string the user will send us over Twitter, and convert it into something we can execute. Specifically, what we want is to take user input along the lines of "France, Total population, 2000-2005", and feed that information into the WorldBank type provider.

Suppose for a moment that we had broken down our message into its 4 pieces, a country name, an indicator name, and two years. We could then call the WorldBank type provider, along these lines:

type WB = WorldBankData.ServiceTypes
type Country = WB.Country
type Indicator = Runtime.WorldBank.Indicator

let findCountry (name:string) =
    wb.Countries 
    |> Seq.tryFind (fun c -> c.Name = name)

let findIndicator (name:string) (c:Country) =
    c.Indicators 
    |> Seq.tryFind (fun i -> i.Name = name)

let getValues (year1,year2) (indicator:Indicator) =
    [ for year in year1 .. year2 -> year, indicator.[year]]

We can then easily wrap this into a single function, like this:

let getSeries (country,indicator,year1,year2) =
    findCountry country
    |> Option.bind (findIndicator indicator)
    |> Option.map (getValues (year1,year2))

Defining our language

This is a bit limiting, however. Imagine that we wanted to also support queries like "France, Germany, Italy, Total population, total GDP, 2000". We could of course pass in everything as lists, say,

 ["France";"Germany"], ["Total population"], [2000],

… but we'd have to then examine how many elements the list contains to make a decision. Also, more annoyingly, this allows for cases that should not be possible: ideally, we wouldn't want to even allow requests such as

[], [], [2000; 2010; 2020].

One simple solution is to carve out our own language, using F# Discriminated Unions. Instead of lists, we could, for instance, create a handful of types to represent valid arguments:

type PLACE =
    | COUNTRY of string
    | COUNTRIES of string list

type MEASURE =
    | INDICATOR of string

type TIMEFRAME = 
    | OVER of int * int
    | IN of int

This is much nicer: we can now clean up our API using pattern matching, eliminating a whole class of problems:

let cleanAPI (place:PLACE) (values:MEASURE) (timeframe:TIMEFRAME) =
    match (place, values, timeframe) with
    | COUNTRY(country), INDICATOR(indicator), OVER(year1,year2) ->             
        // do stuff
    | COUNTRIES(countries), INDICATOR(indicator), OVER(year1,year2) -> 
        // do different stuff
    | // etc...

Parsing user input

The only problem we are left with now is to break a raw string - the user request - into a tuple of arguments. If we have that, then we can compose all the pieces together, piping them into a function that will take a string and go all the way down to the type provider.

We are faced with a decision now: we can go the hard way, powering our way through this using Regex and string manipulation, or the easy way, using a parser like FParsec. Let's be lazy and smart!

Note to self: when using FParsec from a script file, make sure you #r FParsecCS before FParsec. I spent a couple of hours stuck trying to understand what I was doing wrong because of that one.

Simply put, FParsec is awesome. It allows you to define small functions to parse input strings, test them on small pieces of input, and compose them together into bigger and badder parsers. Let's illustrate: suppose that in our DSL, we expect user requests to contain a piece that looks like "IN 2010", or "OVER 2000 - 2010" to define the timeframe.

In the first case, we want to recognize the string “IN”, followed by spaces, followed by an integer; if we find that pattern, we want to retrieve the integer and create an instance of IN:

let pYear = spaces >>. pint32 .>> spaces
let pIn =
    pstring "IN" >>. pYear
    |>> IN

If we run the parser on a well-formed string, we get what we expect:

run pIn "IN  2000 "
> 
val it : ParserResult<TIMEFRAME,unit> = Success: IN 2000

If we pass in an incorrectly formed string, we get a nice error diagnosis:

run pIn "IN some year "
> 
val it : ParserResult<TIMEFRAME,unit> =
  Failure:
Error in Ln: 1 Col: 4
IN some year 
   ^
Expecting: integer number (32-bit, signed)

Beautiful! The second case is rather straightforward, too:

let pYears = 
    tuple2 pYear (pstring "-" >>. pYear)
     
let pOver = 
    pstring "OVER" >>. pYears
    |>> OVER

Passing in a well-formed string gives us back OVER(2000,2010):

run pOver "OVER 2000- 2010"
> 
val it : ParserResult<TIMEFRAME,unit> = Success: OVER (2000,2010)

Finally we can compose these together, so that when we encounter either IN 2000, or OVER 2000 - 2005, we parse this into a TIMEFRAME:

let pTimeframe = pOver <|> pIn

I won't go into the construction of the full parser - you can just take a look here. The trickiest part was my own doing. I wanted to allow messages without quotes, that is,

COUNTRY France

and not

COUNTRY "France"

The second case is much easier to parse (look for any chars between ""), especially because there are indicators like, for instance, "Population, total". The parser is pretty hacky, but hey, it mostly works, so... ship it!

Ship it!

That's pretty much it. At that point, all the pieces are there. I ended up copy pasting taking inspiration from the existing @fsibot code, using LinqToTwitter to deal with reading and writing to Twitter, and TopShelf to host the bot as a Windows service, hosted on an Azure VM, and voila! You can now tweet to @wbfacts, and get back a nice artisanal chart, hand-crafted just for you, with the freshest data from the World Bank:

A couple of quick final comments:

  • One of the most obvious issues with the bot is that Twitter offers very minimal support for IntelliSense (and by minimal, I mean 'none'). This is a problem, because we lose discoverability, a key benefit of type providers. To compensate for that, I added a super-crude string matching strategy, which will give a bit of flexibility around misspelled country or indicator names. This is actually a fun problem - I was a bit pressed by time, but I'll probably revisit it later.
  • In the same vein, it would be nice to add a feature like "find me an indicator with a name like GDP total". That should be reasonably easy to do, by extending the language to support instructions like HELP and / or INFO.
  • The bot seems like a perfect case for some Railway-Oriented Programming. Currently the wiring is pretty messy; for instance, our parsing step returns an option, and drops parsing error messages from FParsec. That message would be much more helpful to the user than our current message that only states that “parsing failed". With ROP, we should be able to compose a clean pipeline of functions, along the lines of parseArguments >> runArguments >> composeResponse.
  • The performance of looking up indicators by name is pretty terrible, at least on the first call on a country. You have been warned :)
  • That's right, there is no documentation. Not a single test, either. Tests show a disturbing lack of confidence in your coding skills. Also, I had to ship by December 22nd :)

That being said, in spite of its many, many warts, I am kind of proud of @wbfacts! It is ugly as hell, the code is full of duct-tape, the parser is wanky, and you should definitely not take this as ‘best practices’. I am also not quite clear on how the Twitter rate limits work, so I would not be entirely surprised if things went wrong in the near future… In spite of all this, hey, it kind of runs! Hopefully you find the code or what it does fun, and perhaps it will even give you some ideas for your own projects. In the meanwhile, I wish you all happy holidays!

You can find the code for this thing here.

This is my modest contribution to the F# Advent Calendar 2015. Thanks to @sergey_tihon for organizing it! Check out the epic stuff others have produced so far on his website or under the #fsAdvent hashtag on Twitter. Also, don’t miss the Japan Edition of #fsAdvent for more epicness…

I also wanted to say thanks to Tomas Petricek, for opening my eyes to discriminated unions as a modeling tool, and Phil Trelford for introducing me to FParsec, which is truly a thing of beauty. They can be blamed to an extent for inspiring this ill-conceived project, but whatever code monstrosity is in the repository is entirely my doing :)

And… ping me on Twitter as @brandewinder if you have questions or comments!

by Mathias 14. April 2013 12:20

Last Thursday, I gave a talk at the Bay.NET user group in Berkeley, introducing F# to C# developers. First off, I have to thank everybody who came – you guys were great, lots of good questions, nice energy, I had a fantastic time!

My goal was to highlight why I think F# is awesome, and of course this had to include a Type Provider demo, one of the most amazing features of F# 3.0. So I went ahead, and demoed Tomas Petricek’s World Bank Type Provider, and Howard Mansell’s R Type Provider – together. The promise of Type Providers is to enable information-rich programming; in this case, we get immediate access to a wealth of data over the internet, in one line of code, entirely discoverable by IntelliSense in Visual Studio - and we can use all the visualization arsenal of R to see what’s going on. Pretty rad.

Rather than just dump the code, I thought it would be fun to turn that demo into a video. The result is a 7 minutes clip, with only minor editing (a few cuts, and I sped up the video x3 because the main point here isn’t how terrible my typing skills are). I think it’s largely self-explanatory, the only points that are worth commenting upon are:

  • I am using a NuGet package for the R Type Provider that doesn’t officially exist yet. I figured a NuGet package would make that Type Provider more usable, and spent my week-end creating it, but haven’t published it yet. Stay tuned!
  • The most complex part of the demo is probably R’s syntax from hell. For those of you who don’t know R, it’s a free, open-source statistical package which does amazingly cool things. What you need to know to understand this video is that R is very vector-centric. You can create a vector in R using the syntax myData <- c(1,2,3,4), and combine vectors into what’s called a data frame, essentially a collection of features. The R type provider exposes all R packages and functions through a single static type, aptly named R – so for instance, one can create a R vector from F# by typing let myData = R.c( [|1; 2; 3; 4 |]).

That’s it! Let me know what you think, and if you have comments or questions.

Comments

Comment RSS