Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
31. July 2014 14:15

It is the summer, a time to cool off and enjoy vacations – so let’s keep it light, and hopefully fun, today! A couple of days ago, during his recent San Francisco visit, @tomaspetricek brought up an idea that I found intriguing. What if you had two images, and wanted to recreate an image similar to the first one, using only the pixels from the second?

To make this real, let’s take two images - a portrait by Velasquez, and one by Picasso, which I have conveniently cropped to be of identical size. What we are trying to do is to re-arrange the pixels from the Picasso painting, and recombine them to get something close to the Velasquez:

You can find the original images here: http://1drv.ms/XmlTN4

My thinking on the problem was as follows: we are trying to arrange a set of pixels into an image as close as possible to an existing image. That’s not entirely trivial. Being somewhat lazy, rather than work hard, I reverted to my patented strategy “what is the simplest thing that could possibly work (TM)”.

Two images are identical if each of their matching pixels are equal; the greater the difference between pixels, the less similar they are. In that frame, one possible angle is to try and match each pixel and limit the differences.

So how could we do that? If I had two equal groups of people, and I were trying to pair them by skill level, here is what I would do: rank each group by skill, and match the lowest person from the first group with his counterpart in the second group, and so on and so forth, until everyone is paired up. It’s not perfect, but it is easy.

Problem here is that there is no obvious order over pixels. Not a problem – we’ll create a sorting function, and replace it with something else if we don’t like the result. For instance, we could sort by “maximum intensity”; the value of a pixel will be the greater of its Red, Green and Blue value.

At that point, we have an algorithm. Time to crank out F# and try it out with a script:

open System.IO
open System.Drawing

let combine (target:string) ((source1,source2):string*string) =
// open the 2 images to combine
let img1 = new Bitmap(source1)
let img2 = new Bitmap(source2)
// create the combined image
let combo = new Bitmap(img1)
// extract pixels from an image
let pixelize (img:Bitmap) = [
for x in 0 .. img.Width - 1 do
for y in 0 .. img.Height - 1 do
yield (x,y,img.GetPixel(x,y)) ]
// extract pixels from the 2 images
let pix1 = pixelize img1
let pix2 = pixelize img2
// sort by most intense color
let sorter (_,_,c:Color) = [c.R;c.G;c.B] |> Seq.max
// sort, combine and write pixels
(pix1 |> List.sortBy sorter,
pix2 |> List.sortBy sorter)
||> List.zip
|> List.iter (fun ((x1,y1,_),(_,_,c2)) ->
combo.SetPixel(x1,y1,c2))
// ... and save, we're done
combo.Save(target)

… and we are done. Assuming you downloaded the two images in the same place as

let root = __SOURCE_DIRECTORY__

let velasquez = Path.Combine(root,"velasquez.bmp")
let picasso = Path.Combine(root,"picasso.bmp")

let picasquez = Path.Combine(root,"picasquez.bmp")
let velasso = Path.Combine(root,"velasso.bmp")

(velasquez,picasso) |> combine velasso
(picasso,velasquez) |> combine picasquez

… which should create two images like these:

Not bad for 20 lines of code. Now you might argue that this isn’t the nicest, most functional code ever, and you would be right. There are a lot of things that could be done to improve that code; for instance, handling pictures of different sizes, or injecting an arbitrary Color sorting function – feel free to have fun with it!

Also, you might wonder why I picked that specific, and somewhat odd, sorting function. Truth be told, it happened by accident. In my first attempt, I simply summed the 3 colors, and the results were pretty bad. The reason for it is, Red, Green and Blue are encoded as bytes, and summing up 3 bytes doesn’t necessarily do what you would expect. Rather than, say, convert everything to int, I went the lazy route again…

Original images: http://1drv.ms/XmlTN4

16. June 2014 22:15

Like many a good man, I too got caught into the 2048 trap, which explains in part why I have been rather quiet on this blog lately (there are a couple other reasons, too).

In case you don't know what 2048 is yet, first, consider yourself lucky - and, fair warning, you might want to back away now, while you still have a chance. 2048 is a very simple and fun game, and one of the greatest time sinks since Tetris. You can play it here, and the source code is here on GitHub.

I managed to dodge the bullet for a while, until @PrestonGuillot, a good friend of mine, decided to write a 2048 bot as a fun weekend project to sharpen his F# skills, and dragged me down with him in the process. This has been a ton of fun, and this post is a moderately organized collection of notes from my diary as a recovering 2048 addict.

Let's begin with the end result. The video below shows a F# bot, written by my friend @Blaise_V, masterfully playing the game. I recorded it a couple of weeks ago, accelerating time "for dramatic purposes":

One of the problems Preston and I ran into early was how to handle interactions with the game. A recent post by @shanselman was praising Canopy as a great library for web UI testing, which gave me the idea to try it for that purpose. In spite of my deep incompetence of things web related, I found the Canopy F# DSL super easy to pick up, and got something crude working in a jiffy. With a bit of extra help from the awesome @lefthandedgoat, the creator of Canopy (thanks Chris!), it went from crude to pretty OK, and I was ready to focus on the interesting bits, the game AI.

I had so much fun in the process, I figured others might too, and turned this into another Community for F# Dojo, which you can find here.

More...

12. April 2014 11:05

A lightweight post this week. One of my favorite F# type providers is the World Bank type provider, which enables ridiculously easy access to a boatload of socio-economic data for every country in the world. However, numbers are cold – wouldn’t it be nice to visualize them using a map? Turns out it’s pretty easy to do, using another of my favorites, the R type provider. The rworldmap R package, as its name suggests, is all about world maps, and is a perfect fit with the World Bank data.

The video below shows you the results in action; I also added the code below, for good measure. The only caveat relates to the integration between the Deedle data frame library and R. I had to manually copy the Deedle.dll and Deedle.RProvider.Plugin.dll into packages\RProvider.1.0.5\lib for the R Provider to properly convert Deedle data frames into R data frames. Enjoy!

Here is the script I used:

#I @"..\packages\"
#r @"R.NET.1.5.5\lib\net40\RDotNet.dll"
#r @"RProvider.1.0.5\lib\RProvider.dll"
#r @"FSharp.Data.2.0.5\lib\net40\FSharp.Data.dll"
#r @"Deedle.0.9.12\lib\net40\Deedle.dll"
#r @"Deedle.RPlugin.0.9.12\lib\net40\Deedle.RProvider.Plugin.dll"

open FSharp.Data
open RProvider
open RProvider.base
open Deedle
open Deedle.RPlugin
open RProviderConverters

let wb = WorldBankData.GetDataContext()
wb.Countries.France.CapitalCity
wb.Countries.France.Indicators.Population (Total).[2000]

let countries = wb.Countries

let pop2000 = series [ for c in countries -> c.Code => c.Indicators.Population (Total).[2000]]
let pop2010 = series [ for c in countries -> c.Code => c.Indicators.Population (Total).[2010]]
let surface = series [ for c in countries -> c.Code => c.Indicators.Surface area (sq. km).[2010]]

let df = frame [ "Pop2000" => pop2000; "Pop2010" => pop2010; "Surface" => surface ]
df?Codes <- df.RowKeys

open RProvider.rworldmap

let map = R.joinCountryData2Map(df,"ISO3","Codes")
R.mapCountryData(map,"Pop2000")

df?Density <- df?Pop2010 / df?Surface
df?Growth <- (df?Pop2010 - df?Pop2000) / df?Pop2000

let map2 = R.joinCountryData2Map(df,"ISO3","Codes")
R.mapCountryData(map2,"Density")
R.mapCountryData(map2,"Growth")

Have a great week-end, everybody! And big thanks to Tomas for helping me figure out a couple of things about Deedle.

1. March 2014 14:32

During some recent meanderings through the confines of the internet, I ended up discovering the Winnow Algorithm. The simplicity of the approach intrigued me, so I thought it would be interesting to try and implement it in F# and see how well it worked.

The purpose of the algorithm is to train a binary classifier, based on binary features. In other words, the goal is to predict one of two states, using a collection of features which are all binary. The prediction model assigns weights to each feature; to predict the state of an observation, it checks all the features that are “active” (true), and sums up the weights assigned to these features. If the total is above a certain threshold, the result is true, otherwise it’s false. Dead simple – and so is the corresponding F# code:

type Observation = bool []
type Label = bool
type Example = Label * Observation
type Weights = float []

let predict (theta:float) (w:Weights) (obs:Observation) =
(obs,w) ||> Seq.zip
|> Seq.filter fst
|> Seq.sumBy snd
|> ((<) theta)

We create some type aliases for convenience, and write a predict function which takes in theta (the threshold), weights and and observation; we zip together the features and the weights, exclude the pairs where the feature is not active, sum the weights, check whether the threshold is lower that the total, and we are done.

In a nutshell, the learning process feeds examples (observations with known label), and progressively updates the weights when the model makes mistakes. If the current model predicts the output correctly, don’t change anything. If it predicts true but should predict false, it is over-shooting, so weights that were used in the prediction (i.e. the weights attached to active features) are reduced. Conversely, if the prediction is false but the correct result should be true, the active features are not used enough to reach the threshold, so they should be bumped up.

And that’s pretty much it – the algorithm starts with arbitrary initial weights of 1 for every feature, and either doubles or halves them based on the mistakes. Again, the F# implementation is completely straightforward. The weights update can be written as follows:

let update (theta:float) (alpha:float) (w:Weights) (ex:Example) =
let real,obs = ex
match (real,predict theta w obs) with
| (true,false) -> w |> Array.mapi (fun i x -> if obs.[i] then alpha * x else x)
| (false,true) -> w |> Array.mapi (fun i x -> if obs.[i] then x / alpha else x)
| _ -> w

Let’s check that the update mechanism works:

> update 0.5 2. [|1.;1.;|] (false,[|false;true;|]);;
val it : float [] = [|1.0; 0.5|]

The threshold is 0.5, the adjustment multiplier is 2, and each feature is currently weighted at 1. The state of our example is [| false; true; |], so only the second feature is active, which means that the predicted value will be 1. (the weight of that feature). This is above the threshold 0.5, so the predicted value is true. However, because the correct value attached to that example is false, our prediction is incorrect, and the weight of the second feature is reduced, while the first one, which was not active, remains unchanged.

Let’s wrap this up in a convenience function which will learn from a sequence of examples, and give us directly a function that will classify observations:

let learn (theta:float) (alpha:float) (fs:int) (xs:Example seq) =
let updater = update theta alpha
let w0 = [| for f in 1 .. fs -> 1. |]
let w = Seq.fold (fun w x -> updater w x) w0 xs
fun (obs:Observation) -> predict theta w obs

We pass in the number of features, fs, to initialize the weights at the correct size, and use a fold to update the weights for each example in the sequence. Finally, we create and return a function that, given an observation, will predict the label, based on the weights we just learnt.

And that’s it – in 20 lines of code, we are done, the Winnow is implemented.

More...

15. February 2014 12:51

My favorite column in MSDN Magazine is Test Run; it was originally focused on testing, but the author, James McCaffrey, has been focusing lately on topics revolving around numeric optimization and machine learning, presenting a variety of methods and approaches. I quite enjoy his work, with one minor gripe –his examples are all coded in C#, which in my opinion is really too bad, because the algorithms would gain much clarity if written in F# instead.

Back in June 2013, he published a piece on Amoeba Method Optimization using C#. I hadn’t seen that approach before, and found it intriguing. I also found the C# code a bit too hairy for my feeble brain to follow, so I decided to rewrite it in F#.

In a nutshell, the Amoeba approach is a heuristic to find the minimum of a function. Its proper respectable name is the Nelder-Nead method. The reason it is also called the Amoeba method is because of the way the algorithm works: in its simple form, it starts from a triangle, the “Amoeba”; at each step, the Amoeba “probes” the value of 3 points in its neighborhood, and moves based on how much better the new points are. As a result, the triangle is iteratively updated, and behaves a bit like an Amoeba moving on a surface.

Before going into the actual details of the algorithm, here is how my final result looks like. You can find the entire code here on GitHub, with some usage examples in the Sample.fsx script file. Let’s demo the code in action: in a script file, we load the Amoeba code, and use the same function the article does, the Rosenbrock function. We transform the function a bit, so that it takes a Point (an alias for an Array of floats, essentially a vector) as an input, and pass it to the solve function, with the domain where we want to search, in that case, [ –10.0; 10.0 ] for both x and y:

#load "Amoeba.fs"

open Amoeba
open Amoeba.Solver

let g (x:float) y =
100. * pown (y - x * x) 2 + pown (1. - x) 2

let testFunction (x:Point) =
g x.[0] x.[1]

solve Default [| (-10.,10.); (-10.,10.) |] testFunction 1000

Running this in the F# interactive window should produce the following:

val it : Solution = (0.0, [|1.0; 1.0|])
>

The algorithm properly identified that the minimum is 0, for a value of x = 1.0 and y = 1.0. Note that results may vary: this is a heuristic, which starts with a random initial amoeba, so each run could produce slightly different results, and might at times epically fail.

More...