Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
30. September 2012 13:08

After four weeks of vacation, I am back home, ready to continue my series of posts converting the samples from Machine Learning in Action from Python to F#.

Today’s post covers Chapter 5 of the book, dedicated to Logistic Regression. Logistic Regression is another classification method. It uses numeric data to determine how to separate observations into two classes, identified by 0 or 1.

The entire code presented in this post can be found on GitHub, commit 398677f.

## The idea behind the algorithm

The main idea behind the algorithm is to find a function which, using the numeric data that describe an individual observation as input, will return a number between 0 and 1. Ideally, that function will return a number close to 0 for observations belonging to group 0, and close to 1 for observations belonging to group 1.

To achieve that result, the algorithm relies on the Sigmoid function, f(x) = 1 / (1 + exp(-x)) .

For any input value, the Sigmoid function returns a value in ] 0 ; 1 [. A positive input will produce a value greater than 0.5, and the larger the input, the closer to 1. One could think of the function as returning a probability: for very large positive or negative values of x, there is a high certainty that the observation belongs to one of the two groups, and for values close to zero, the probability of each group is 50% / 50%.
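To get a feel for the shape of the function, we can evaluate it at a few points in fsi (a quick sketch, independent of the code that follows):

```fsharp
// the Sigmoid function, mapping any real number into ] 0 ; 1 [
let sigmoid x = 1.0 / (1.0 + exp -x)

sigmoid (-5.0)  // ~ 0.007: almost certainly group 0
sigmoid 0.0     // 0.5: could be either group
sigmoid 5.0     // ~ 0.993: almost certainly group 1
```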

The only thing needed then is a transformation taking the numeric values describing the observations from the dataset, and mapping them to a single value, such that applying the Sigmoid function to it produces results close to the group the observation belongs to. The most straightforward way to achieve this is to apply a linear combination: an observation with numeric values [ x1; x2; … xk ] will be converted into w0 + w1 x x1 + w2 x x2 … + wk x xk, by applying weights [ w0; w1; … wk ] to each of the components of the observation. Note how the weights have one extra element w0, which is used for a constant term.

If our observations have two components X and Y, each observation can be represented as a point (X, Y) in the plane, and what we are looking for is a straight line w0 + w1 x X + w2 x Y = 0, such that every observation of group 0 is on one side of the line, and every observation of group 1 on the other side.

We have now replaced one problem with another – how can we find a suitable set of weights W?

I won’t even attempt a full explanation of the approach, and will stick to fuzzy, high-level intuition. Basically, the algorithm starts with an arbitrary set of weights, and iteratively adjusts them by comparing the result of the function with what it should be (the actual group), reducing the error at each step.

Note: I’ll skip the Gradient Ascent method, and go straight to the second part of Chapter 5, which covers Stochastic Gradient Ascent, because the code is both easier to understand and more suitable to large datasets. On the other hand, the deterministic gradient ascent approach is probably clearer for the math inclined. If that’s your situation, you might be interested in this MSDN Magazine article, which presents a C# implementation of the Logistic Regression.

Let’s illustrate the update procedure, on an ultra-simplified example, where we have a single weight W. In that case, the predicted value for an observation which has value X will be sigmoid (W x X) , and the algorithm adjustment is given by the following formula:

W <- W + alpha x (Label – sigmoid (W x X))

where Label is the group the observation belongs to (0 or 1), and alpha is a user-defined parameter, between 0 and 1. In other words, W is updated based on the error, Label – sigmoid (W x X). First, obviously, if there is no error, W will remain unchanged: there is nothing to adjust. Let’s consider the case where Label is 1, and both X and W are positive. In that case, Label – sigmoid (W x X) will be positive (between 0 and 1), and W will be increased. As W increases, the sigmoid gets closer to 1, and the adjustments become progressively smaller. Similarly, considering all the cases for W and X (positive and negative), one can verify that W will be adjusted in a direction which reduces the classification error. Alpha can be described as “how aggressive” the adjustment should be – the closer to 1, the more W will be updated.
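We can watch the rule at work with a tiny sketch (not from the book): iterating the update on a single observation with X = 1.0 and Label = 1, W climbs toward larger values, and each adjustment is smaller than the previous one as the sigmoid approaches 1.

```fsharp
let sigmoid x = 1.0 / (1.0 + exp -x)

// one update of the single weight W, for an observation (x, label):
// W <- W + alpha * (label - sigmoid (W * x))
let update alpha label x w =
    w + alpha * (label - sigmoid (w * x))

// iterate the rule on a single observation: X = 1.0, Label = 1
let steps =
    0.0
    |> Seq.unfold (fun w -> Some(w, update 0.5 1.0 1.0 w))
    |> Seq.take 5
    |> Seq.toList
// W climbs: 0.0, 0.25, 0.47, 0.66, ... with progressively smaller adjustments
```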

That’s the gist of the algorithm – the full-blown deterministic gradient algorithm proceeds to update the weights by considering the error on the entire dataset at once, which makes it more expensive, whereas the stochastic gradient approach updates the weights sequentially, taking the dataset observations one by one, which makes it convenient for larger datasets.

## Simple implementation

Enough talk – let’s jump into code, with a straightforward implementation first. We create a module “LogisticRegression”, and begin with building the function which predicts the class of an observation, given weights:

module LogisticRegression =

    open System

    let sigmoid x = 1.0 / (1.0 + exp -x)

    // Vector dot product
    let dot (vec1: float list)
            (vec2: float list) =
        List.zip vec1 vec2
        |> List.map (fun e -> fst e * snd e)
        |> List.sum

    // Vector addition
    let add (vec1: float list)
            (vec2: float list) =
        List.zip vec1 vec2
        |> List.map (fun e -> fst e + snd e)

    // Vector scalar product
    let scalar alpha (vector: float list) =
        List.map (fun e -> alpha * e) vector

    // Weights have 1 element more than observations, for constant
    let predict (weights: float list)
                (obs: float list) =
        1.0 :: obs
        |> dot weights
        |> sigmoid


18. August 2012 03:39

This is the continuation of my series exploring Machine Learning, converting the code samples of “Machine Learning in Action” from Python to F# as I go through the book. Today’s post covers Chapter 4, which is dedicated to Naïve Bayes classification – and you can find the resulting code on GitHub.

Disclaimer: I am new to Machine Learning, and claim no expertise on the topic. I am currently reading “Machine Learning in Action”, and thought it would be a good learning exercise to convert the book’s samples from Python to F#.

## The idea behind the Algorithm

The canonical application of Naïve Bayes classification is text classification, where the goal is to identify to which pre-determined category a piece of text belongs – for instance, is this email I just received spam, or ham (“valuable” email)?

The underlying idea is to use individual words present in the text as indications for what category it is most likely to belong to, using Bayes Theorem, named after the cheerful-looking Reverend Bayes.

Imagine that you received an email containing the words “Nigeria”, “Prince”, “Diamonds” and “Money”. It is very likely that if you look into your spam folder, you’ll find quite a few emails containing these words, whereas, unless you are in the business of importing diamonds from Nigeria and have some aristocratic family, your “normal” emails would rarely contain these words. They have a much higher frequency within the category “Spam” than within the Ham, which makes them a potential flag for undesired business ventures.

On the other hand, let’s assume that you are a lucky person, and that typically, what you receive is Ham, with the occasional Spam bit. If you took a random email in your inbox, it is then much more likely that it belongs to the Ham category.

Bayes’ Theorem combines these two pieces of information together, to determine the probability that a particular email belongs to the “Spam” category, if it contains the word “Nigeria”:

P(is “Spam” | contains “Nigeria”) = P(contains “Nigeria” | is “Spam”) x P(is “Spam”) / P(contains “Nigeria”)

In other words, 2 factors should be taken into account when deciding whether an email containing “Nigeria” is spam: how over-represented is that word in Spam, and how likely is it that any email is spammy in the first place?
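As a sanity check on the formula, here is a small numeric sketch – the probabilities below are made up purely for illustration: suppose 40% of email is Spam, “Nigeria” appears in 20% of Spam, and in only 1% of Ham.

```fsharp
// Hypothetical numbers, for illustration only
let pSpam = 0.4                // P(is Spam): Ham is still the majority
let pNigeriaGivenSpam = 0.20   // P(contains "Nigeria" | is Spam)
let pNigeriaGivenHam  = 0.01   // P(contains "Nigeria" | is Ham)

// total probability of seeing "Nigeria" in any email
let pNigeria =
    pNigeriaGivenSpam * pSpam + pNigeriaGivenHam * (1.0 - pSpam)

// Bayes' Theorem
let pSpamGivenNigeria = pNigeriaGivenSpam * pSpam / pNigeria
// ~ 0.93: even though Ham is more common overall,
// an email containing "Nigeria" is very likely Spam
```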

The algorithm is named “Naïve”, because it makes a simplifying assumption about the text, which turns out to be very convenient for computation purposes: namely, that each word appears with a frequency which doesn’t depend on the other words around it. This is an unlikely assumption (the word “Diamond” is much more likely to be present in an email containing “Nigeria” than in your typical family-members discussion email).

We’ll leave it at that on the concepts – I’ll refer readers who want to dig deeper to the book, or to this explanation of text classification with Naïve Bayes.

## A simple F# implementation

For my first pass, I took a slightly different direction from the book, and decided to favor readability over performance. I assume that we are operating on a dataset organized as a sequence of text samples, each of them labeled by category, along these lines (example from the book “Machine Learning in Action”):

Note: the code presented here can be found on GitHub.

let dataset =
    [| ("Ham",  "My dog has flea problems help please");
       ("Spam", "Maybe not take him to dog park stupid");
       ("Ham",  "My dalmatian is so cute I love him");
       ("Spam", "Stop posting stupid worthless garbage");
       ("Ham",  "Mr Licks ate my steak how to stop him");
       ("Spam", "Quit buying worthless dog food stupid") |]

We will need to do some word counting to compute frequencies, so let’s start with a few utility functions:

open System
open System.Text.RegularExpressions

// Regular Expression matching full words, case insensitive.
let matchWords = new Regex(@"\w+", RegexOptions.IgnoreCase)

// Extract and count words from a string.
// http://stackoverflow.com/a/2159085/114519
let wordsCount text =
    matchWords.Matches(text)
    |> Seq.cast<Match>
    |> Seq.groupBy (fun m -> m.Value)
    |> Seq.map (fun (value, groups) ->
        value.ToLower(), (groups |> Seq.length))

// Extracts all words used in a string.
let vocabulary text =
    matchWords.Matches(text)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value.ToLower())
    |> Seq.distinct

// Extracts all words used in a dataset;
// a Dataset is a sequence of "samples",
// each sample has a label (the class), and text.
let extractWords dataset =
    dataset
    |> Seq.map (fun sample -> vocabulary (snd sample))
    |> Seq.concat
    |> Seq.distinct

// "Tokenize" the dataset: break each text sample
// into words and how many times they are used.
let prepare dataset =
    dataset
    |> Seq.map (fun (label, sample) -> (label, wordsCount sample))

We use a Regular Expression, \w+, to match all words, in a case-insensitive way. wordsCount extracts individual words and the number of times they occur, while vocabulary simply returns the distinct words encountered. The prepare function takes a complete dataset, and transforms each text sample into a Tuple containing the original classification label, and a Sequence of Tuples containing all lower-cased words found and their count.
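To illustrate, here is wordsCount in action on a small string, repeating the definition so the snippet stands alone:

```fsharp
open System.Text.RegularExpressions

// Regular Expression matching full words, case insensitive.
let matchWords = new Regex(@"\w+", RegexOptions.IgnoreCase)

// Extract and count words from a string.
let wordsCount text =
    matchWords.Matches(text)
    |> Seq.cast<Match>
    |> Seq.groupBy (fun m -> m.Value)
    |> Seq.map (fun (value, groups) ->
        value.ToLower(), (groups |> Seq.length))

wordsCount "my dog ate my steak"
|> Seq.toList
// [("my", 2); ("dog", 1); ("ate", 1); ("steak", 1)]
```

Note that the grouping happens before the lower-casing, so mixed-case duplicates like “Dog” and “dog” would come out as separate entries.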


8. July 2012 11:34

I recently spent some time thinking about code performance optimization problems, which led me to dust off my old class notes on Queuing Theory. In general, Queuing Theory studies the behavior of networks of Queues, where Jobs arrive at various time intervals, and wait in line until they can be processed / served by a Node / Server, potentially propagating new Jobs to other connected queues upon completion.

The simplest case in Queues is known as the Single-Server Queue – a queue where all jobs wait in a single line, and are processed by a single server. Single queues are interesting in their own right, and can be found everywhere. They are also important, because they are the building blocks of Networks of Queues, and understanding what happens at a single Queue level helps understand how they work when connected together.

The question we’ll be looking at today is the following: given the characteristics of a Queue, what can we say about its performance over time? For instance, can we estimate how often the Queue will be busy? How many jobs will be backed up waiting for service on average? How large can the queue get? How long will it take, on average, to process an incoming job?

<Side note: the picture above is not completely random. It depicts the Marsupilami, a fictional creature who is an expert in managing long tails, known in French as “Queues”>

We’ll use a simple Monte-Carlo simulation model to approach these questions, and see whether the results we observe match the results predicted by theory.

What we are interested in is observing the state of the Queue over time. Two events drive the behavior of the queue:

• a new Job arrives in the queue to be processed, and either gets processed immediately by the server, or is placed at the end of the line,
• the server completes a Job, picks the next one in line if available and works on it until it’s done.

From this description, we can identify a few elements that are important in modeling the queue:

• whether the Server is Idle or Busy,
• whether Jobs are waiting in the Queue to be processed,
• how long it takes the Server to process a Job,
• how new Jobs arrive to the Queue over time

## Modeling the Queue

Let’s get coding! We create a small F# script (Note: the complete code sample is available at the bottom of the post, as well as on FsSnip), and begin by defining a Discriminated Union type Status, which represents the state of the Server at time T:

type Status = Idle | Busy of DateTime * int

Our server can be in two states: Idle, or Busy, in which case we also need to know when it will be done with its current Job, and how many Jobs are waiting to be processed.

In order to model what is happening to the Queue, we’ll create a State record type:

type State =
    { Start: DateTime;
      Status: Status;
      NextIn: DateTime }

Start and Status should be self-explanatory – they represent the time when the Queue started being in that new State, and the Server Status. The reason for NextIn may not be as obvious - it represents the arrival time of the next Job. First, there is an underlying assumption here that there is ALWAYS a next job: we are modeling the Queue as a perpetual process, so that we can simulate it for as long as we want. Then, the reason for this approach is that it simplifies the determination of the transition of the Queue between states:

let next arrival processing state =
    match state.Status with
    | Idle ->
        { Start = state.NextIn;
          NextIn = state.NextIn + arrival();
          Status = Busy(state.NextIn + processing(), 0) }
    | Busy(until, waiting) ->
        match (state.NextIn <= until) with
        | true ->
            { Start = state.NextIn;
              NextIn = state.NextIn + arrival();
              Status = Busy(until, waiting + 1) }
        | false ->
            match (waiting > 0) with
            | true ->
                { Start = until;
                  Status = Busy(until + processing(), waiting - 1);
                  NextIn = state.NextIn }
            | false ->
                { Start = until;
                  Status = Idle;
                  NextIn = state.NextIn }

This somewhat gnarly function is worth commenting a bit. Its purpose is to determine the next state the Queue will enter, given its current state, and 2 functions, arrival and processing, which have the same signature:

val f : (unit -> TimeSpan)

Both these functions take no arguments, and return a TimeSpan, which represents respectively how much time will elapse between the latest arrival in the Queue and the next one (the inter-arrival time), and how much time the Server will take completing its next Job. This information is sufficient to derive the next state of the system:

• If the Server is Idle, it will switch to Busy when the next Job arrives, and the arrival of the following Job is scheduled based on the inter-arrival time,
• If the Server is Busy, two things can happen: either the next Job arrives before the Server completes its work, or not. If a new Job arrives first, it increases the number of Jobs waiting to be processed, and we schedule the next arrival. If the Server finishes first, it becomes Idle if no Job is waiting to be processed, and otherwise begins processing the Job at the front of the line.

We are now ready to run a Simulation.

let simulate startTime arr proc =
    let nextIn = startTime + arr()
    let state =
        { Start = startTime;
          Status = Idle;
          NextIn = nextIn }
    Seq.unfold (fun st ->
        Some(st, next arr proc st)) state

We initialize the Queue to begin at the specified start time, with a cold start (Idle), and unfold an infinite sequence of States, which we can run for as long as we please.

## Running the Simulation

Let’s start with a Sanity check, and validate the behavior of a simple case, where Jobs arrive to the Queue every 10 seconds, and the Queue takes 5 seconds to process each Job.

First, let’s write a simple function to pretty-display the state of the Queue over time:

let pretty state =
    let count =
        match state.Status with
        | Idle -> 0
        | Busy(_, waiting) -> 1 + waiting
    let nextOut =
        match state.Status with
        | Idle -> "Idle"
        | Busy(until, _) -> until.ToLongTimeString()
    let start = state.Start.ToLongTimeString()
    let nextIn = state.NextIn.ToLongTimeString()
    printfn "Start: %s, Count: %i, Next in: %s, Next out: %s" start count nextIn nextOut

Now we can define our model:

let constantTime (interval: TimeSpan) =
    fun () -> interval

let arrivalTime = new TimeSpan(0, 0, 10)
let processTime = new TimeSpan(0, 0, 5)

let simpleArr = constantTime arrivalTime
let simpleProc = constantTime processTime

let startTime = new DateTime(2010, 1, 1)
let constantCase = simulate startTime simpleArr simpleProc

Let’s simulate 10 transitions in fsi:

> Seq.take 10 constantCase |> Seq.iter pretty;;
Start: 12:00:00 AM, Count: 0, Next in: 12:00:10 AM, Next out: Idle
Start: 12:00:10 AM, Count: 1, Next in: 12:00:20 AM, Next out: 12:00:15 AM
Start: 12:00:15 AM, Count: 0, Next in: 12:00:20 AM, Next out: Idle
Start: 12:00:20 AM, Count: 1, Next in: 12:00:30 AM, Next out: 12:00:25 AM
Start: 12:00:25 AM, Count: 0, Next in: 12:00:30 AM, Next out: Idle
Start: 12:00:30 AM, Count: 1, Next in: 12:00:40 AM, Next out: 12:00:35 AM
Start: 12:00:35 AM, Count: 0, Next in: 12:00:40 AM, Next out: Idle
Start: 12:00:40 AM, Count: 1, Next in: 12:00:50 AM, Next out: 12:00:45 AM
Start: 12:00:45 AM, Count: 0, Next in: 12:00:50 AM, Next out: Idle
Start: 12:00:50 AM, Count: 1, Next in: 12:01:00 AM, Next out: 12:00:55 AM
val it : unit = ()

Looks like we are doing something right – the simulation displays an arrival every 10 seconds, followed by 5 seconds of activity until the job is processed, and 5 seconds of Idleness until the next arrival.

Let’s do something a bit more complicated – arrivals with random, uniformly distributed inter-arrival times:

let uniformTime (seconds: int) =
    let rng = new Random()
    fun () ->
        let t = rng.Next(seconds + 1)
        new TimeSpan(0, 0, t)

let uniformArr = uniformTime 10

let uniformCase = simulate startTime uniformArr simpleProc

Here, arrival times will take any value (in seconds) between 0 and 10, inclusive – with an average of 5 seconds between arrivals. A quick run in fsi produces the following sample:

> Seq.take 10 uniformCase |> Seq.iter pretty;;
Start: 12:00:00 AM, Count: 0, Next in: 12:00:02 AM, Next out: Idle
Start: 12:00:02 AM, Count: 1, Next in: 12:00:03 AM, Next out: 12:00:07 AM
Start: 12:00:03 AM, Count: 2, Next in: 12:00:11 AM, Next out: 12:00:07 AM
Start: 12:00:07 AM, Count: 1, Next in: 12:00:11 AM, Next out: 12:00:12 AM
Start: 12:00:11 AM, Count: 2, Next in: 12:00:11 AM, Next out: 12:00:12 AM
Start: 12:00:11 AM, Count: 3, Next in: 12:00:16 AM, Next out: 12:00:12 AM
Start: 12:00:12 AM, Count: 2, Next in: 12:00:16 AM, Next out: 12:00:17 AM
Start: 12:00:16 AM, Count: 3, Next in: 12:00:24 AM, Next out: 12:00:17 AM
Start: 12:00:17 AM, Count: 2, Next in: 12:00:24 AM, Next out: 12:00:22 AM
Start: 12:00:22 AM, Count: 1, Next in: 12:00:24 AM, Next out: 12:00:27 AM
val it : unit = ()

Not surprisingly, given the faster arrivals, we see the Queue getting slightly backed up, with Jobs waiting to be processed.

What would the Queue look like after, say, 1,000,000 transitions? Easy enough to check:

>  Seq.nth 1000000 uniformCase |> pretty;;
Start: 10:36:23 PM, Count: 230, Next in: 10:36:25 PM, Next out: 10:36:28 PM
val it : unit = ()

Interesting – looks like the Queue is getting backed up quite a bit as time goes by. This is a classic result with Queues: the utilization rate, defined as arrival rate / departure rate, is at 100% – the Queue is saturated. When the utilization rate is strictly less than 100%, the Queue is stable; otherwise it will build up over time, accumulating a backlog of Jobs.
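Plugging in our two examples makes the difference concrete (a back-of-the-envelope sketch): the constant case has one arrival every 10 seconds against 5 seconds of processing, while the uniform case averages one arrival every 5 seconds against the same 5 seconds of processing.

```fsharp
// utilization = arrival rate / departure rate
//             = mean processing time / mean inter-arrival time
let utilization meanArrival meanProc = meanProc / meanArrival

utilization 10.0 5.0  // constant case: 0.5 -> stable
utilization 5.0 5.0   // uniform case: 1.0 -> saturated, the backlog grows
```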

Let’s create a third type of model, with Exponential rates:

let exponentialTime (seconds: float) =
    let lambda = 1.0 / seconds
    let rng = new Random()
    fun () ->
        let t = - Math.Log(rng.NextDouble()) / lambda
        let ticks = t * (float)TimeSpan.TicksPerSecond
        new TimeSpan((int64)ticks)

let expArr = exponentialTime 10.0
let expProc = exponentialTime 7.0
let exponentialCase = simulate startTime expArr expProc

The arrivals and processing times are exponentially distributed, with an average time expressed in seconds. In our system, we expect new Jobs to arrive on average every 10 seconds, varying between 0 and + infinity, and Jobs take 7 seconds on average to process. The queue is not saturated, and should therefore not build up, which we can verify:

> Seq.nth 1000000 exponentialCase |> pretty;;
Start: 8:55:36 PM, Count: 4, Next in: 8:55:40 PM, Next out: 8:55:36 PM
val it : unit = ()

A queue where both arrivals and processing times follow that distribution is a classic in Queuing Theory, known as a M/M/1 queue. It is of particular interest because some of its characteristics can be derived analytically – we’ll revisit that later.

## Measuring performance

We already saw a simple useful measurement for Queues, the utilization rate, which defines whether our Queue will explode or stabilize over time. This is important but crude – what we would really be interested in is measuring how much of a bottleneck the Queue creates. Two measures come to mind in this frame: how long is the Queue on average, and how much time does a Job take to go through the entire system (queue + processing)?

Let’s begin with the Queue length. On average, how many Jobs should we expect to see in the Queue (including Jobs being currently processed)?

The question is less trivial than it looks. We could naively simulate a Sequence of States, and average out the number of Jobs in each State, but this would be incorrect. To understand the issue, let’s consider a Queue with constant arrivals and processing times, where Jobs arrive every 10 seconds and take 1 second to process. The sequence of states will alternate between 0 and 1 Jobs – which would give a naïve average of 0.5 Jobs in queue. However, the system will be Busy for 1 second, then Idle for 9 seconds, with an average number of Jobs of 0.1 over time.

To correctly compute the average, we need to compute a weighted average, counting the number of jobs present in a state, weighted by the time the System spent in that particular state.

Let’s consider for illustration the example above, where we observe a Queue for 10 seconds, with 3 Jobs A, B, C arriving and departing. The System spends 3 seconds with 0 Jobs, 5 seconds with 1 Job, and 2 seconds with 2 Jobs, which gives us (3x0 + 5x1 + 2x2)/10, i.e. 9/10 or 0.9 Jobs on average. We could achieve the same result by accumulating the computation over time, starting at each transition point: 2s x 0 + 2s x 1 + 1s x 2 + 2s x 1 + 1s x 2 + 1s x 1 + 1s x 0 = 9 “Job-seconds”, which over 10 seconds gives us the same result as before.
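The arithmetic above can be condensed into a tiny throw-away helper (a sketch, separate from the accumulator-based version implemented next):

```fsharp
// time-weighted average number of jobs:
// sum of (jobs x duration) / total duration
let averageJobs (spans: (float * float) list) =
    let totalTime = spans |> List.sumBy snd
    let jobSeconds = spans |> List.sumBy (fun (jobs, time) -> jobs * time)
    jobSeconds / totalTime

// 3 seconds with 0 jobs, 5 seconds with 1 job, 2 seconds with 2 jobs
averageJobs [ (0.0, 3.0); (1.0, 5.0); (2.0, 2.0) ]  // 0.9
```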

Let’s implement this. We will compute the average using an Accumulator, using Sequence Scan: for each State of the System, we will measure how much time was spent, in Ticks, as well as how many Jobs were in the System during that period, and accumulate the total number of ticks since the Simulation started, as well as the total number of “Jobs-Ticks”, so that the average until that point will simply be:

Average Queue length = sum of Job-Ticks / sum of Ticks.

let averageCountIn (transitions: State seq) =
    // time spent in current state, in ticks
    let ticks current next =
        next.Start.Ticks - current.Start.Ticks
    // jobs in system in state
    let count state =
        match state.Status with
        | Idle -> (int64)0
        | Busy(until, c) -> (int64)c + (int64)1
    // update state = total time and total jobs x time
    // between current and next queue state
    let update state pair =
        let current, next = pair
        let c = count current
        let t = ticks current next
        (fst state) + t, (snd state) + (c * t)
    // accumulate updates from initial state
    let initial = (int64)0, (int64)0
    transitions
    |> Seq.pairwise
    |> Seq.scan (fun state pair -> update state pair) initial
    |> Seq.map (fun state -> (float)(snd state) / (float)(fst state))

Let’s try this on our M/M/1 queue, the exponential case described above:

> averageCountIn exponentialCase |> Seq.nth 1000000 ;;
val it : float = 2.288179686

According to theory, for an M/M/1 Queue, that number should be rho / (1-rho), i.e. (7/10) / (1-(7/10)), which gives 2.333. Close enough, I say.
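The theoretical value falls straight out of the M/M/1 formulas; as a quick sketch:

```fsharp
// M/M/1 theory: average number of jobs in the system = rho / (1 - rho)
let lambda = 1.0 / 10.0  // arrival rate: one job every 10 seconds
let mu = 1.0 / 7.0       // service rate: one job every 7 seconds
let rho = lambda / mu    // utilization: 0.7
let expectedJobs = rho / (1.0 - rho)
// 2.333...: the simulated 2.288 is in the right neighborhood
```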

Let’s look at the Response Time now, that is, the average time it takes for a Job to leave the system once it entered the queue.

We’ll use an idea similar to the one we used for the average number of Jobs in the System. On our illustration, we can see that A stays in for 3 seconds, B for 4 seconds and C for 2 seconds. In that case, the average is (time A + time B + time C) / 3 jobs, i.e. (3 + 4 + 2)/3 = 3 seconds. But we can also decompose the time spent by A, B and C in the system by summing up not by Job, but by period between transitions. In this case, we would get Time spent by A, B, C = 2s x 0 jobs + 2s x 1 job + 1s x 2 jobs + 2s x 1 job + 1s x 2 jobs + 1s x 1 job + 1s x 0 jobs = 9 “job-seconds”, which gives us the correct total time we need.

We can use that idea to implement the average time spent in the system in the same fashion we did the average jobs in the system, by accumulating the measure as the sequence of transition unfolds:

let averageTimeIn (transitions: State seq) =
    // time spent in current state, in ticks
    let ticks current next =
        next.Start.Ticks - current.Start.Ticks
    // jobs in system in state
    let count state =
        match state.Status with
        | Idle -> (int64)0
        | Busy(until, c) -> (int64)c + (int64)1
    // count arrivals
    let arrival current next =
        if count next > count current then (int64)1 else (int64)0
    // update state = total arrivals and total jobs x time
    // between current and next queue state
    let update state pair =
        let current, next = pair
        let c = count current
        let t = ticks current next
        let a = arrival current next
        (fst state) + a, (snd state) + (c * t)
    // accumulate updates from initial state
    let initial = (int64)0, (int64)0
    transitions
    |> Seq.pairwise
    |> Seq.scan (fun state pair -> update state pair) initial
    |> Seq.map (fun state ->
        let time = (float)(snd state) / (float)(fst state)
        new TimeSpan((int64)time))

Trying this out on our M/M/1 queue, we theoretically expect an average of 23.333 seconds, and get 22.7 seconds:

> averageTimeIn exponentialCase |> Seq.nth 1000000 ;;
val it : TimeSpan = 00:00:22.7223798 {Days = 0;
Hours = 0;
Milliseconds = 722;
Minutes = 0;
Seconds = 22;
Ticks = 227223798L;
TotalDays = 0.0002629905069;
TotalHours = 0.006311772167;
TotalMilliseconds = 22722.3798;
TotalMinutes = 0.37870633;
TotalSeconds = 22.7223798;}

Given the somewhat sordid conversions between Int64, floats and TimeSpan, this seems plausible enough.
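The two theoretical numbers are not independent: Little’s Law states that the average number of items in a stable system equals the arrival rate times the average time spent in the system, L = lambda x W. A quick cross-check, as a sketch:

```fsharp
// Little's Law: L = lambda * W, so W = L / lambda
let lambda = 1.0 / 10.0          // arrival rate: one job every 10 seconds
let avgJobsInSystem = 7.0 / 3.0  // rho / (1 - rho), with rho = 0.7
let avgTimeInSystem = avgJobsInSystem / lambda
// 23.333... seconds: consistent with the theoretical response time above
```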

## A practical example

Now that we got some tools at our disposition, let’s look at a semi-realistic example. Imagine a subway station, with 2 turnstiles (apparently also known as “Baffle Gates”), one letting people in, one letting people out. On average, it takes 4 seconds to get a person through the Turnstile (some people are more baffled than others) – we’ll model the processing time as an Exponential.

Now imagine that, on average, passengers arrive at the station every 5 seconds. We’ll model that process as an exponential too, even though it’s fairly unrealistic to assume that the rate of arrival remains constant throughout the day.

// turnstiles admit 1 person / 4 seconds
let turnstileProc = exponentialTime 4.0
// passengers arrive randomly every 5s
let passengerArr = exponentialTime 5.0

Assuming the Law of conservation applies to subway station passengers too, we would expect the average rate of exit from the station to also be one every 5 seconds. However, unlike passengers coming in the station, passengers exiting arrived there by subway, and are therefore likely to arrive in batches. We’ll make the totally realistic assumption here that trains are never late, and arrive like clockwork at constant intervals, bringing in the same number of passengers. If trains arrive every 30 seconds, to maintain our average rate of 1 passenger every 5 seconds, each train will carry 6 passengers:

let batchedTime seconds batches =
    let counter = ref 0
    fun () ->
        counter := counter.Value + 1
        if counter.Value < batches
        then new TimeSpan(0, 0, 0)
        else
            counter := 0
            new TimeSpan(0, 0, seconds)
// trains arrive every 30s with 6 passengers
let trainArr = batchedTime 30 6

How would our 2 Turnstiles behave? Let’s check:

// passengers arriving in station
let queueIn = simulate startTime passengerArr turnstileProc
// passengers leaving station
let queueOut = simulate startTime trainArr turnstileProc

let prettyWait (t:TimeSpan) = t.TotalSeconds

printfn "Turnstile to get in the Station"
averageCountIn queueIn |> Seq.nth 1000000 |> printfn "In line: %f"
averageTimeIn queueIn |> Seq.nth 1000000 |> prettyWait |> printfn "Wait in secs: %f"

printfn "Turnstile to get out of the Station"
averageCountIn queueOut |> Seq.nth 1000000 |> printfn "In line: %f"
averageTimeIn queueOut |> Seq.nth 1000000 |> prettyWait |> printfn "Wait in secs: %f";;
Turnstile to get in the Station
In line: 1.917345
Wait in secs: 9.623852
Turnstile to get out of the Station
In line: 3.702664
Wait in secs: 18.390027

The results fit my personal experience: the Queue at the exit gets backed up quite a bit, and passengers have to wait an average of 18.4 seconds to exit the Station, while it takes them only 9.6 seconds to get in.

It also may seem paradoxical. People are entering and exiting the Station at the same rate, and turnstiles process passengers at the same speed, so how can we have such different behaviors at the two turnstiles?

The first point here is that Queuing processes can be counter-intuitive, and require thinking carefully about what is being measured, as we saw earlier with the performance metrics computations.

The only thing which differs between the 2 turnstiles is the way arrivals are distributed over time – and that makes a lot of difference. At the entrance, arrivals are fairly evenly spaced, and there is a good chance that a Passenger arriving at the Station finds no one in the Queue; in that case, he will wait only 4 seconds on average. By contrast, when passengers exit, they arrive in bunches, and only the first one will find no one in the Queue – all the others have to wait for that first person to pass through before getting their chance, and therefore start with a much larger “guaranteed wait time”.

That’s it for today! There is much more to Queuing than single-queues (if you are into probability and Markov chains, networks of Queues are another fascinating area), but we will leave that for another time. I hope you’ll have found this excursion in Queuing interesting and maybe even useful. I also thought this was an interesting topic illustrating F# Sequences – and I am always looking forward to feedback!

Complete code (F# script) is also available on FsSnip.net

open System

// Queue / Server is either Idle,
// or Busy until a certain time,
// with items queued for processing
type Status = Idle | Busy of DateTime * int

type State =
    { Start: DateTime;
      Status: Status;
      NextIn: DateTime }

let next arrival processing state =
    match state.Status with
    | Idle ->
        { Start = state.NextIn;
          NextIn = state.NextIn + arrival();
          Status = Busy(state.NextIn + processing(), 0) }
    | Busy(until, waiting) ->
        match (state.NextIn <= until) with
        | true ->
            { Start = state.NextIn;
              NextIn = state.NextIn + arrival();
              Status = Busy(until, waiting + 1) }
        | false ->
            match (waiting > 0) with
            | true ->
                { Start = until;
                  Status = Busy(until + processing(), waiting - 1);
                  NextIn = state.NextIn }
            | false ->
                { Start = until;
                  Status = Idle;
                  NextIn = state.NextIn }

let simulate startTime arr proc =
    let nextIn = startTime + arr()
    let state =
        { Start = startTime;
          Status = Idle;
          NextIn = nextIn }
    Seq.unfold (fun st ->
        Some(st, next arr proc st)) state

let pretty state =
    let count =
        match state.Status with
        | Idle -> 0
        | Busy(_, waiting) -> 1 + waiting
    let nextOut =
        match state.Status with
        | Idle -> "Idle"
        | Busy(until, _) -> until.ToLongTimeString()
    let start = state.Start.ToLongTimeString()
    let nextIn = state.NextIn.ToLongTimeString()
    printfn "Start: %s, Count: %i, Next in: %s, Next out: %s" start count nextIn nextOut

let constantTime (interval: TimeSpan) =
    fun () -> interval

let arrivalTime = new TimeSpan(0, 0, 10)
let processTime = new TimeSpan(0, 0, 5)

let simpleArr = constantTime arrivalTime
let simpleProc = constantTime processTime

let startTime = new DateTime(2010, 1, 1)
let constantCase = simulate startTime simpleArr simpleProc

printfn "Constant arrivals, Constant processing"
Seq.take 10 constantCase |> Seq.iter pretty;;

let uniformTime (seconds: int) =
    let rng = new Random()
    fun () ->
        let t = rng.Next(seconds + 1)
        new TimeSpan(0, 0, t)

let uniformArr = uniformTime 10
let uniformCase = simulate startTime uniformArr simpleProc

printfn "Uniform arrivals, Constant processing"
Seq.take 10 uniformCase |> Seq.iter pretty;;

let exponentialTime (seconds: float) =
    let lambda = 1.0 / seconds
    let rng = new Random()
    fun () ->
        let t = - Math.Log(rng.NextDouble()) / lambda
        let ticks = t * (float)TimeSpan.TicksPerSecond
        new TimeSpan((int64)ticks)

let expArr = exponentialTime 10.0
let expProc = exponentialTime 7.0
let exponentialCase = simulate startTime expArr expProc

printfn "Exponential arrivals, Exponential processing"
Seq.take 10 exponentialCase |> Seq.iter pretty;;

let averageCountIn (transitions: State seq) =
    // time spent in current state, in ticks
    let ticks current next =
        next.Start.Ticks - current.Start.Ticks
    // jobs in system in state
    let count state =
        match state.Status with
        | Idle -> (int64)0
        | Busy(until, c) -> (int64)c + (int64)1
    // update state = total time and total jobs x time
    // between current and next queue state
    let update state pair =
        let current, next = pair
        let c = count current
        let t = ticks current next
        (fst state) + t, (snd state) + (c * t)
    // accumulate updates from initial state
    let initial = (int64)0, (int64)0
    transitions
    |> Seq.pairwise
    |> Seq.scan (fun state pair -> update state pair) initial
    |> Seq.map (fun state -> (float)(snd state) / (float)(fst state))

let averageTimeIn (transitions: State seq) =
    // time spent in current state, in ticks
    let ticks current next =
        next.Start.Ticks - current.Start.Ticks
    // jobs in system in state
    let count state =
        match state.Status with
        | Idle -> (int64)0
        | Busy(until, c) -> (int64)c + (int64)1
    // count arrivals
    let arrival current next =
        if count next > count current then (int64)1 else (int64)0
    // update state = total time and total arrivals
    // between current and next queue state
    let update state pair =
        let current, next = pair
        let c = count current
        let t = ticks current next
        let a = arrival current next
        (fst state) + a, (snd state) + (c * t)
    // accumulate updates from initial state
    let initial = (int64)0, (int64)0
    transitions
    |> Seq.pairwise
    |> Seq.scan (fun state pair -> update state pair) initial
    |> Seq.map (fun state ->
        let time = (float)(snd state) / (float)(fst state)
        new TimeSpan((int64)time))

// turnstiles admit 1 person / 4 seconds
let turnstileProc = exponentialTime 4.0
// passengers arrive randomly every 5s
let passengerArr = exponentialTime 5.0

let batchedTime seconds batches =
    let counter = ref 0
    fun () ->
        counter := counter.Value + 1
        if counter.Value < batches
        then new TimeSpan(0, 0, 0)
        else
            counter := 0
            new TimeSpan(0, 0, seconds)

// trains arrive every 30s, each delivering a batch of 6 passengers
let trainArr = batchedTime 30 6

// passengers arriving in station
let queueIn = simulate startTime passengerArr turnstileProc
// passengers leaving station
let queueOut = simulate startTime trainArr turnstileProc

let prettyWait (t:TimeSpan) = t.TotalSeconds

printfn "Turnstile to get in the Station"
averageCountIn queueIn |> Seq.nth 1000000 |> printfn "In line: %f"
averageTimeIn queueIn |> Seq.nth 1000000 |> prettyWait |> printfn "Wait in secs: %f"

printfn "Turnstile to get out of the Station"
averageCountIn queueOut |> Seq.nth 1000000 |> printfn "In line: %f"
averageTimeIn queueOut |> Seq.nth 1000000 |> prettyWait |> printfn "Wait in secs: %f"
27. May 2012 10:20

Markov chains are a classic probability model. They represent systems that evolve between states over time, following a random but stable process which is memoryless. Memorylessness is the defining characteristic of Markov processes, and is known as the Markov property. Roughly speaking, the idea is that if you know the state of the process at time T, you know all there is to know about it – knowing where it was at time T-1 would not give you additional information on where it may be at time T+1.

While Markov models come in multiple flavors, Markov chains with finite discrete states in discrete time are particularly interesting. They describe a system which changes between discrete states at fixed time intervals, following a transition pattern described by a transition matrix.

Let’s illustrate with a simplistic example. Imagine that you are running an Airline, AcmeAir, operating one plane. The plane goes from city to city, refueling and doing some maintenance (or whatever planes need) every time.

Each time the plane lands somewhere, it can be in three states: early, on-time, or delayed. It’s not unreasonable to think that if our plane landed late somewhere, it may be difficult to catch up with the required operations, and as a result, the likelihood of the plane landing late at its next stop is higher. We could represent this in the following transition matrix (numbers totally made up):

| Current \ Next | Early | On-time | Delayed |
| --- | --- | --- | --- |
| Early | 10% | 85% | 5% |
| On-Time | 10% | 75% | 15% |
| Delayed | 5% | 60% | 35% |

Each row of the matrix represents the current state, and each column the next state. The first row tells us that if the plane landed Early, there is a 10% chance we’ll land early at our next stop, an 85% chance we’ll be on-time, and a 5% chance we’ll arrive late. Note that each row sums up to 100%: given the current state, we have to end up in one of the next states.

How could we simulate this system? Given the state at time T, we simply need to “roll” a random number generator for a percentage between 0% and 100%, and depending on the result, pick our next state – and repeat.

Using F#, we could model the transition matrix as an array (one element per state) of arrays (the probabilities to land in each state), which is pretty easy to define using Array comprehensions:

let P =
    [|
        [| 0.10; 0.85; 0.05 |];
        [| 0.10; 0.75; 0.15 |];
        [| 0.05; 0.60; 0.35 |]
    |]

(Note: the entire code sample is also posted on fsSnip.net/ch)

To simulate the behavior of the system, we need a function that given a state and a transition matrix, produces the next state according to the transition probabilities:

// given a roll between 0 and 1
// and a distribution D of
// probabilities to end up in each state
// returns the index of the state
let state (D: float[]) roll =
    let rec index cumul current =
        let cumul = cumul + D.[current]
        match (roll <= cumul) with
        | true -> current
        | false -> index cumul (current + 1)
    index 0.0 0

// given the transition matrix P
// the index of the current state
// and a random generator,
// simulates what the next state is
let nextState (P: float[][]) current (rng: Random) =
    let dist = P.[current]
    let roll = rng.NextDouble()
    state dist roll

// given a transition matrix P
// the index i of the initial state
// and a random generator
// produces a sequence of states visited
let simulate (P: float[][]) i (rng: Random) =
    Seq.unfold (fun s -> Some(s, nextState P s rng)) i

The state function is a simple helper; given an array D which is assumed to contain probabilities to transition to each of the states, and a “roll” between 0.0 and 1.0, returns the corresponding state. nextState uses that function, by first retrieving the transition probabilities for the current state i, “rolling” the dice, and using state to compute the simulated next state. simulate uses nextState to create an infinite sequence of states, starting from an initial state i.

We need to open System to use the System.Random class – and we can now use this in the F# interactive window:

> let flights = simulate P 1 (new Random());;

val flights : seq<int>

> Seq.take 50 flights |> Seq.toList;;

val it : int list =
[1; 0; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 0; 1; 1; 2; 2; 2; 1; 1; 1; 1; 1; 1;
1; 1; 2; 1; 1; 1; 1; 1; 1; 1; 1; 2; 1; 1; 1; 1; 1; 2; 1; 1; 1; 1; 1; 1; 1]
>

Our small sample shows us what we expect: mostly on-time (Fly AcmeAir!), with the occasional delayed or early flight.

How many delays would we observe on a 1000-flights simulation? Let’s try:

> Seq.take 1000 flights |> Seq.filter (fun i -> i = 2) |> Seq.length;;
val it : int = 174
> 

We observe about 17% of delayed flights. This is relevant information, but a single simulation is just that – an isolated case. Fortunately, Markov chains have an interesting property: if it is possible to go from any state to any state, then the system will have a stationary distribution, which corresponds to its long term equilibrium. Essentially, regardless of the starting point, over long enough periods, each state will be observed with a stable frequency.

One way to understand better what is going on is to expand our frame. Instead of considering the exact state of the system, we can look at it in terms of probability: at any point in time, the system has a certain probability to be in each of its states.

For instance, imagine that given current information, we know that our plane will land at its next stop either early or on time, with a 50% chance of each. In that case, we can determine the probability that its next stop will be delayed by combining the transition probabilities:

p(delayed in T+1) = p(delayed in T) x P(delayed in T+1 | delayed in T) +  p(on-time in T) x P(delayed in T+1 | on-time in T) + p(early in T) x P(delayed in T+1 | early in T)

p(delayed in T+1) = 0.0 x 0.35 + 0.5 x 0.15 + 0.5 x 0.05 = 0.1
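The same calculation can be spelled out in F#, reading the conditional probabilities off the “Delayed” column of the transition matrix defined earlier:

```fsharp
// p(delayed in T+1), given a 50/50 chance of being early or on-time at T.
// Conditionals from the matrix: early -> delayed: 0.05,
// on-time -> delayed: 0.15, delayed -> delayed: 0.35
let current = [| 0.5; 0.5; 0.0 |]   // early, on-time, delayed
let pDelayedNext =
    current.[0] * 0.05 + current.[1] * 0.15 + current.[2] * 0.35
// pDelayedNext evaluates to 0.1, matching the arithmetic above
```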

This can be expressed much more concisely using Vector notation. We can represent the state as a vector S, where each component of the vector is the probability to be in each state, in our case

S(T) = [ 0.50; 0.50; 0.0 ]

In that case, the state at time T+1 will be:

S(T+1) = S(T) x P

Let’s make that work with some F#. The product of a vector by a matrix is the dot-product of the vector with each column vector of the matrix:

// Vector dot product
let dot (V1: float[]) (V2: float[]) =
    Array.zip V1 V2
    |> Array.map (fun (v1, v2) -> v1 * v2)
    |> Array.sum

// Extracts the jth column vector of matrix M
let column (M: float[][]) (j: int) =
    M |> Array.map (fun v -> v.[j])

// Given a row-vector S describing the probability
// of each state and a transition matrix P, compute
// the next state distribution
let nextDist S P =
    P
    |> Array.mapi (fun j v -> column P j)
    |> Array.map (fun v -> dot v S)

We can now handle our previous example, creating a state s with a 50/50 chance of being in state 0 or 1:

> let s = [| 0.5; 0.5; 0.0 |];;

val s : float [] = [|0.5; 0.5; 0.0|]

> let s' = nextDist s P;;

val s' : float [] = [|0.1; 0.8; 0.1|]

> 

We can also easily check what the state of the system should be after, say, 100 flights:

> let s100 = Seq.unfold (fun s -> Some(s, nextDist s P)) s |> Seq.nth 100;;

val s100 : float [] = [|0.09119496855; 0.7327044025; 0.1761006289|]

After 100 flights, starting from either early or on-time, we have about a 17% chance of being delayed. Note that this is consistent with what we observed in our initial simulation. Given that our Markov chain has a stationary distribution, this is to be expected: unless our simulation was pathologically unlikely, we should observe the same frequency of delayed flights in the long run, no matter what the initial starting state is.

Can we compute that stationary distribution? The typical way to achieve this is to bust out some algebra and solve V = V x P, where V is the stationary distribution vector and P the transition matrix.

Here we’ll go for a numeric approximation approach. Rather than solving the system of equations, we will start from a uniform distribution over the states, and apply the transition matrix until the distance between two consecutive states is under a threshold Epsilon:

// Euclidean distance between 2 vectors
let dist (V1: float[]) V2 =
Array.zip V1 V2
|> Array.map(fun (v1, v2) -> (v1 - v2) * (v1 - v2))
|> Array.sum

// Evaluate stationary distribution
// by searching for a fixed point
// under tolerance epsilon
let stationary (P: float[][]) epsilon =
let states = P.[0] |> Array.length
[| for s in 1 .. states -> 1.0 / (float)states |] // initial
|> Seq.unfold (fun s -> Some((s, (nextDist s P)), (nextDist s P)))
|> Seq.map (fun (s, s') -> (s', dist s s'))
|> Seq.find (fun (s, d) -> d < epsilon)

Running this on our example results in the following stationary distribution estimation:

> stationary P 0.0000001;;
val it : float [] * float =
([|0.09118958333; 0.7326858333; 0.1761245833|], 1.1590625e-08)

In short, in the long run, we should expect our plane to be early 9.1% of the time, on-time 73.2%, and delayed 17.6%.

Note: the fixed point approach above should work if a unique stationary distribution exists. If this is not the case, the function may never converge, or may converge to a fixed point that depends on the initial conditions. Use with caution!

Armed with this model, we could now ask interesting questions. Suppose for instance that we could improve the operations of AcmeAir, and reduce the chance that our next arrival is delayed given our current state. What should we focus on – should we reduce the probability to remain delayed after a delay (strategy 1), or should we prevent the risk of being delayed after an on-time landing (strategy 2)?

One way to look at this is to consider the impact of each strategy on the long-term distribution. Let’s compare the impact of a 1-point reduction of delays in each case, which we’ll assume gets transferred to on-time. We can then create the matrices for each strategy, and compare their respective stationary distributions:

> let strat1 = [|[|0.1; 0.85; 0.05|]; [|0.1; 0.75; 0.15|]; [|0.05; 0.61; 0.34|]|]
let strat2 = [|[|0.1; 0.85; 0.05|]; [|0.1; 0.76; 0.14|]; [|0.05; 0.60; 0.35|]|];;

val strat1 : float [] [] =
[|[|0.1; 0.85; 0.05|]; [|0.1; 0.75; 0.15|]; [|0.05; 0.61; 0.34|]|]
val strat2 : float [] [] =
[|[|0.1; 0.85; 0.05|]; [|0.1; 0.76; 0.14|]; [|0.05; 0.6; 0.35|]|]

> stationary strat1 0.0001;;
val it : float [] * float =
([|0.091; 0.7331333333; 0.1758666667|], 8.834666667e-05)
> stationary strat2 0.0001;;
val it : float [] * float = ([|0.091485; 0.740942; 0.167573|], 1.2698318e-05)
> 

The numbers tell the following story: strategy 2 (reduce delays after on-time arrivals) is better: it results in about 16.8% delays, instead of 17.6% for strategy 1. Intuitively, this makes sense, because most of our flights are on-time, so an improvement in this area has a much larger impact on the overall results than a comparable improvement on delayed flights.

There is (much) more to Markov chains than this, and there are many ways the code presented could be improved upon – but I’ll leave it at that for today, hopefully you will have picked up something of interest along the path of this small exploration!

I also posted the complete code sample on fsSnip.net/ch.

20. May 2012 15:10

During a recent Internet excursion, I ended up on the Infinite Monkey Theorem wiki page. The infinite monkey is a somewhat famous figure in probability; his fame comes from the following question: suppose you gave a monkey a typewriter, what’s the likelihood that, given enough time randomly typing, he would produce some noteworthy literary output, say, the complete works of Shakespeare?

Somewhat unrelatedly, this made me wonder about the following question: imagine that I had a noteworthy literary output and such a monkey – could I get my computer to distinguish these?

For the sake of experimentation, let’s say that our “tolerable page” is the following paragraph by Jorge Luis Borges:

Everything would be in its blind volumes. Everything: the detailed history of the future, Aeschylus' The Egyptians, the exact number of times that the waters of the Ganges have reflected the flight of a falcon, the secret and true nature of Rome, the encyclopedia Novalis would have constructed, my dreams and half-dreams at dawn on August 14, 1934, the proof of Pierre Fermat's theorem, the unwritten chapters of Edwin Drood, those same chapters translated into the language spoken by the Garamantes, the paradoxes Berkeley invented concerning Time but didn't publish, Urizen's books of iron, the premature epiphanies of Stephen Dedalus, which would be meaningless before a cycle of a thousand years, the Gnostic Gospel of Basilides, the song the sirens sang, the complete catalog of the Library, the proof of the inaccuracy of that catalog. Everything: but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings. Everything: but all the generations of mankind could pass before the dizzying shelves—shelves that obliterate the day and on which chaos lies—ever reward them with a tolerable page.

Assuming my imaginary typewriter-pounding monkey is typing each letter with equal likelihood, my first thought was that by comparison, a text written in English would have more structure and predictability – and we could use Entropy to measure that difference in structure.

Entropy is the expected information of a message; the general idea behind it is that a signal where every outcome is equally likely is unpredictable, and has a high entropy, whereas a message where certain outcomes are more frequent than others has more structure and lower entropy.

The formula for Entropy, lifted from Wikipedia, corresponds to the average quantity of information of a message X, where X can take different values x: H(X) = – Σ p(x) × log2 p(x), summing over every possible value x.

For instance, a series of coin tosses with the proverbial fair coin would produce about as many heads and tails, and the entropy would come out as –0.5 x log2(0.5) – 0.5 x log2(0.5) = 1.0, whereas a totally unfair coin producing only heads would have an entropy of –1.0 x log2(1.0) – 0.0 = 0.0, a perfectly predictable signal.
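The coin arithmetic is easy to verify with a small throwaway helper (separate from the H function defined further down, which works on text):

```fsharp
// Entropy of a discrete distribution given as a list of probabilities;
// terms with p = 0 contribute nothing, by convention.
let entropyOf probs =
    probs
    |> List.filter (fun p -> p > 0.0)
    |> List.sumBy (fun p -> - p * System.Math.Log(p, 2.0))

entropyOf [ 0.5; 0.5 ]   // fair coin: 1.0
entropyOf [ 1.0 ]        // all-heads coin: 0.0
```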

How could I apply this to my problem?

First, we need a mechanical monkey. Given a sample text (our benchmark), we’ll extract its alphabet (all characters used), and create a virtual typewriter where each key corresponds to one of these characters. The monkey will then produce monkey literature, by producing a string as long as the original text, “typing” on the virtual keyboard randomly:

let monkey (text: string) =
    let rng = new System.Random()
    let alphabet = Seq.distinct text |> Seq.toArray
    let alphabetLength = Array.length alphabet
    let textLength = text.Length
    [| for i in 1 .. textLength ->
        alphabet.[rng.Next(0, alphabetLength)] |]

We store the Borges paragraph as:

let borges = "Everything would be in its blind volumes. (etc...)

… and we can now run the Monkey on the Borges paragraph,

> let monkeyLit = new string(monkey borges);;

which produces a wonderful output (results may vary – you could, after all, get a paragraph of great literary value):

ovfDG4,xUfETo4Sv1dbxkknthzB19Dgkphz3Tsa1L——w—w iEx-Nra mDs--k3Deoi—hFifskGGBBStU11-iiA3iU'S R9DnhzLForbkhbF:PbAUwP—ir-U4sF u w-tPf4LLuRGsDEP-ssTvvLk3NyaE f:krRUR-Gbx'zShb3wNteEfGwnuFbtsuS9Fw9lgthE1vL,tE4:Uk3UnfS FfxDbcLdpihBT,e'LvuaN4royz ,Aepbu'f1AlRgeRUSBDD.PwzhnA'y.s9:d,F'T3xvEbvNmy.vDtmwyPNotan3Esli' BTFbmywP.fgw9ytAouLAbAP':txFvGvBti Fg,4uEu.grk-rN,tEnvBs3uUo,:39altpBud3'-Aetp,T.chccE1yuDeUT,Pp,R994tnmidffcFonPbkSuw :pvde .grUUTfF1Flb4s cw'apkt GDdwadn-Phn4h.TGoPsyc'pcBEBb,kasl—aepdv,ctA TxrhRUgPAv-ro3s:aD z-FahLcvS3k':emSoz9NTNRDuus3PSpe-Upc9nSnhBovRfoEBDtANiGwvLikp4w—nPDAfknr—p'-:GnPEsULDrm'ub,3EyTmRoDkG9cERvcyxzPmPbD Fuit:lwtsmeUEieiPdnoFUlP'uSscy—Nob'st1:dA.RoLGyakGpfnT.zLb'hsBTo.mRRxNerBD9.wvFzg,,UAc,NSx.9ilLGFmkp—:FnpcpdP—-ScGSkmN9BUL1—uuUpBhpDnwS9NddLSiBLzawcbywiG—-E1DBlx—aN.D9u-ggrs3S4y4eFemo3Ba g'zeF:EsS-gTy-LFiUn3DvSzL3eww4NPLxT13isGb:—vBnLhy'yk1Rsip—res9t vmxftwvEcc::ezvPPziNGPylw:tPrluTl3E,T,vDcydn SyNSooaxaT llwNtwzwoDtoUcwlBdi',UrldaDFeFLk 3goos4unyzmFD9.vSTuuv4.wzbN.ynakoetb—ecTksm—-f,N:PtoNTne3EdErBrzfATPRreBv1:Rb.cfkELlengNkr'L1cA—lfAgU-vs9  Lic-m,kheU9kldUzTAriAg:bBUb'n—x'FL Adsn,kmar'p BE9akNr194gP,hdLrlgvbymp dplh9sPlNf'.'9

Does the entropy of these 2 strings differ? Let’s check.

let I p =
    match p with
    | 0.0 -> 0.0
    | _ -> - System.Math.Log(p, 2.0)

let freq text letter =
    let count =
        Seq.fold (fun (total, count) l ->
            if l = letter
            then (total + 1.0, count + 1.0)
            else (total + 1.0, count)) (0.0, 0.0) text
    (letter, snd count / fst count)

let H text =
    let alphabet = Seq.distinct text
    Seq.map (fun l -> snd (freq text l)) alphabet
    |> Seq.sumBy (fun p -> p * I(p))

I computes the self-information of a message of probability p, freq computes the frequency of a particular character within a string, and H, the entropy, proceeds by first extracting all the distinct characters present in the text into an “alphabet”, and then maps each character of the alphabet to its frequency and computes the expected self-information.

We have now all we need – let’s see the results:

> H borges;;
val it : float = 4.42083025
> H monkeyLit;;
val it : float = 5.565782825

Monkey lit has a higher entropy / disorder than Jorge Luis Borges’ output. This is reassuring.

How good of a test is this, though? In the end, what we measured with Entropy is that some letters were more likely to come up than others, which we would expect from a text written in English, where the letter “e” has a 12% probability to show up. However, if we gave our Monkey a Dvorak keyboard, he may fool our test; we could also create an uber Mechanical Monkey which generates a string based on the original text frequency:

let uberMonkey (text: string) =
    let rng = new System.Random()
    let alphabet = Seq.distinct text |> Seq.toArray
    let textLength = text.Length
    let freq = Array.map (fun l -> freq text l) alphabet
    let rec index i p cumul =
        let cumul = cumul + snd freq.[i]
        if cumul >= p then i else index (i+1) p cumul
    [| for i in 1 .. textLength ->
        let p = rng.NextDouble()
        alphabet.[index 0 p 0.0] |]

This somewhat ugly snippet computes the frequency of every letter in the original text, and returns random chars based on the frequency. The ugly part is the index function; given a probability p, it returns the index of the first char in the frequency array such that the cumulative probability of all chars up to that index is greater than p, which will return each index based on its frequency.

Running the uberMonkey (binding the result as let uberMonkeyLit = new string(uberMonkey borges)) produces another milestone of worldwide literature:

lk  aeew omauG dga rlhhfatywrrg   earErhsgeotnrtd utntcnd  o,  ntnnrcl gltnhtlu eAal yeh uro  it-lnfELiect eaurefe Busfehe h f1efeh hs eo.dhoc , rbnenotsno, e et tdiobronnaeonioumnroe  escr l hlvrds anoslpstr'thstrao lttamxeda iotoaeAB ai sfoF,hfiemnbe ieoobGrta dogdirisee nt.eba   t oisfgcrn  eehhfrE' oufepas Eroshhteep snodlahe sau  eoalymeonrt.ss.ehstwtee,ttudtmr ehtlni,rnre  ae h  e chp c crng Rdd  eucaetee gire dieeyGhr a4ELd  sr era tndtfe rsecltfu  t1tefetiweoroetasfl bnecdt'eetoruvmtl ii fi4fprBla Fpemaatnlerhig  oteriwnEaerebepnrsorotcigeotioR g  bolrnoexsbtuorsr si,nibbtcrlte uh ts ot  trotnee   se rgfTf  ibdr ne,UlA sirrr a,ersus simf bset  guecr s tir9tb e ntcenkwseerysorlddaaRcwer ton redts— nel ye oi leh v t go,amsPn 'e  areilynmfe ae  evr  lino t, s   a,a,ytinms   elt i :wpa s s hAEgocetduasmrlfaar  de cl,aeird fefsef E  s se hplcihf f  cerrn rnfvmrdpo ettvtu oeutnrk —toc anrhhne  apxbmaio hh  edhst, mfeveulekd. vrtltoietndnuphhgp rt1ocfplrthom b gmeerfmh tdnennletlie hshcy,,bff,na nfitgdtbyowsaooomg , hmtdfidha l aira chh olnnehehe acBeee  n  nrfhGh dn toclraimeovbca atesfgc:rt  eevuwdeoienpttdifgotebeegc ehms ontdec e,ttmae llwcdoh

… and, as expected, if we run our Entropy function on uberMonkeyLit, we get

> H uberMonkeyLit;;
val it : float = 4.385303632

This is pretty much what we got with the Borges original. The uberMonkey produced a paragraph just as organized as Borges, or so it seems.

Obviously, the raw Entropy calculation is not cutting it here. So what are we missing? The problem is that we are simply looking at the frequency of characters, which measures a certain form of order / predictability; however, there is “more” order than that in English. If I were to tell you that the 2 first characters of a text are “Th”, chances are, you would bet on “e” for the third position – because some sequences are much more likely than others, and “The” much more probable than “Thw”. The “raw” Entropy would consider the two following sequences “ABAABABB” and “ABABABAB” equally ordered (each contains 4 As and 4 Bs), whereas a human eye would consider that the second one, with its neat succession of A and Bs, may follow a pattern, where knowing the previous observations of the sequence conveys some information about what’s likely to show up next.
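One way to capture that sequential structure, which this post does not pursue further, would be to run the same entropy computation over overlapping pairs of characters (bigrams) instead of single characters; a sketch:

```fsharp
// Entropy over character bigrams: count each adjacent pair,
// then compute the expected self-information of the pair distribution.
let bigramEntropy (text: string) =
    let pairs = Seq.pairwise text |> Seq.toList
    let total = float (List.length pairs)
    pairs
    |> Seq.countBy id
    |> Seq.sumBy (fun (_, n) ->
        let p = float n / total
        - p * System.Math.Log(p, 2.0))
```

On the two sequences above, “ABABABAB” should score lower than “ABAABABB”, because its bigrams are concentrated on just “AB” and “BA”.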

We’ll leave it at that for today, with an encouraging thought for those of you who may now worry that world literature could be threatened by offshore monkey typists. According to Wikipedia again,

In 2003, lecturers and students from the University of Plymouth MediaLab Arts course used a £2,000 grant from the Arts Council to study the literary output of real monkeys. They left a computer keyboard in the enclosure of six Celebes Crested Macaques in Paignton Zoo in Devon in England for a month, with a radio link to broadcast the results on a website.

Not only did the monkeys produce nothing but five pages consisting largely of the letter S, the lead male began by bashing the keyboard with a stone, and the monkeys continued by urinating and defecating on it. Phillips said that the artist-funded project was primarily performance art, and they had learned "an awful lot" from it.

£2,000 may seem a bit steep to watch monkeys defecating on typewriters; on the other hand, it seems that human writers can sleep quietly, without worrying about their jobs.
