Mathias Brandewinder on .NET, F#, VSTO and Excel development, and quantitative analysis / machine learning.
28. June 2010 13:14

A client recently asked me a fun probability question, which revolved around estimating the probability of success of a research program. In a simplified form, here is the problem: imagine that you have multiple labs, each developing products which have independent probabilities of succeeding – what is the probability that more than a certain number of products eventually succeed?

Let’s illustrate with a simple example. Product A has a 30% probability of success, and product B a 60% probability of success. Combining these into a probability tree, we work out that there is an 18% chance that both products succeed (30% x 60%), a 12% chance that only A succeeds (30% x 40%), and a 42% chance that only B succeeds (70% x 60%). That gives an 18% + 12% + 42% = 72% chance of having 1 or more products succeed, and a 28% chance (70% x 40%) of total failure.

It’s not a very complicated theoretical problem. Practically, however, the number of outcomes grows quickly as the number of products increases (2^n outcomes for n products), and working out every single combination by hand is extremely tedious.

Fortunately, using a simple trick, we can generate these combinations with minimal effort. The representation of an integer in base 2 is a decomposition in powers of 2, resulting in a unique sequence of 0s and 1s. In our simplified example, if we consider the numbers 0, 1, 2 and 3, their decompositions are:

0 = 0 x 2^1 + 0 x 2^0 -> 00

1 = 0 x 2^1 + 1 x 2^0 -> 01

2 = 1 x 2^1 + 0 x 2^0 -> 10

3 = 1 x 2^1 + 1 x 2^0 -> 11

As a result, if we consider a 1 to encode the success of a product, and a 0 its failure, the binary representation of the integers from 0 to 3 gives us all possible outcomes for our two-product scenario.
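Here is a minimal Python sketch of that enumeration. It is an illustration only, not the code from the original post, and the function names `outcome_probabilities` and `prob_at_least` are made up for this example. Each integer from 0 to 2^n - 1 is read as a bit pattern, where bit i set to 1 means product i succeeded.

```python
def outcome_probabilities(success_probs):
    """Enumerate all 2^n outcomes for n independent products.

    Bit i of each integer encodes product i (1 = success, 0 = failure).
    Returns a list of (bit_string, number_of_successes, probability).
    """
    n = len(success_probs)
    outcomes = []
    for code in range(2 ** n):
        probability = 1.0
        successes = 0
        for i, p in enumerate(success_probs):
            if (code >> i) & 1:
                probability *= p          # product i succeeds
                successes += 1
            else:
                probability *= 1.0 - p    # product i fails
        # In the bit string, the rightmost digit is the first product.
        outcomes.append((format(code, f"0{n}b"), successes, probability))
    return outcomes


def prob_at_least(success_probs, k):
    """Probability that at least k products succeed."""
    return sum(
        prob
        for _, successes, prob in outcome_probabilities(success_probs)
        if successes >= k
    )
```

Calling `prob_at_least([0.3, 0.6], 1)` on the two-product example reproduces the 72% worked out by hand, up to floating-point rounding. The same call works unchanged for longer lists of probabilities.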


1. October 2008 17:22

In my previous post, I described how the Bass model can be used to forecast the market potential for a newly introduced product, using limited post-introduction data. In this post, I will apply the method to a real-world situation, to see how the method holds up in practice, what practical problems may arise, and how to address them.

# The data

My objective is to evaluate the long-term share of internet traffic of Chrome, the new Google browser. I will be using actual traffic data from a medium-sized website, the technology blog of Donn Felker. In case you wonder why I didn’t use my own data, unfortunately my own traffic is not steady enough to get a “statistically decent” sample of Chrome users, and Donn was gracious enough to share his data with me (Thank you!).

The data I will be using is the percentage of visits coming from users using Chrome as a browser. It covers September 2 to September 17, 2008, the first 2 weeks of Chrome on the market.

23. September 2008 17:37

On September 2, 2008, Google launched its browser, Chrome, with great buzz in the geekosphere. I gave it a spin, but stayed with Firefox (old habits die hard), and did not give it more thought until I came across this post where Donn Felker ventures his gut feeling for what the browser market will look like in 2009.

I believe that his forecast, while totally subjective, qualifies as an “expert opinion”, and is essentially correct. I wondered what quantitative analysis methods would add to it, and decided to give it a shot.

Properly representing the introduction of a new product on the market is a classic problem in quantitative modeling. At least two factors make it tricky: there is only limited data available (because it’s a new product), and the underlying model cannot be linear (because adoption starts from 0, and its growth is bounded).

In 1969, Frank Bass proposed a model which is now a classic. It represents adoption as the combination of two factors: innovation and imitation. Innovators are the guys you see in line at the Apple store when a new iGizmo is launched; they have to have it first, regardless of how many people have it already. Imitators are the cautious ones, who will jump on board when enough people are using the product already – the more people already adopted, the more imitation will take place.

In terms of dynamics, innovators determine the early pick-up of the product and create the initial critical mass of users, while imitators drive the bulk of the growth, going from early adoption to peak.

The mathematical formulation of the model goes like this:

(from http://www.valuebasedmanagement.net/methods_bass_curve_diffusion_innovation.html)
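In its usual textbook form, the model can be written as follows. The notation is chosen here for illustration and may differ from the source linked above: $M$ is the total population, $p$ the rate of innovation, $q$ the rate of imitation, and $N(t)$ the cumulative number of adopters at time $t$.

$$
\frac{dN(t)}{dt} = \left(p + q\,\frac{N(t)}{M}\right)\bigl(M - N(t)\bigr)
$$

The first factor is the adoption rate among people who have not adopted yet: a constant innovation term $p$, plus an imitation term that grows with the share $N(t)/M$ of people who already adopted. The second factor is the number of people still left to convert.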

It is a very elegant and lightweight model, which takes only 3 parameters, and is surprisingly good at replicating actual adoption. The Excel model attached provides an illustration of the dynamics of the model, depending on its input parameters, the total population, and the rates of innovation and imitation.
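To get a feel for these dynamics without opening the spreadsheet, here is a small Python sketch of a discrete-time version of the model. It is not a translation of the attached Excel file. It assumes the common form in which each period's new adopters equal (innovation + imitation x share already adopted) x people remaining, and the function name `bass_adoption` and the parameter values in the usage note are illustrative only.

```python
def bass_adoption(population, innovation, imitation, periods):
    """Cumulative adopters per period under a discrete-time Bass model.

    population: total number of potential adopters
    innovation: rate of innovation (often written p)
    imitation:  rate of imitation (often written q)
    """
    cumulative = 0.0
    path = []
    for _ in range(periods):
        remaining = population - cumulative
        # Innovators adopt at a constant rate; imitators adopt in
        # proportion to the share of the population that already adopted.
        rate = innovation + imitation * cumulative / population
        cumulative += rate * remaining
        path.append(cumulative)
    return path
```

For example, `bass_adoption(1000.0, 0.03, 0.38, 30)` produces an S-shaped path: slow initial pick-up driven by the innovation rate, a steep middle section driven by imitation, and a plateau as the population saturates.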

Bass.xls (27.50 kb)

7. July 2008 18:19
Just thought I would point out this page at Acaso Analytics, where Billy Boyle used my previous post on how to use a simple S-shaped curve to model the introduction of a new product on a market, and created a very cool interactive dashboard which illustrates what the curve looks like, and what happens to it when the parameters change. I am a big fan of quantitative models, and enjoyed his other posts as well, which are an eclectic collection of "illustrated" famous quantitative models. Nothing tells the story behind a mathematical model better than a good chart, or, better, an interactive one!
8. June 2008 06:47

One of my clients recently asked me to modify an Excel model, so that the adoption of products entering the market would follow an S-curve. After some digging and googling, I came across this excellent post by Juan C. Mendez, where he proposes a clean and very practical way to use the logistic function, and calibrate it through 3 input parameters: the peak value, and the times at which the curve reaches 10% and 90% of its peak value.

The beauty of his approach is that his function is compact, so it can be typed easily into a worksheet cell, and the inputs are very understandable. However, I found it a bit restrictive: transforming it for values other than 10% and 90% requires some recalibration, and more importantly, it cannot accommodate values that are not "symmetrical" around 50%.

So I set out to work through a generalized solution to the following problem: find an S-curve that fits arbitrary values, rather than just 10% and 90%.
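One possible way to approach this, sketched below in Python, is to fit a logistic curve through two arbitrary points. This is an assumption about how such a generalization could look, not necessarily the solution developed in the full post. The curve is f(t) = peak / (1 + exp(-(a + b t))). Requiring f(t1) = share1 x peak and f(t2) = share2 x peak gives two linear equations in a and b once the logit transform is applied. The names `logit` and `fit_s_curve` are made up for this example.

```python
import math


def logit(share):
    """Inverse of the logistic function, defined for 0 < share < 1."""
    return math.log(share / (1.0 - share))


def fit_s_curve(peak, t1, share1, t2, share2):
    """Return a logistic curve reaching share1 * peak at time t1
    and share2 * peak at time t2 (shares are fractions of the peak)."""
    if not (0.0 < share1 < 1.0 and 0.0 < share2 < 1.0):
        raise ValueError("shares must be strictly between 0 and 1")
    if t1 == t2 or share1 == share2:
        raise ValueError("the two calibration points must be distinct")
    # logit(f(t) / peak) = a + b * t, so two points determine a and b.
    b = (logit(share2) - logit(share1)) / (t2 - t1)
    a = logit(share1) - b * t1

    def curve(t):
        return peak / (1.0 + math.exp(-(a + b * t)))

    return curve
```

For example, `fit_s_curve(100.0, 2.0, 0.2, 10.0, 0.7)` returns a curve that reaches 20% of the peak at t = 2 and 70% at t = 10, a pair of values that is not symmetrical around 50%.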