This post continues my journey converting the Python samples from Machine Learning in Action into F#. On the program today: chapter 7, dedicated to AdaBoost. This is also the last chapter revolving around classification. After almost 6 months spending my week-ends on classifiers, I am rather glad to change gears a bit!

## The idea behind the algorithm

**Algorithm outline**

AdaBoost is short for “Adaptative Boosting”. Boosting is based on a very common-sense idea: instead of trying to find one perfect classifier that fits the dataset, the algorithm will train a sequence of classifiers, and, at each step, will analyze the latest classifiers’ results, and focus the next training round on reducing classification mistakes, by giving a bigger weight to the misclassified observations. In other words, “get better by working on your weaknesses”.

The second idea in AdaBoost, which I found very interesting and somewhat counter-intuitive, is that multiple poor classification models taken together can constitute a highly reliable source. Rather than discarding previous classifiers, AdaBoost combines them all into a meta-classifier. AdaBoost computes a weight Alpha for each of the “weak classifiers”, based on the proportion of examples properly classified, and classifies observations by taking a majority vote among the weak classifiers, weighted by their Alpha coefficients. In other words, “decide based on all sources of information, but take into account how reliable each source is”.

In pseudo-code, the algorithm looks like this:

Given examples = observations + labels,

Start with equal weight for each example.

Until overall quality is good enough or iteration limit reached,

- From the available weak classifiers,
- Pick the classifier with the lowest weighted prediction error,
- Compute its Alpha weight based on prediction quality,
- Update weights assigned to each example, based on Alpha and whether example was properly classified or not

**The weights update mechanism**

Let’s dive into the update mechanism for both the training example weights and the weak classifiers Alpha weights. Suppose that we have

- a training set with 4 examples & their label [ (E1, 1); (E2, –1); (E3, 1); (E4, –1) ],
- currently weighted [ 20%; 20%; 30%; 30% ],
*(note: example weights must sum to 100%)* - f is the best weak classifier selected.

If we apply a weak classifier f to the training set, we can check what examples are mis-classified, and compute the weighted error, i.e. the weighted proportion of mis-classifications:

Example | Label | Weight | f(E) | f is… | weighted error |

E1 | 1 | 0.2 | 1 | correct | 0.0 |

E2 | -1 | 0.2 | 1 |
incorrect |
0.2 |

E3 | 1 | 0.3 | 1 | correct | 0.0 |

E4 | -1 | 0.3 | -1 | correct | 0.0 |

0.2 |

This gives us a weighted error rate of 20% for f, given the weights.

The weight given to f in the final classifier is given by

Alpha = 0.5 x ln ((1 - error) / error)

Here is how Alpha looks, plotted as a function of the proportion correctly classified (i.e. 1 – error):

If 50% of the examples are properly classified, the classifier is totally random, and gets a weight of 0 – its output is ignored. Higher quality models get higher weights – and models with high level of misclassification get a strong negative weight. This is interesting; in essence, this treats them as a great negative source of information: if you know that I am always wrong, my answers are still highly informative – you just need to flip the answer…

## Comments

- How dense is the product of Sparse Matrices? (5)
- Testing and mocking your C# code with F# (5)
- Field notes from the F# tour (2)
- Throttling F# Events using the Reactive Extensions (3)

Comment RSSHow To Reverse Aging wrote: Hello, I enjoy reading all of your article. I want... [More]

thetaoofbadassus.tumblr.com wrote: I have read so many articles about the blogger lov... [More]

reverseagingprocess.newsvine.com wrote: Normally I do not read post on blogs, however I wo... [More]

How To Cure Herpes wrote: Thanks for finally writing about > Throttling F... [More]