The Kaggle/StackOverflow contest officially closed a few days ago, which makes it a perfect time to have a miniature retrospective on that experience. The objective of the contest was to write an algorithm to predict whether a StackOverflow question would be closed by moderators, and the reason why.
The contest was announced just a couple of days before what was supposed to be 4 weeks of computer-free vacation travelling around Europe. Needless to say, a quick change of plans followed; I am a big fan of StackOverflow, and Machine Learning has been on my mind quite a bit lately, so I packed my smallest laptop with Visual Studio installed. At the same time, the wonders of the Interwebs resulted in the formation of Team Charon - the awesome @lu_a_jalla and me, around the loosely defined project of "having fun with this, using 100% F#".
Now that the contest is over, here are a few notes on the experience, focusing on process and tools, and not the modeling aspects – I’ll get back to that in a later post.
This is my first unquestionably positive experience with a dispersed team - every morning I was genuinely looking forward to code check-ins, something I can't say of every experience I have had with remote teams. I recall reading somewhere that there was only one valid reason to work with a dispersed team: when you really want to work with that person, and it is the only way to work together. I tend to agree, and this was tremendously fun. There are not that many opportunities to have meaningful interactions involving both F# and Machine Learning, and I learnt quite a bit in the process, in large part because this was team work.
As a side note, I find it amazing how ridiculously easy it is today to set up a collaborative environment. Set up a GitHub repository, use Skype and Twitter – and you are good to go. The only thing technology hasn’t quite solved yet are these pesky time zones: Minsk and San Francisco are still 11 hours apart. This is were a team of night owls might help…
Whenever there is a deadline, make sure the when and what is clear. Had I followed this simple rule, I would have been on time for the final submission. Instead, I missed it by a couple of hours, because I didn't check what "you have three days left" meant exactly, which is too bad, because otherwise we could have ended up in 27th position, among 160+ competitors:
... which is a result I am pretty proud of, given that this was my first “official” attempt at Machine Learning stuff, and some of the competitors looked pretty qualified. During the initial phase, we went as high as 10th position, and ended up in 40th position, in the top 25%.