Sramana Mitra: Who are these people? What is their motivation to participate in this contest? Of course, this is all very time-consuming. Thousands of people are doing this on their own time. There must be something going on in the psychology that you have studied and understood. What is that?
Jeremy Howard: It is something I can empathize with because I am one of those people. Basically, data science is a creative area of expertise. It is like a game or a puzzle, and there is also a community. It is interesting and enjoyable. In general, people who work with data and code are very passionate about what they are doing. The problem is that for a lot of people, their actual day-to-day work is not quite as interesting as they would like, or perhaps their unique insights are not being listened to as they would like. So, the competitions have given people a lot: interesting data sets, interesting problems and, perhaps most exciting of all, the ability to be judged through pure meritocracy. You will be judged by how many predictions you got right and not based on what school you went to or what somebody else says about you.
You can see that from my background. Academically, it would have been hard for me to convince anyone in the industry that I had anything useful to say about data science and machine learning, but becoming Kaggle’s number one ranked participant is like winning tennis matches – people can look at that and say, “Jeremy knows what he is doing.” That was a good way for me to test myself but also to take a huge leap up in the industry.
SM: This must have spawned a lot of interesting startups. What has been your experience in the community so far?
JH: We recently launched a startup program. We allow people to submit requests for Kaggle to run a competition for their startup. For organizations that clear a few hurdles, we will actually waive 100% of their fees and run it for free. All they have to do is put up the prize money. What we are looking for are startups that have a clear understanding of what they are trying to do and have put together an interesting data set. At the moment, one startup is trying to understand the resale value of industrial equipment. There is a competition running which has a lot of data about industrial equipment and auction sales. That startup will soon have a unique algorithm which will be used to power its business.
My favorite one that has finished is a company called Jetpac. Jetpac’s CTO is actually a well-known guy in the data science community. He is the guy who built the Data Science Toolkit, and he has also written data science books. His name is Pete Warden. He probably understands better than anybody that the focus on Kaggle [is to find] the best machine learning people in the world. When they wanted to build an algorithm to assess the quality of [a project], they got the Kaggle community to build that through a competition. That algorithm then became the core IP of their business. Then they went on to a pretty big series A, and now they have their software up and running and it is going very well.
SM: You are a for-profit business. Is that correct?