Sramana Mitra: What did you do with that compensation? Did you want to make it available to everybody? When did Kaggle get started?
Jeremy Howard: Kaggle was started by [CEO] Anthony Goldbloom. I was the next guy involved. Anthony had been working with the Economist magazine and big data stories. Then he realized there was a huge demand for analytics in organizations, but also a lot of money involved in creating big data infrastructure. Very few people were getting much value out of this. He had a hypothesis that competitions would be a good way to engage the communities and engage the best people to solve these problems. As it turns out, his hypothesis was correct.
In each of the hundreds of competitions we have run, and in every single school, academic and industry benchmarks have been surpassed. In every case, the algorithm that was developed is the best algorithm of its type in the world so far. I got involved about five months after Anthony got it up and running. I originally got involved as a competitor on the side. I was competing in these competitions myself.
SM: When was that?
JH: That was toward the end of 2010. A lot of people were talking about and spending money on software around data – things like Hadoop and data warehousing. That itself is not a particularly new thing.
SM: The companies that come out today as big data companies actually have been around for a long time working in the big data area. It was just that they didn’t have this big trend in their favor – they never got any TLC from the media.
JH: There have been other times when that happened. More than 20 years ago I was working on large database analytics in banking, with neural networks on large data warehouses, and there was certainly quite a bit of press at that time for things like business intelligence and neural networks.
SM: Then there are distractions. Social media happens, software as a service, cloud computing, etc. Our industry operates in buzzwords.
JH: It does operate in buzzwords. We had business intelligence, we had dashboards, etc. The big difference now is that many more companies are collecting data because they are doing a lot of business online. So, the data collection happens much more naturally. A lot of that stuff used to be locked up, or in little individual spreadsheets.
SM: That is a good gateway into the next question I was going to ask you. When you guys started Kaggle, did you provide some of the data infrastructure on which the people competing to develop the algorithms would work, or was the company providing the infrastructure?
JH: Not at all. Maybe the most exciting thing about analytics today is that everything you need to build effective models for large organizations fits onto your laptop computer and is available for free. By far the most commonly used software among Kaggle competition winners is software called R. R is open source software, available for free. We let all of the participants in Kaggle competitions use their own hardware and software. We find that to be much more productive.