Sramana Mitra: Let’s talk about what you are seeing from an industry point of view. What is happening in the industry, and where is it going from where you sit?
Yves de Montcheuil: We are seeing the desire to get value from more and more data. What is now called big data is nothing new. Companies have been amassing vast amounts of information and data for years. The issue is that only the wealthiest of those organizations have been able to extract value from it. Airlines, for example, have been doing yield management for 30 to 40 years, optimizing the price of every seat sold on an airplane based on historical trends and various other factors. Credit card companies have been doing fraud detection for a long time. Twenty years ago I was already getting calls from American Express asking whether I had made a particular charge in some remote location. Unfortunately, it was usually the case.
That [technology] was reserved for those very large companies. What is happening now is a desire for organizations of any size, or even for divisions within large organizations, to get insights from the data that they have been aggregating, growing, and developing over time. That has happened at the same time that new technology enablers for big data have been maturing on the market. Ten years ago companies such as Google or Yahoo started to develop technologies that are now the core of the big data platform, which is primarily Hadoop. Those technologies have matured and become available to everyone through an open source model, which is also the model we are using, so the match is quite interesting here. Those technologies have also become more acceptable to the enterprise thanks to the Hadoop distribution vendors, companies such as Hortonworks or Cloudera, which have secured the distribution, tested it, and added system management, automatic setup, and administration tools.
Hadoop is now a technology that is ready for the enterprise. The big challenge that remains with Hadoop is that it is still pretty difficult to use. If you are not a very advanced Java developer with MapReduce expertise or a data scientist, it is extremely complicated to get value from Hadoop. This is what Talend has been working on for the last two years: democratizing access to data for everyone. The data is there. The technology to access, process, and develop it is there. Talend wants to make this technology usable by organizations of any size, regardless of their level of technical expertise.
SM: What are some of the startups that are working on areas you are talking about?
YM: I just mentioned two of the Hadoop distribution vendors, Hortonworks and Cloudera. Hadoop is a very interesting place to store data and to do some batch processing, but it is also a highly unstructured place to store data. Traditionally you had to choose between storing unstructured data in Hadoop and storing structured data in relational databases such as Oracle or MySQL. A new breed of databases, called NoSQL databases, has emerged in the past 18 months. There are very interesting vendors in this space as well. There is a company called DataStax, which is a vendor of the Cassandra NoSQL database, yet another open source project. There is a company called Neo Technology, whose database, Neo4j, is a graph NoSQL database. Then there is a company called MongoDB, another NoSQL database vendor. These are really interesting companies to track. They are making a difference in the big data space.