Sramana Mitra: What have you had to do technically to service this use case?
Shaun Connolly: In many cases, the enterprise Hadoop platform acts as the landing ground for all those different sources of data. If the information is already in existing systems that are powering the business, those systems are not impacted, because you can view Hadoop as a catch basin: the data continues to flow through the regular systems, but a copy is also placed in the Hadoop cluster. So you are able to collect all the relevant information from the various ways a customer may interact with you.
Once you have all that information in one place, you are then able to clean and aggregate it and begin to join it up, and maybe even de-anonymize some of it – the data might be coming from anonymous sources, but you might be able to do a reverse lookup on the IP address to understand demographically where that shopper is coming from. In the best case, you might be able to map it back to a specific customer or a specific account. All of that cleansing and aggregation work typically happens within the Hadoop platform. The results may then be packaged and pushed into downstream data warehouses or other traditional systems the enterprise may have, where they might do more reporting and get it in front of more end users.
The use case may vary. But the point is that Hadoop is this giant catch basin that can take data in any format and begin to let you infer information from it.
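As a rough sketch of the cleansing, joining, and aggregation step Connolly describes, the job below uses PySpark running against data landed in the cluster; the tool choice, file paths, and column names (customer_id, ip, page, segment) are illustrative assumptions, not Hortonworks specifics.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-join").getOrCreate()

# Raw clickstream events landed in the cluster (path and schema are assumptions)
clicks = spark.read.json("hdfs:///landing/clickstream/*.json")

# Customer accounts copied from an operational system (assumed CSV layout)
accounts = spark.read.csv("hdfs:///landing/crm/accounts.csv", header=True)

# Basic cleansing: drop malformed rows and normalize the page path
clean = (clicks
         .dropna(subset=["ip", "page"])
         .withColumn("page", F.lower(F.trim(F.col("page")))))

# Join anonymous events back to known accounts where an identifier exists
joined = clean.join(accounts, on="customer_id", how="left")

# Aggregate visits per page per customer segment, ready for a downstream warehouse
summary = (joined
           .groupBy("segment", "page")
           .agg(F.count("*").alias("visits")))

# Package the result for export to traditional reporting systems (format is an assumption)
summary.write.mode("overwrite").parquet("hdfs:///curated/page_visits_by_segment")

The same pattern could be expressed with Hive, Pig, or MapReduce jobs on the platform; the point is that raw copies land in Hadoop, get cleaned and joined there, and only the curated summaries move downstream.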
SM: You are talking about Hadoop in general. What role does Hortonworks play in that ecosystem?
SC: We deliver the enterprise Hadoop platform that is easy to deploy – the data platform where you are going to land that information and interact with it, for example, figuring out the path through your website that generates an 80% hit rate on the commerce you might be driving. We enable our customers to deploy the platform easily, and we assist them in getting the right data fed into the platform and integrating it with downstream systems. Then we educate them on how to build applications on top of that. We are not an application vendor. We have plenty of those in our ecosystem, so we might point customers to system integrator partners to help them build a solution. Or we might point them to partners whose technologies make it easier to visualize and figure out customers' paths through their website.
The point there is that each customer is different. At the end of the day, Hadoop serves a common purpose for all of them and provides a very cost-effective, highly scalable platform to do all that work. Our role is to deliver that software, support customers in its use, and bring in the right partners to help them build their solutions on top of that platform.
SM: Basically, there is raw Hadoop that is available in the open source domain, and then Hortonworks provides a lot of the processing, streamlining, and infrastructure around it, and that is what makes it enterprise class. That is what your customers pay you for. You, in turn, help them build applications on top of that even though you are not an application developer yourself.
SC: Exactly. Thirty percent of our business is training. We also train boutique and global system integrators on how to properly build solutions on top of Hadoop. Our role is that of an educator of the market as well as an enabler.