Sramana Mitra: If you were to distinguish between what is free and what is paid in the product part, can you give some examples of capabilities that are paid?
Shaun Connolly: Whether we are doing operational tooling, configuration, provisioning and management, those capabilities manifest in an open source project called Apache Ambari, which is an Apache Software Foundation project. The point is that we don’t have any commercially licensed software. People come to us because we have the engineers working in the various open source projects. We work with our partners and customers to address enterprise needs – whether those are performance, scalability, or manageability. We help mature those projects in ways that are useful and beneficial to mainstream enterprise use cases. We aim to make sure it will integrate well with the developer or be able to be deployed on a wide range of platforms.
We create a certified distribution that is free to download and use, but at the end of the day, our customers pay us for support, patches, and updates, as well as for being a partner with them, listening to their requirements, and driving the vision of the platform forward in a way that addresses their needs.
SM: Is that a subscription business model?
SC: Yes. We sell an annual software support subscription for Hadoop clusters.
SM: So, basically your subscriptions are 70% of your revenue, and then you have training and additional consulting services.
SM: I would like to double click down into three use cases that you think are particularly insightful and that have interesting discussions of thought leadership.
SC: As we look at mainstream enterprises, we see two main streams of adoption. One is around viewing big data processing from the perspective of unlocking a new opportunity, and the other is deploying at scale and viewing Hadoop as a shared data service.
The latter is about the efficiency of big data processing, where people create large data lakes. The majority of customers we work with – about 70% – start with an opportunity-driven use case. Those opportunity-driven use cases typically start around some of the new forms of data that are available out there, whether that is click stream data, web server or server log data, or sensor data – whether it is coming off tractors, cars, aircraft engines, or ATMs. Some of the other forms are geolocation data coming from mobile devices as well as video, audio, files, and unstructured data. But usually use cases are driven by one, two, or three of those.
In the classic retail case, we have customers who are more traditional brick-and-mortar retailers, but also have a web presence. The common use case would be to have a platform where you are able to bring in not only the click stream data that might be emitted from web access, but also the data they might have in their traditional systems, and be able to unify not only interactions but also a customer’s purchase history over a lifetime from a particular vendor. This is commonly referred to as the customer golden record or the 360-degree view of the customer. That use case definitely doesn’t [show up only] in retail, but also across industries like commercial banking, where there is always a web way of transacting business. Click stream analytics as well as existing data about a particular customer is definitely one of the common use cases.