Sramana Mitra: I would like to take three of your customers where this is happening and talk through their use cases and scenarios.
Todd Goldman: We are seeing two major scenarios in how people use Hadoop. The first is that, over time, companies have bought traditional data warehouse appliances and shoved all their data into them. This was before Hadoop even existed. The problem they ran into was cost containment. They kept putting more and more data into their appliances, but an appliance is very expensive: compared to a Hadoop cluster, it might cost 100 times more per terabyte. They realize, "Wow, I have a lot of data sitting in this appliance that I almost never use, but I don't want to get rid of it, because I use it once in a while. Maybe I should offload that to a Hadoop cluster. Then I don't have to buy another $30 million appliance; I can buy a $3 million Hadoop cluster instead."
So there is a cost-saving path they want to take: "Let me move off this expensive appliance. I will keep using it for my business intelligence, but I will move a lot of the data to Hadoop and do my preprocessing there, because it is much cheaper, and then I will load the output into my data warehouse appliance. As a result, I can save a lot of money and still keep that extra data around for when I want to access it." That is the cost-saving perspective.
We are seeing that a lot. A very large Internet financial services company is doing exactly that. They have a $30 million appliance, and they are migrating a significant portion of that data warehouse onto Hadoop simply because of the 100-fold cost savings.
The other scenario is innovation: "What difference can I make to my business with a technology like Hadoop?" A good example of that is Western Union. They are a 160-year-old money transfer company, and one of the things they want to do is take that business online. To do that, they recognize that they need to be able to do risk analysis: when you give them a check to transfer money someplace, they need to know that the transfer is going to happen safely and that the check is actually good. They want to determine very quickly whether the check is good, whether you are good, and whether they should transfer the money. When you come in and give them the check, they determine whether you are wearing a black hat or a white hat, or whether they are not sure. If they are not sure, a person has to evaluate whether that transfer should happen. So to the extent they can shrink that gray area, it is pure profit for a company like them. They are using Hadoop to analyze their long history of money transfers so they can become more accurate at this and move some of that business online. Risk analysis is a huge example in financial services.
SM: Are there any other broad trends of where you are seeing a lot of adoption of this kind of very large-scale data analytics?
TG: Another area where we are seeing it is medical and pharmaceutical. Pharmaceutical and medical companies have traditionally cut off the long tail of analytics information and made assumptions about what might be affecting patients. Now, rather than cutting off that tail, they have the ability to analyze all of it. People who are looking for cures for cancer are now taking advantage of the ability to analyze more data cost-effectively.