Sramana Mitra: What does the competitive landscape look like? Of course there is a lot going on in big data. How do you view the map and the competitive landscape? How do you position yourself in that ecosystem with that competitive landscape?
Billy Bosworth: I have been around databases for a long time. This is my 22nd year in the industry, and since I graduated from college I have either been programming databases, administering databases, or writing tools for databases. I have had the real pleasure of seeing this market change, evolve, and grow in many different ways over the years.
As you said, it is about as complex as you would choose to make it. You could try and create 14 different infographics to explain all this. I will give you the way I think most businesses today are thinking about the technology. We can put them in three different categories. In category one we have what we have just been talking about: the massive-scale, high-velocity, transactional type of systems. That is us, that is where we live. Other technologies that serve that segment of the market are Amazon with its DynamoDB and Oracle with its Big Data Appliance. Those are some of the players in that category.
The next category is what I would consider the volume one. This is the footprint of the traditional data warehouse. The data warehouse has gone from really traditional to the columnar technologies like Netezza or Vertica. Now you see Hadoop. Hadoop is getting better and better at bringing its access times and query times closer to those of both the newer and the older data warehouse technologies. In that market you have Cloudera, probably the most recognized name in the field. You also have some vendors that do a little more proprietary Hadoop, like MapR or Hortonworks. They are also dedicated to the open source side of the house. Then you have the big players like IBM and Microsoft. That category is getting very competitive on its own.
The third category is what I would consider the smaller workload distinctive memory. This would be like your Couchbase technology or MongoDB. I call them beyond-MySQL technologies. These are designed to be very simple to use and get up and running. They typically do well with smaller workloads. That is the third category in the big data ecosystem. They get lumped in with big data because they can handle different data types well. Developers have a lot more flexibility with this whole world around what is unfortunately called NoSQL. I think a better term is post-relational or flexible schema.
SM: Would you put SAP’s HANA technology in that space?
BB: The SAP HANA technology is interesting. It is an in-memory technology, but it is so tightly integrated into their analytics piece that it fits more on the faster data warehouse side of the fence. That is how I see it. I don’t have a clear picture in my mind of the success or propagation of HANA beyond pure SAP customers who are choosing it as a back end. I don’t have visibility into that, so I can’t comment beyond that.
SM: You are basically distinguishing your category from the Hadoop category primarily on the basis of real-time vs. batch processing. Is that correct?
BB: That is correct. I would call it your real-time vs. your mid- to high-latency type of activities. Of course, that is all relative. We are talking about seconds. In the data warehouse world they are trying to get their speed down to seconds. In our world we are talking milliseconds.