Sramana Mitra: What big data applications do you see yourselves being stuck in to? Certain big data applications or big data genres have a certain text analytics component to them.
Seth Redmore: So you have text. That is the first bit. But there is stuff that can be turned into text. That is what is interesting. Clients are asking, “How do I do media monitoring, and how do I take what people are saying on Twitter and figure out whether it has predictive implications for my lines of business?” That is one of the things that people are working on right now.
A big part of that is understanding intent. That is a big one – the continuous use of social media and understanding where it has predictive value and where it doesn’t. We are also starting to see a lot more with e-discovery. E-discovery was one of the earlier areas to adopt rigorous search stuff. But moving into text analysis has been problematic, because it is harder to explain to a jury. Search is easy to explain to a jury, relatively speaking. Text analysis involves a lot more of a knowledge base and probabilities, etc. That becomes much more difficult. But we are starting to see a lot of movement there. Those problems are interesting because you have a dump of data that you have to deal with very quickly.
In terms of turning things into text, one of the things we are starting to see is video. Say you have a text analysis engine that is watching closed captioning of a sports event. We know you are watching the video, but we are able to tag a certain event. For example, 31 minutes and 22 seconds is when the second goal against South Africa occurred. We know that because of what people are saying and we know who scored it, because it was said and put into the closed captioning. There is a lot of stuff that is being converted into text, so you can use text analysis on this material.
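To make the closed-captioning idea concrete, here is a minimal sketch in Python. It is not the engine discussed in the interview: the caption lines, the player names, the regular expression, and the `tag_goals` helper are all hypothetical, and a real text analysis engine would do far more than match a keyword.

```python
import re

# Hypothetical closed-caption cues: (seconds from start of broadcast, caption text).
CAPTIONS = [
    (1204, "Goal! Player Seven opens the scoring against South Africa."),
    (1875, "A dangerous cross into the box."),
    (1882, "Goal! Player Nine scores, a second against South Africa."),
]

# Assumed caption phrasing; real captions would need a much looser match.
GOAL_PATTERN = re.compile(r"Goal! (?P<scorer>.+?) (?:scores|opens)")


def format_timestamp(seconds):
    """Render a second count as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def tag_goals(captions):
    """Return one tagged event per caption that announces a goal."""
    events = []
    for seconds, text in captions:
        match = GOAL_PATTERN.search(text)
        if match:
            events.append({
                "time": format_timestamp(seconds),
                "scorer": match.group("scorer"),
                "goal_number": len(events) + 1,
            })
    return events


events = tag_goals(CAPTIONS)
```

The point is only that once speech has become timestamped text, an event in the video can be located and labeled without analyzing the video itself.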
SM: If you look at your OEM customers, what are some big data companies that are doing big data applications but are using your technology in their systems?
SR: I would say Oracle and Endeca are doing stuff that is fairly big data. If you are a BI vendor, there are a couple of things you can do. If you look at text analysis, you can add to the analytical part, and you can add to information retrieval as well. I consider them both big data problems. It is as interesting to find one piece of useful information as it is to understand what is happening everywhere. DataSift is dealing with thousands of pieces of content per second. ProQuest is one of our customers, and they have newspapers back to the 19th century as well as a ton of other stuff coming in. Bitly is also one of them. Angus and MicroStrategy are as well. Angus is very much on the predictive analytics side, and MicroStrategy is a straight-up BI vendor. It is just a matter of scale: “How much stuff do you want and how much stuff is it before you consider it big data?” People consider big data to be a separate thing.
SM: It is not, but it is somewhat special when there is learning involved. I think one of the differentiators between old-school BI and big data is that if you want to analyze data on a large scale, you are going to need to use machine learning. That is the definition I use. The next question I was going to ask you is, what kind of learning systems are you plugging into?
SR: I find that definition interesting, but I am unconvinced right now. My belief is that there are problem classes large and small that lend themselves to machine learning, and larger data sets are certainly better for machine learning, but I’m not sure that I would go so far as to say “requires.” Or perhaps that is your point, which is that you are drawing the line at saying, “To be called big data, you must require machine learning in order to solve the relevant problems.” We use machine learning ourselves. Our algorithms are trained: they are supervised learning systems, and customers can also train them themselves. If you look at what Angus is doing in particular, they take the results of our models and plug them into their own models and say, “Here is what this predicts.”
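The pattern described in that last answer, where one model's output becomes an input to another, might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the keyword lists, the hand-set weights, the churn scenario, and the function names. Neither function stands in for any vendor's actual model, and nothing here is trained.

```python
import math

# Hypothetical word lists standing in for a trained upstream text model.
POSITIVE = {"love", "great", "renew"}
NEGATIVE = {"cancel", "broken", "refund"}


def text_model_score(text):
    """Upstream stand-in: return a crude sentiment score in [-1, 1]."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1.0, min(1.0, hits / 2))


def churn_risk(sentiment, months_as_customer):
    """Downstream stand-in: combine the text score with a structured field.

    The weights are hand-picked for illustration, not learned from data.
    """
    z = 0.5 - 1.5 * sentiment - 0.05 * months_as_customer
    return 1 / (1 + math.exp(-z))


# The upstream score becomes one feature of the downstream prediction.
unhappy = churn_risk(text_model_score("I want to cancel, the app is broken"), 3)
happy = churn_risk(text_model_score("Great service, I love it"), 3)
```

With these made-up weights the unhappy message yields the higher risk, which is the whole idea: the text model contributes a signal, and the downstream model decides what that signal predicts.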