Saed Syad: With big data, even without any predictive modeling, we can answer some of those questions by looking into the database. What we call targeting or retargeting is some sort of database query, which you can make faster by doing it in memory. But with the amount of data we have, even that is not an easy job. We need some sort of abstraction and predictive modeling. What type of predictive modeling can we use? All traditional predictive modeling goes through three steps: data, model, score. It means I have the data, I build a model, and I use the model to score.
With a huge amount of data, the question is, “Should we do sampling here?” It has been proven that when you have all the data, if you can process it, that is much better than using a sample. We need techniques that can support this amount of data, and there are none.
The majority of the techniques we have, like decision trees, neural networks, and support vector machines, are iterative. That means you need to run over the same data set many times to optimize the parameters. If you pass in a couple of million records, you can spend 24 hours getting a result. The amount of data we have in big data needs a different approach. And it is not just the amount of data, but the speed at which that data is generated. What about new data? If I have 100 million records, I go to Google iPredict, I upload them and get my model back after five to ten minutes, what should I do with the 50 variables I get back? If I decide to remove one of those variables, should I go to the website again and build another model?
The solutions on the market at this stage are limited in the number of records they can support and in the time it takes to build a model. The flow of the process is data–model–score. If you change anything in the data, you need to rebuild your model. This is impossible in big data. Other companies came up with a simple model based on a few variables, and that is all. Those are not really models; they are more of an optimization method based on limited parameters. Last year, when our CEO called me, he explained the situation and I knew this was going to be the killer app. I had told many people in this field about the problem we are facing, about the size of the data, and that we need a real-time data mining technology. It has to be different and scalable.
Let me explain the philosophy behind real-time learning machines, before we connected real-time data mining to real-time bidding. Even before I finished working out my theory of real-time learning machines, I set some basic criteria for the method. Real-time learning should be scalable. Scalability means incremental learning: the data comes in, it is processed, we extract knowledge from that data and save that knowledge, not the data. But at the same time, it should be able to do decremental learning – what I call forgetting. If for any reason we make a mistake, we need to be able to roll back. So the structure of the real-time machine should support decremental learning.
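The incremental and decremental criteria can be sketched with sufficient statistics: keep running sums instead of the data, add to them to learn a record, and subtract from them to forget it. The sketch below is a hypothetical illustration using a simple linear regression, not the actual real-time learning machine described here; the class and method names are my own.

```python
class IncrementalRegressor:
    """Simple linear regression y ≈ a + b*x, kept only as sufficient
    statistics (hypothetical sketch). The raw data is never stored:
    knowledge is extracted from each record, then the record can be
    discarded."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxy = self.sxx = 0.0

    def learn(self, x, y):
        # Incremental learning: add this record's contribution.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxy += x * y
        self.sxx += x * x

    def forget(self, x, y):
        # Decremental learning ("forgetting"): subtract it back out,
        # which rolls the model back without revisiting the data.
        self.n -= 1
        self.sx -= x
        self.sy -= y
        self.sxy -= x * y
        self.sxx -= x * x

    def coefficients(self):
        # Recover the model from the statistics at any time.
        b = (self.n * self.sxy - self.sx * self.sy) / (
            self.n * self.sxx - self.sx ** 2)
        a = (self.sy - b * self.sx) / self.n
        return a, b
```

Because only the sums are stored, forgetting a record is exactly as cheap as learning it, and the model can be re-scored at any moment without a rebuild.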
The third feature is adding new variables on the fly. If in the future I build a model and I need to add new variables, I shouldn’t have to go back to the data and rebuild my model; I should do it on the fly. In the traditional data–model–score flow, that is impossible: any change in the model means repeating the whole process. The fourth characteristic is removing a variable on the fly – I call it shrinking. If for any reason we know that we don’t need some of the variables, or we cannot use them, we need to remove them. But you shouldn’t have to go back to the data to do it. The architecture should support removing variables on the fly. The fifth feature, part of scalability, is supporting distributed computing; real-time learning should be able to support this. But distributed computing is not enough, because it can be slow.
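Adding and removing variables on the fly becomes possible when each variable's knowledge is stored independently of the others. A hypothetical sketch in a naive-Bayes style, with one count table per variable (class and method names are my own, not the author's system):

```python
import math
from collections import defaultdict

class OnTheFlyNaiveBayes:
    """Naive-Bayes-style classifier over categorical variables,
    kept as independent per-variable count tables (hypothetical
    sketch). Because no variable's statistics depend on another's,
    a variable can be dropped ("shrinking") or a new one started
    on the fly, without going back to the data."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.tables = {}  # variable name -> {(label, value): count}

    def add_variable(self, name):
        # New variable starts collecting counts from now on.
        self.tables[name] = defaultdict(int)

    def remove_variable(self, name):
        # Shrinking: just drop the table -- no pass over the data.
        del self.tables[name]

    def learn(self, record, label):
        # record is a {variable: value} dict; one pass, no storage.
        self.class_counts[label] += 1
        for var, value in record.items():
            if var in self.tables:
                self.tables[var][(label, value)] += 1

    def predict(self, record):
        best, best_score = None, float("-inf")
        total = sum(self.class_counts.values())
        for label, c in self.class_counts.items():
            score = math.log(c / total)  # class prior
            for var, value in record.items():
                if var in self.tables:
                    # Laplace-smoothed conditional probability.
                    score += math.log(
                        (self.tables[var][(label, value)] + 1) / (c + 2))
            if score > best_score:
                best, best_score = label, score
        return best
```

Removing a variable here is a dictionary deletion; the remaining tables are untouched, which is what makes shrinking instant rather than a model rebuild.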
The last feature is parallel processing: the structure should support processing in parallel. Based on these six main features, I started to design the system. The first thing I studied, amazingly, was search engines, back in 1990 to 1995. But my goal was beyond just a search engine, which is frequency-based predictive modeling. Frequency-based modeling alone cannot support many of the problems we have, for example classification or regression – it is not just about frequency.