Sramana Mitra: It is coherence of the data, and also coherence of the structure of the search engine that operates on that data.
Kon Leong: Yes. The retention policy has to be coherent. If each title is just throwing away data, it is a meaningless system. These are just some of the differentiations. We could go on – there is a huge list. But those are two of the key differentiators.
SM: Tell us more about what you see in the unstructured data management industry in general. What is the current situation, and where do you see the industry going?
KL: As I mentioned, I got started in data processing back in the 1970s. At that time, enterprises bought everything together from big vendors. They were all proprietary. We even had the so-called “bunch” – Unisys, NCR, Honeywell, etc. They had the best-in-breed silos, where enterprises would start with inventory management, accounting, general ledger, purchase orders, bill of materials, etc. That was all they knew, and it was all they had. It was a mess. It was batch processing, and the data was out of date by the time it came in.
Then some company out of Europe leaped across the pond and started taking down accounts one by one and pretty much owned it after some time – that was SAP. They had a more holistic approach, the whole thing on one big platform and more unified. So the space became MRP, which then became ERP. Pretty much the same thing has happened in the last dozen years in the unstructured data side, where silos cropped up to solve particular headaches – compliance, e-discovery, storage problems, records management, etc.
But they were all siloed, and now they are suffering the same heartburn from silos. We represent a vanguard of the movement to unify all of it. If you look at the parallelism, it is uncanny. After the ERP/MRP was done, the next thing was to do something with the data, so business intelligence and data warehousing were born. Now you see the same issue and the same turn of events on the unstructured data side, where more textual analytics has come in. Number one is the unification on the platform, and number two is the analytics side. I am willing to bet that the next move around the corner is to now do a meld between structured and unstructured. I still have not gotten my arms around the full impact of what happens when you merge these two data universes together. If you call me in about six months, I will have a much better answer. But we have already had requests from larger customers about merging them.
SM: Can you give us an example of a use case where customers are asking to merge structured and unstructured data?
KL: Let me say that the world is still getting its arms around what it can do with the unstructured data. We happen to call it corporate e-memory. This basically allows you to extract information from the unstructured side, but to do it across the entire enterprise.
Here is another differentiator that needs to be emphasized. Most of the solutions out there have search engines that are not up to the task of scouring the entire enterprise. It takes too long, and by the time it gets back, you have forgotten what you were searching for. The capabilities are too limited – maybe one or two conditions, but not 300. That has an impact on the kind of searching you do and on the actual processes that result from it, like e-discovery. E-discovery today is based on a sandbox approach. Using the sandbox, because the search engine can’t do more than the sandbox, you go and guess where the relevant data is. This has major limitations, because it is garbage in, garbage out. If you guess at the wrong boxes, all of your results are faulty.
You could do some intelligent sampling – that is what predictive coding was intended to do. You sample one, and if you don’t like it you go back and sample another one. It is a torturous back-and-forth sampling of data for stuff that may not pan out. It is at best a very small view of the larger picture. However, if you have a search engine that can scour the entire enterprise within seconds, it turns all these paradigms on their head. The sandbox approach for e-discovery, for example: Why bother with a sandbox when you can scour the entire beach?