This is a terrific conversation on cutting-edge natural language processing (NLP) with an entrepreneur in Vienna. Read on!
Sramana Mitra: Let’s start by introducing our audience to yourself as well as to Cortical.io.
Francisco Webber: I am the CEO and one of the founders of Cortical.io. It is a startup originating from Vienna, Austria. We are working in the domain of natural language understanding. We have developed our own technology for rendering text information.
Based on that technology, we offer a number of products and technical platforms to the market. We started early as a startup. 2011 might sound like a long time ago, but believe me, setting up a company that creates the technology beneath the product takes time.
Sramana Mitra: Do a historical overview of the NLP trajectory. What is the prior arc and where are you innovating? What is the approach that you bring to the table that is unique and differentiated?
Francisco Webber: In this context, my background is from a domain that was previously called information retrieval. It is basically the technology behind the search engines – zipping through a lot of text data. What I have learned and practiced there is what I would summarize under statistical modeling. It starts with easy ways of sampling data that you want to work with.
In the simplest case, it would be counting the words in a document. On one hand, it is fascinating to see that this approach delivers a certain quality of results. On the other hand, by working in that domain, I realized early on that it was not the solution to the real problem: computing with the meaning of language data.
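The word-counting approach Webber describes is essentially a bag-of-words model. A minimal sketch (the sample sentence and tokenization are illustrative assumptions, not Cortical.io's code) also shows the ambiguity problem he raises: different senses of the same word collapse into one count.

```python
# Minimal bag-of-words sketch: count word occurrences in a document.
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Count words, ignoring case and trailing punctuation."""
    words = [w.strip(".,!?;:").lower() for w in document.split()]
    return Counter(w for w in words if w)

# Both senses of "bank" (finance vs. riverbank) end up in one bucket,
# which is exactly the ambiguity a pure statistical count cannot see.
counts = bag_of_words("The bank raised rates. The river bank flooded.")
print(counts["bank"])  # 2
```

This is why disambiguation is expensive in purely statistical systems: the representation itself throws away the distinction between senses.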
In the year 2005, I became aware of the research of the Californian brain researcher Jeff Hawkins. You might know him from his Silicon Valley days. He came up with a computational theory of the human neocortex. That was the first time I encountered someone talking about the brain in theoretical terms. The theory he was building, which was an ongoing project, was extremely impressive to me because it could explain many brain phenomena that we know of, including some we had not known how to account for.
I took that theory and said, "Jeff is right with his theory. What does this mean for language?" That set the proper constraints on the problem and guided me to a solution. It turned out to be a non-statistical way of rendering the representation of language elements. With that fundamental step of representing the information differently, it turns out that many problems you typically face, like ambiguity in language, are computationally intensive if you tackle them purely with statistics.
Disambiguation becomes a side effect if you represent text in what we today call semantic fingerprints; discriminating between those two cases becomes a trivial operation. With that discovery, we were lucky enough to get some research funding in Austria to build a prototype with commercial intentions behind it. Within half a year, we could demonstrate that many traditional statistical modeling features were available on our platform as well, but with a simpler and more efficient implementation.
It also had an astonishing degree of resistance to noise. We had all the features you would normally expect from the brain. Noise mixed into the data is close to no problem for the way we work with it, compared to older statistical systems.