Sramana Mitra: Maybe pick three client use cases that involve these large amounts of data. Let’s double-click down on understanding specifically what that data contains and what kinds of intelligence you are able to derive from it. What processes and methodologies have you innovated to bring that together?
Dmitri Williams: Let’s start with an academic one and maybe jump into some commercial ones. The first project that we did was on predictive analytics, and we were doing this for the intelligence community. They were interested in looking at someone’s online behavior and understanding something about them offline. That is all declassified material that you can read about online. We would take a look at someone playing a video game like EverQuest 2, which is the game that we were looking at most of the time. Then we’d ask: when someone plays in a particular way, could you tell if they were more likely to be a Republican, or female, or to live in a suburb? We started building models that got pretty good.
You can see how the social and computer sides might work together because the social folks would say, “What variable in these large data sets should we be going after?” Computer scientists are better suited to address what kind of models we would build. We had this process of bouncing back and forth between theory from the social side and observation and analysis from the computer science side. We got very good at making these predictive models, but the thing that really made them hum was when we started getting social network data and building social network analysis into those models.
We really started thinking about hypergraph technology and how interactions and analysis of data over time could spur things along. Those are the things that got our models to be really powerful. We realized that tracking a person was one thing, but when you looked at them and their relationships with others and how they impacted each other, that was really powerful.
Sramana Mitra: What is the value of that? Who is using that data?
Dmitri Williams: The use case for the government agencies I can’t speak about directly, but it wouldn’t be a great leap of faith to see why some agencies would be interested in being able to predict and learn more, especially if the person was a bad actor. We were really working on the mathematical technique side of it and not really speculating on how they would use it as much.
Sramana Mitra: You said you were also working on the commercial domain. To be able to build a business, you do need to understand the use cases of how that data may be used commercially, right?
Dmitri Williams: Of course. This is what led to the tech transfer process. We realized about halfway through that we had developed, whether we had meant to or not, a predictive system for social behavior. In other words, we could look at a social graph and we could say, “When Dmitri does something, Sramana is more or less likely to do it.” We could develop influence models that were based on behavior. At that time, the state of the art was, and I think still is, to look at what people talk about on Twitter. But we had behavioral data, which, as any social scientist will tell you, is the gold standard. You never ask people what they think when you know what they do.
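
To make the "when Dmitri does something, Sramana is more or less likely to do it" idea concrete, here is a minimal sketch of a behavior-based influence score. It is an illustration only, not the method described in the interview. The function name `influence_lift`, the `(user, action, timestamp)` event format, and the time `window` are all assumptions made for this example, and the baseline used here is deliberately crude.

```python
from collections import defaultdict


def influence_lift(events, source, target, window):
    """Hypothetical influence score computed from behavioral logs.

    events: iterable of (user, action, timestamp) tuples (assumed format).
    Returns the ratio between
      - the share of the source's actions that the target repeated
        within `window` time units afterwards, and
      - the share of all observed actions that the target performed at all.
    A value above 1 suggests the target is more likely than baseline to do
    what the source did; below 1, less likely. Returns None if undefined.
    """
    # Earliest time each user performed each action.
    first_time = defaultdict(dict)
    all_actions = set()
    for user, action, t in events:
        all_actions.add(action)
        prev = first_time[user].get(action)
        if prev is None or t < prev:
            first_time[user][action] = t

    src = first_time[source]
    tgt = first_time[target]
    if not all_actions or not src:
        return None

    # Crude baseline: how often the target does any observed action.
    baseline = len(tgt) / len(all_actions)
    if baseline == 0:
        return None

    # Actions the target performed shortly after the source did.
    followed = sum(
        1 for action, t in src.items()
        if action in tgt and 0 < tgt[action] - t <= window
    )
    conditional = followed / len(src)
    return conditional / baseline


events = [
    ("dmitri", "buy_sword", 1), ("sramana", "buy_sword", 2),
    ("dmitri", "join_guild", 5), ("sramana", "join_guild", 6),
    ("other", "fish", 3), ("other", "craft", 4),
]
print(influence_lift(events, "dmitri", "sramana", window=2))
```

A score like this only measures co-occurrence in time between two people's actions. Separating real influence from two people simply having similar tastes would take a more careful model than this sketch.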