Sramana Mitra: These are good examples of taking data and doing something interesting with it to produce actionable business value, but they didn’t really cover multi-modal data. You said that was one of the issues you’re trying to illustrate. Can we take a use case where multi-modal data is actually involved?
Naveen Sharma: In this particular case, the multi-modality came from call center data. For example, I talked about call center audio files, or recorded conversations. Metadata is also captured as structured data, so we had both kinds of data.
Sramana Mitra: Are you actually analyzing the recorded audio files?
Naveen Sharma: Yes.
Sramana Mitra: How do you do that? Do you actually do analysis in speech or do you do a speech-to-text and then do analysis on the text?
Naveen Sharma: We do both. We do speech-to-text. We crowdsource, and we also have individual agents who listen in and capture the conversation and its context.
Sramana Mitra: In terms of the state of the art in speech-to-text, and this is not just audio-to-text but also video-to-text, there’s no real way of doing video processing well in these circumstances, so we face roughly the same problem in video transcription as well. Would it be fair to say that’s an area that doesn’t have very good technology yet?
Naveen Sharma: I think that’s fair to say. This is something our research centers are working on. We have a whole team of imaging scientists and data scientists looking at applications that do face recognition on video and can determine sentiment. As I mentioned earlier, we also have innovation that can better determine sarcasm and sentiment in the social media context as well.
Sramana Mitra: Speech-to-text is much more straightforward; face recognition is a far more complex problem. I’m just amazed that this problem has been around for so long: transcribing audio or video files into text so that you can actually run natural language-based analysis on it. It’s a very straightforward use case, and it’s astonishing to me that the world does not have a good solution for it yet.
Naveen Sharma: I agree with you. There are elements of the solution out there, but we still have a lot of manual steps to fill in where things aren’t automated. We’re also applying similar video understanding, for lack of a better phrase, in other domains like retail. We have a growing portfolio of big data technologies that we’re trying to bring into the retail segment. Large customers who work with us ask, “Can we use video understanding to help our end consumers?” Are they happy or unhappy? Satisfied or dissatisfied? We’re looking at not just the pure data but also the emotional content of a particular consumer, and seeing if that can become part of our modeling.
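[Editor’s note: the pipeline discussed above, transcribing call center audio and then running text-based sentiment analysis, can be sketched roughly as follows. This is an illustrative sketch only: `transcribe` is a hypothetical stand-in for a real speech-to-text engine, and the keyword-based scorer is a toy method, not the approach Xerox actually uses.]

```python
# Sketch of the call center pipeline described in the interview:
# audio file -> speech-to-text -> sentiment analysis on the transcript.

POSITIVE = {"happy", "satisfied", "great", "thanks"}
NEGATIVE = {"unhappy", "dissatisfied", "angry", "cancel"}

def transcribe(audio_path: str) -> str:
    """Hypothetical placeholder: a real system would call a
    speech-to-text engine here (or use a human transcriber)."""
    return "the customer said they were dissatisfied and wanted to cancel"

def sentiment(text: str) -> str:
    """Toy keyword-based sentiment scorer over the transcript."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

text = transcribe("call_0001.wav")
print(sentiment(text))  # negative
```

In practice the transcription step is where, as noted above, manual effort still fills the gaps that automation cannot.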