Thought Leaders in Big Data: Interview with Seth Redmore, VP of Product Management at Lexalytics (Part 5)

Posted on Wednesday, Mar 20th 2013

Sramana Mitra: What else is interesting in your story?

Seth Redmore: The problems – what doesn’t work well. I personally think that is the most fun part. The classic hard problem in text analysis is humor and sarcasm. It is very hard to detect. For example, someone says, “I love the Apple store.” If that is all they say in the Tweet, then they like the Apple store. But if the previous Tweet was, “I spent three hours waiting in line and the genius was an idiot,” that changes the context of that Tweet.

Understanding the context of communication is important for understanding whether somebody is being sarcastic or not. Sarcasm is still a really tough nut to crack with respect to this technology. Let’s say you are looking at email content. How much do you count the stuff that has been copied or CC’d? When you get a big email thread, there is all kinds of stuff that is below the most recent reply that you ignore. But a machine isn’t sure whether to ignore it, because it may be contextual or not. Those are the kinds of questions that we read through when we are looking at machines. We are still trying to tell what the right heuristics are to tell machines to deal with us.
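The email question above can be made concrete with a small sketch. It is not Lexalytics' method; it only illustrates one common heuristic, under the assumption that quoted lines start with ">" and that an "On ... wrote:" line marks where the older thread begins. The function name `split_reply` and the sample email are hypothetical.

```python
import re
from typing import Tuple

# Assumed convention: an "On <date>, <name> wrote:" line starts the older thread.
QUOTE_HEADER = re.compile(r"^On .+ wrote:\s*$")

def split_reply(body: str) -> Tuple[str, str]:
    """Return (newest_reply, quoted_context) for a plain-text email body."""
    fresh, quoted = [], []
    in_thread = False
    for line in body.splitlines():
        if QUOTE_HEADER.match(line):
            in_thread = True
        # Everything after the header, and any ">" line, counts as quoted.
        if in_thread or line.lstrip().startswith(">"):
            quoted.append(line)
        else:
            fresh.append(line)
    return "\n".join(fresh).strip(), "\n".join(quoted).strip()

email = """Sounds good, see you Friday.

On Mon, Mar 18, 2013, Alice wrote:
> Can we meet this week?
> I loved the demo."""

reply, context = split_reply(email)
print(reply)
```

Keeping the quoted part as a separate value, instead of discarding it, leaves the decision about whether it counts as context to a later step.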

SM: Are sarcasm and humor really such intractable categories?

SR: I wouldn’t call humor and sarcasm intractable. I would say they haven’t been solved yet. Continuous evolution of language requires continuous engineering effort. I am convinced, after having watched Twitter, that teenagers are responsible for probably 95% of the evolution of language over the history of language. In some ways, it is accelerated. It is natural for people to want to have their own particular dialects: their friends speak like this, and if you don’t, you are not part of the group. Take the word “sick.” In the context of a hospital, it means something very different from what it means in the context of a video game board. That is something we humans understand. Our system deals with that just fine, but the hard part is the transition, when somebody first starts using the word like that. Understanding that and having something that is able to pick up on that and do so quickly, cleanly and efficiently without human intervention would be a marvelous thing. But as it is right now, you still need to have that learning period and that transition in training, with somebody saying, “Yes, this is positive or this is negative.”
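A toy sketch of the “sick” example, assuming a simple word-polarity lexicon with per-domain overrides that a human reviewer can extend. The names `DEFAULT_POLARITY`, `DOMAIN_OVERRIDES`, `polarity` and `record_human_label` are hypothetical, and real sentiment systems are far more involved than a lookup table.

```python
# Assumed toy data: +1 is positive, -1 is negative, 0 is unknown/neutral.
DEFAULT_POLARITY = {"sick": -1, "love": 1, "idiot": -1}
DOMAIN_OVERRIDES = {("sick", "gaming"): 1}

def polarity(word: str, domain: str) -> int:
    """Look up a domain-specific polarity, falling back to the default."""
    word = word.lower()
    return DOMAIN_OVERRIDES.get((word, domain), DEFAULT_POLARITY.get(word, 0))

def record_human_label(word: str, domain: str, label: str) -> None:
    """Store a reviewer's judgment ("positive" or "negative") as an override."""
    DOMAIN_OVERRIDES[(word.lower(), domain)] = 1 if label == "positive" else -1

print(polarity("sick", "hospital"), polarity("sick", "gaming"))
```

The `record_human_label` step stands in for the training period described above: until someone says “this is positive,” a newly repurposed word falls through to its default or to zero.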

SM: What are the challenges of doing this in different languages? What are you encountering?

SR: English has a very rich set of linguistic resources. There is something called WordNet that allows for a lot of different algorithms. It is basically a big thesaurus – a little more sophisticated than that. A lot of other languages don’t have that, but what they have now is Wikipedia, which has become a tremendous resource for building out other languages. German, for example, doesn’t use a lot of spaces. Understanding where the boundaries are inside a German text is difficult because you may have something that is a three- or four-word phrase in English, and it can be slimmed down to a single word in German. It would still be the same idea, but when written it is one big word. Being able to do word boundary detection is difficult.
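To illustrate the word boundary problem, here is a minimal sketch of dictionary-based compound splitting. It assumes a known vocabulary is available and treats linking forms such as “kranken” as plain vocabulary entries; the function name `split_compound` and the three-word vocabulary are hypothetical, and this is not presented as how any production system does it.

```python
from typing import List, Optional, Set

def split_compound(word: str, vocabulary: Set[str]) -> Optional[List[str]]:
    """Split `word` into known parts, or return None if no full split exists."""
    word = word.lower()
    if word in vocabulary:
        return [word]
    # Try longer heads first, then recurse on the remainder.
    for i in range(len(word) - 1, 0, -1):
        head, tail = word[:i], word[i:]
        if head in vocabulary:
            rest = split_compound(tail, vocabulary)
            if rest is not None:
                return [head] + rest
    return None

VOCAB = {"kranken", "haus", "aufenthalt"}
parts = split_compound("Krankenhausaufenthalt", VOCAB)  # "hospital stay"
print(parts)
```

A word can often be split in more than one way, and this sketch simply returns the first split it finds; choosing among candidates is where the difficulty described above comes in.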

Chinese has some of the same issues. They do a lot of puns, humor and pinyin [romanization]. You sometimes have to use the sounds of the words to understand what the humor is that they are trying to get across. There are also word order problems. English is a subject-verb-object language. Other languages are subject-object-verb. Some languages are going to be very difficult to deal with, like Japanese, which we are not going to be dealing with because of the enormous amount of cultural subtlety about it. This makes it very difficult to do. Right now we are just trying to figure out what the right next language is for us to do.

SM: That is very interesting. I have always been interested in AI [artificial intelligence] and have used it in previous products I have done, including natural language processing. I think it is a contemporary issue because there is so much text that is being spit out.

SR: It is how people communicate. It is interesting what you say about AI. I think that having a good understanding of human conversation is a fundamental building block for arriving at a true artificial consciousness or intelligence. It is very interesting to be part of reverse engineering language and how you can understand what is being communicated. It is a lot more complicated than networking was from that perspective. Networking has its challenges, but it is an engineered technology.

SM: Thank you for reaching out.

SR: Thank you for your time.

This segment is part 5 in the series: Thought Leaders in Big Data: Interview with Seth Redmore, VP of Product Management at Lexalytics