Sramana Mitra: The clustering itself is dynamic? Are you coming up with algorithmic clusters?
Omer Artun: That is right. The platform we use runs on a Hadoop-based framework, the Cascading framework. It also has algebraic construction techniques (ART) integrated. This way, we can run these algorithms in a scalable, multi-tenant way. Basically, each customer gets its own clusters and its own propensity model. The whole process of pattern recognition, feature generation, feature selection, classifier design, and system design is implemented in Cascading in a fully configurable way. Out of the box, the marketer gets a bunch of things to use automatically. For example, if they want to predict people with green eyes, we can do that, too.
SM: When you start with one data body to work with – how many clusters do you start with, and how does that evolve? I suspect that the number of clusters is evolving, or does it stabilize at some point?
OA: There are so many tricks when it comes to clustering. You try different numbers of clusters and test them for stability. You split the data into two sets, cluster each, and then see if the clusters you come up with are similar. If they are not similar, you don’t have a stable clustering scheme. You also look at the cluster sizes. From a practical perspective, you don’t want to have clusters that vary too much in size. You don’t want some clusters to be 30% or 60% and other clusters only 1% or 2%, so you put constraints on what the clustering output should be.
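The stability and size checks described here can be sketched in a few lines. This is an illustrative sketch only, not the platform's implementation: it assumes the two clustering runs have each labelled the same set of customers, uses the Rand index as one common way to compare two labelings, and the `min_share` and `max_share` thresholds are hypothetical values.

```python
from collections import Counter
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of customer pairs on which two clusterings agree.

    A pair "agrees" if both clusterings put the two customers together,
    or both put them apart. 1.0 means the groupings are identical.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same customers")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def size_shares(labels):
    """Share of all customers that falls into each cluster."""
    total = len(labels)
    return {cluster: n / total for cluster, n in Counter(labels).items()}


def sizes_balanced(labels, min_share=0.03, max_share=0.30):
    """True if no cluster is too small or too large (thresholds are assumed)."""
    return all(min_share <= s <= max_share for s in size_shares(labels).values())
```

A clustering scheme whose runs score a low Rand index against each other, or that fails `sizes_balanced`, would be rejected under this sketch.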
There are also other methods based on inter-cluster and intra-cluster distance metrics. You basically have to figure out what the optimal number of clusters should be, and the system picks the best one. There is an art to it, but that art can be programmed into it. The number of manageable clusters is usually between 5 and 15. If you have fewer, you don’t have a lot of personae; if you have more than 15, it is just not manageable. We will get the answer within that boundary.
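As a rough illustration of picking among candidate clusterings with a distance metric, the sketch below scores each candidate by the ratio of inter-cluster distance (between centroids) to intra-cluster distance (points to their own centroid) and keeps the best one whose cluster count falls in the 5 to 15 range. The scoring ratio and the function names are assumptions made for this example; established indices such as the silhouette score serve the same purpose. Note that this raw ratio grows as clusters become very small, which is one more reason to bound the number of clusters.

```python
from math import dist
from statistics import mean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(mean(coords) for coords in zip(*points))


def separation_score(points, labels):
    """Mean inter-centroid distance divided by mean intra-cluster distance."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    if len(groups) < 2:
        raise ValueError("need at least two clusters")
    centroids = {label: centroid(ps) for label, ps in groups.items()}
    intra = mean(dist(p, centroids[l]) for p, l in zip(points, labels))
    keys = list(centroids)
    inter = mean(
        dist(centroids[a], centroids[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )
    return inter / intra if intra > 0 else float("inf")


def pick_best(points, candidate_labelings, k_min=5, k_max=15):
    """Best-scoring labeling whose number of clusters lies in [k_min, k_max]."""
    eligible = [
        labels for labels in candidate_labelings
        if k_min <= len(set(labels)) <= k_max
    ]
    if not eligible:
        raise ValueError("no candidate has an acceptable number of clusters")
    return max(eligible, key=lambda labels: separation_score(points, labels))
```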
SM: I have to believe that in the types of clustering you talked about, for example, the businesswoman buying shoes for herself and then also for her kids, there are far more than 15 combinations of clusters, aren’t there?
OA: Not in a statistical sense. This doesn’t mean we take all the products customers buy and create combinations with that. What we do is create combinations of people in a meaningful way. This doesn’t mean they can’t buy other products. Basically, we are coming up with combinations that are similar to each other.
SM: That is interesting, because I did a startup in the late 1990s. Our entire premise at that time was to create a personal store. We were working in fashion retail as e-commerce. The idea was to cluster the product catalog and also to cluster customers based on eye color, hair color, body type, style preferences, etc., and create personalized stores for each type of customer. What I hear in your description is the possibility to do that to a certain degree, is that right?
OA: Yes. As I mentioned before, you don’t have to do one type of clustering. Which one you choose depends on the problem you are trying to solve. We would do a product-based clustering by looking at the products you buy or the categories you buy from. But then I might have intrinsic demographic variables that I cluster on as well. Those are the different layers of labels. I might have an 8-cluster solution for product needs, a 6-cluster solution for demographic variables, and a 12-cluster solution for behavioral variables like how much was bought, what was spent, when the last visit was, etc. Now you can combine all those options. That’s called micro-segmentation: for example, you can target people who are frequently interested in work shoes and who are in the upper end of the market.
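Combining the layers of labels amounts to grouping customers by the tuple of their cluster IDs. With the 8, 6, and 12 cluster solutions mentioned above, that gives up to 8 × 6 × 12 = 576 possible micro-segments. The sketch below is a minimal illustration; the function name and the dictionary inputs are assumptions, not the platform's data model.

```python
from collections import defaultdict


def micro_segments(product, demographic, behavioral):
    """Group customers by their (product, demographic, behavioral) labels.

    Each argument is a dict mapping customer ID -> cluster label (an
    assumed input shape). Only customers present in all three layers
    are segmented.
    """
    segments = defaultdict(set)
    shared_ids = product.keys() & demographic.keys() & behavioral.keys()
    for customer_id in shared_ids:
        key = (
            product[customer_id],
            demographic[customer_id],
            behavioral[customer_id],
        )
        segments[key].add(customer_id)
    return dict(segments)
```

Targeting "work shoes, upper end of the market" then becomes a lookup of every segment key whose product and behavioral labels match those two values.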
SM: When you show a store online, are you taking all of those settings into account in order to come up with a dynamic store, or is it still something static?
OA: All of these clusters and customers’ cluster IDs are available through our API. This way the IT department can program a business rule that determines what the customer sees when they log in based on values like high end of the market, low average order value, interested in business shoes, etc.
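A business rule of the kind described might look like the following. The sketch assumes the customer's cluster values have already been fetched from the API and placed in a dictionary; every field name, value, and storefront name here is hypothetical, since the interview does not describe the API's actual response format.

```python
def choose_storefront(profile):
    """Pick a storefront variant from a customer's cluster values.

    `profile` is a hypothetical dict such as:
    {"product_cluster": "business_shoes", "market_tier": "high_end",
     "avg_order_value": "low"}
    """
    if (
        profile.get("product_cluster") == "business_shoes"
        and profile.get("market_tier") == "high_end"
    ):
        return "premium_business_shoes"
    if profile.get("avg_order_value") == "low":
        return "value_offers"
    # Fall back to the standard store when no rule matches.
    return "default"
```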
SM: And any content management system is able to interface with your API in order to pull that kind of dynamic merchandising information?
OA: Yes. It is available through our REST API.