Dr. Saed Sayad is the chief data scientist at AdTheorent, a platform that provides real-time analytics for advertisers. Saed is an adjunct professor at the University of Toronto and has more than 20 years of experience in data mining and statistics. In this interview he gives us detailed and technical information about the process of real-time analytics and how it is applied to website advertising. He also addresses the three challenges of this process: speed, accuracy, and scalability.
Sramana Mitra: Saed, let’s start with introducing our audience to you, your personal background, as well as the company AdTheorent.
Saed Sayad: I am the chief data scientist at AdTheorent. I have been in the AI field, specifically machine learning and data mining, for the last 20 years. My main area of research has been real-time data mining, because I have always been concerned about the amount of data and the speed at which it is generated. I felt a sense of urgency to change the way we process data and build models in predictive modeling. I wrote two books: one is an online introduction to data mining, and the other is on real-time data mining and summarizes my research in this field.
I also teach at the University of Toronto. In 2012, I received a call from the CEO of AdTheorent about an opportunity to work on a data mining project in a real-time bidding environment. Real-time bidding is a young industry. Take online marketing and online ads, for example a search-based ad. We buy a series of those ads based on keywords and on the search match: if there is a match between my requested keywords and the searched keywords, then, depending on the cost of the ad, they can show the related creative. It is a sort of contract-based, pre-processed system.
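To make the contrast with real-time bidding concrete, the contract-based search model can be sketched in a few lines. This is a minimal illustration, assuming each pre-purchased ad carries a keyword set and an agreed price; the `SearchAd` fields and the rule "the highest-priced matching ad wins" are hypothetical, not how any particular search engine actually ranks ads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SearchAd:
    """A pre-purchased search ad (hypothetical fields for illustration)."""
    advertiser: str
    keywords: frozenset
    cost_per_click: float
    creative: str


def select_search_ad(query: str, ads: List[SearchAd]) -> Optional[SearchAd]:
    """Return the highest-priced ad whose keywords overlap the searched terms."""
    terms = set(query.lower().split())
    # Contract-based matching: an ad is eligible only if one of its
    # requested keywords appears in the search query.
    matches = [ad for ad in ads if ad.keywords & terms]
    if not matches:
        return None
    # Assumption: among eligible ads, the one with the highest agreed price wins.
    return max(matches, key=lambda ad: ad.cost_per_click)


ads = [
    SearchAd("A", frozenset({"running", "shoes"}), 0.80, "a-banner"),
    SearchAd("B", frozenset({"shoes"}), 1.20, "b-banner"),
    SearchAd("C", frozenset({"laptop"}), 2.00, "c-banner"),
]
print(select_search_ad("best running shoes", ads).advertiser)  # prints "B"
```

Everything in this sketch is fixed ahead of time by keywords and prices; nothing is evaluated per impression, which is the "pre-processed" character described above.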
But real-time bidding is completely different. Suppose somebody clicks a link to the New York Times. In the time between clicking that link and being redirected to the actual website, every placement on the New York Times page is sold on the ad exchange through a stock-market type of bidding. The ad exchange sends all the information related to each placement and asks all the bidders, the demand-side platforms (DSPs), for a bid.
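The exchange side of this process can be sketched as a simple auction: one request goes out to every DSP, each either bids or passes, and the highest bid wins. This is a minimal sketch under stated assumptions: the `BidRequest` fields are hypothetical, DSPs are modeled as plain functions, and the clearing rule (the winner pays its own bid, i.e. first-price) is an assumption, since the interview does not say how the exchange sets the price.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class BidRequest:
    """Information the exchange shares about one placement (hypothetical fields)."""
    site: str
    placement_id: str
    user_segment: str


# Each DSP is modeled as a function that returns a bid price, or None to pass.
Bidder = Callable[[BidRequest], Optional[float]]


def run_auction(request: BidRequest, bidders: Dict[str, Bidder]) -> Optional[Tuple[str, float]]:
    """Ask every DSP to bid on one placement; return (winning DSP, price) or None."""
    bids = {}
    for name, bid_fn in bidders.items():
        price = bid_fn(request)
        if price is not None and price > 0:
            bids[name] = price
    if not bids:
        return None  # no DSP wanted this impression
    winner = max(bids, key=bids.get)
    # Assumption: first-price clearing, so the winner pays its own bid.
    return winner, bids[winner]


request = BidRequest("nytimes.com", "top-banner", "sports-fans")
bidders = {
    "dsp_a": lambda r: 1.50,
    "dsp_b": lambda r: 2.25 if r.site == "nytimes.com" else None,
    "dsp_c": lambda r: None,
}
print(run_auction(request, bidders))  # prints ('dsp_b', 2.25)
```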
AdTheorent is a DSP. The different DSPs bid on that one impression, which means every impression goes through this bidding process. When we receive a bid request from the ad exchange, we have between 100 and at most 200 milliseconds to process the request, match it to our campaigns, and, if we find a match, put a bid price on it and send it back to the ad exchange. If we win, we get a response back, and our creative is served in that placement on the New York Times page.
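On the DSP side, the key constraint is that a late answer is as good as no answer. The sketch below, a hypothetical illustration rather than AdTheorent's system, matches a request against campaigns while watching a deadline and passes if time runs out. It assumes a 100 ms budget (the tighter bound mentioned above), reduces campaign matching to a site check, and takes an injectable `clock` so the timeout path can be tested; `Campaign` and `handle_bid_request` are invented names.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Assumed budget: the interview cites 100 to 200 ms; use the tighter bound.
RESPONSE_BUDGET_S = 0.100


@dataclass(frozen=True)
class Campaign:
    """Hypothetical campaign: the sites it targets and a fixed bid price."""
    name: str
    target_sites: frozenset
    bid_price: float


def handle_bid_request(
    site: str,
    campaigns: List[Campaign],
    budget_s: float = RESPONSE_BUDGET_S,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Tuple[str, float]]:
    """Match a request to campaigns and return (campaign, price), or None for no bid."""
    deadline = clock() + budget_s
    best: Optional[Campaign] = None
    for campaign in campaigns:
        if clock() > deadline:
            # A response that misses the exchange's window is worthless, so pass.
            return None
        if site in campaign.target_sites and (best is None or campaign.bid_price > best.bid_price):
            best = campaign
    if best is None:
        return None
    return best.name, best.bid_price


campaigns = [
    Campaign("autos", frozenset({"cars.com"}), 4.00),
    Campaign("news-a", frozenset({"nytimes.com"}), 1.75),
    Campaign("news-b", frozenset({"nytimes.com", "wsj.com"}), 2.50),
]
print(handle_bid_request("nytimes.com", campaigns))  # prints ('news-b', 2.5)
```

Passing a fake clock such as `iter([0.0, 5.0]).__next__` simulates a request that has already blown its budget, and the handler returns `None`.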
Every day there are billions of bidding processes going on in these ad exchanges, and the number is growing. For us, on the predictive modeling side, it is about big data. When you have billions of records every day, the question is all about processing this huge amount of data: extracting it, loading it, and then building a predictive model that helps us see the value of incoming requests and whether they have a higher propensity for click-through, based on how they match our campaigns.
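One simple way to keep click-through estimates current at this volume is to make learning incremental: each record only bumps a few counters, so nothing has to be re-read. The class below is an illustrative assumption, not a description of AdTheorent's models. It keeps impression and click counts per (feature, value) pair and returns a CTR smoothed toward an assumed prior; `prior_ctr` and `prior_weight` are hypothetical parameters, and `prior_weight` must be positive so unseen values fall back to the prior instead of dividing by zero.

```python
from collections import defaultdict
from typing import Dict, Tuple


class StreamingCTR:
    """Click-through estimates per feature value, updated one record at a time."""

    def __init__(self, prior_ctr: float = 0.001, prior_weight: float = 1000.0) -> None:
        self.impressions: Dict[Tuple[str, str], int] = defaultdict(int)
        self.clicks: Dict[Tuple[str, str], int] = defaultdict(int)
        self.prior_ctr = prior_ctr        # assumed baseline click-through rate
        self.prior_weight = prior_weight  # pseudo-impressions backing that baseline

    def update(self, features: Dict[str, str], clicked: bool) -> None:
        """Fold one impression (and whether it was clicked) into the counters."""
        for key in features.items():
            self.impressions[key] += 1
            if clicked:
                self.clicks[key] += 1

    def ctr(self, feature: str, value: str) -> float:
        """Smoothed CTR: sparse feature values are pulled toward the prior."""
        key = (feature, value)
        clicks = self.clicks.get(key, 0)  # .get avoids inserting unseen keys
        shown = self.impressions.get(key, 0)
        return (clicks + self.prior_ctr * self.prior_weight) / (shown + self.prior_weight)


model = StreamingCTR(prior_ctr=0.5, prior_weight=2.0)
for clicked in (True, False, False):
    model.update({"site": "nytimes.com", "device": "mobile"}, clicked)
print(model.ctr("site", "nytimes.com"))  # (1 + 1) / (3 + 2) -> 0.4
print(model.ctr("site", "unseen.com"))   # falls back to the prior -> 0.5
```

The smoothing matters because most feature values are rare: without it, a value seen once and clicked once would be scored at 100% CTR.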
If it passes a threshold, we bid on that request. When we receive an incoming request, we first match it with our campaign criteria, such as gender and age. In the second step, we score the request using our predictive models. If the score passes the threshold, we bid, because there is some value and some ROI. The next step is deciding how much to bid on that request. The challenge here is not only the amount of data we need to process and the model we need to build. It is not just about having an equation or rule set that can explain the behavior of something like click-through; it is also about speed. You have 100 milliseconds to respond and maybe just 10 milliseconds to dedicate to predictive modeling.
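The decision flow described above (match the campaign criteria, score, compare with a threshold, then price) can be sketched as follows. This is a hypothetical illustration: the logistic-regression weights in `WEIGHTS` are invented, features are encoded as `"name=value"` strings by assumption, and the pricing rule (bid the predicted CTR times an assumed value per click, i.e. the expected value of the impression) is a common textbook choice, not something the interview specifies.

```python
import math
from typing import Dict, Iterable, Optional, Set

# Hypothetical pre-trained logistic-regression weights for one campaign.
WEIGHTS: Dict[str, float] = {
    "bias": -6.0,
    "site=nytimes.com": 1.2,
    "device=mobile": 0.4,
    "hour=evening": 0.3,
}


def predict_ctr(features: Iterable[str]) -> float:
    """Score a request: a sparse dot product followed by the logistic function."""
    z = WEIGHTS["bias"] + sum(WEIGHTS.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


def decide_bid(
    request: Dict[str, str],
    targeting: Dict[str, Set[str]],
    threshold: float,
    value_per_click: float,
) -> Optional[float]:
    """Return a bid price, or None to pass on the request."""
    # Step 1: hard campaign criteria such as gender and age.
    for field, allowed in targeting.items():
        if request.get(field) not in allowed:
            return None
    # Step 2: score the request; bid only if the prediction passes the threshold.
    p = predict_ctr(f"{k}={v}" for k, v in request.items())
    if p < threshold:
        return None
    # Step 3 (assumed pricing rule): bid the expected value of the impression.
    return p * value_per_click


request = {"gender": "female", "age": "25-34", "site": "nytimes.com", "device": "mobile"}
targeting = {"gender": {"female"}, "age": {"25-34", "35-44"}}
print(round(decide_bid(request, targeting, threshold=0.01, value_per_click=2.0), 4))  # prints 0.0243
```

A linear model like this costs a few dictionary lookups and one `exp` per request, which is the kind of scoring cost that can plausibly fit inside a budget of roughly 10 ms.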