Cloud Player: Interview with Obama for America’s Engineering Director, Dylan Richard: How the President Used the Cloud to Beat Romney, by Gathering Clouds
[Editor's note: We hope you enjoy this special guest interview from Gathering Clouds.]
In honor of President Obama’s second inauguration, we had the great pleasure to sit down for an extended conversation with Dylan Richard, Director of Engineering at Obama for America.
Dylan Richard, Director of Engineering at Obama for America
Gathering Clouds: What are your feelings on how the presidential election went? What was the experience like?
Dylan Richard: The campaign was the most amazing experience that I could possibly imagine. There was just a staggering amount of work. I was there for 18 months and towards the end, it was 16- to 20-hour days, seven days a week for the final couple of months. I look at that on paper and think, “Wow. That’s an incredible amount of work.” And my entire team was doing the same. My overall feeling coming out of the election was that it was worth that and much more – the incredible benefit at the end of it is still so staggeringly huge to me.
GC: What are your perspectives on what brought your team together for the campaign?
DR: What brought everybody together, truly, was President Obama. He is an outstanding, genuine, effective leader who is, frankly, unparalleled in our lifetime. The message that he communicates is the only reason that we were able to bring people together to do this because it absolutely would not be worth it if it weren’t for him and for the things that he’s going to do and everything he has accomplished so far. So that, coupled with really interesting tech challenges and an amazing team, is how we brought together more and more people.
GC: What were the tech challenges that were attractive to you? The tech mindset is not one to shy away from a difficult problem; so what was it that motivated you to really dig in beyond just a belief in the overall mission?
DR: There are two really interesting challenges, one of which is obvious and the other of which is a bit more obscure. The obvious one is just a question of scale. Putting together applications that are going to be used for a presidential campaign means scale like nothing else. There are very few other conditions where we have traffic spikes like during a campaign. Considering some of the graphs and charts that have been available in the media, there were moments of traffic over time where it was steady and at a good pace. During the last week or two, however, it just exploded, as you can imagine. Exponential growth doesn’t even properly portray what went down. We had some people volunteering who were ex-Twitter engineers and, looking at the traffic spike, they kept saying, “It’s not a hockey stick. It’s like a right angle …”
We would go from a pretty solid baseline to more than you have ever imagined, and then that number just continued to increase. So the challenge there, from a technical standpoint, was around being able to build things to cope with that reality, and it was really fun. That challenge in particular was a great recruitment tool in and of itself.
Dylan Richard (center) with Obama for America CTO Harper Reed (left) and Mark Trammell (Photo by Daniel X. O’Neil)
The slightly less obvious one is trying to unify data. There had been efforts starting at the end of 2008 and through the 2010 campaign to build more technology in-house, either at Obama for America or at the Democratic National Committee (DNC), and more tools that unify the data in a way that would provide greater insight through an infrastructure that could facilitate that kind of strategy. There are a couple of vendors that we use that have a great deal of data experience, especially in finding ways to pull it all together and do a big vendor integration project. That might sound boring, but it is actually incredibly complicated, and doing it with vendors and with our own systems is a really fun challenge. But here’s the key part: we had to do it all while the campaign was going on.
So, our challenge was to essentially re-factor an entire infrastructure to something that is more scalable, more unified, while everything is already in use. We all internally described it as “building an airplane while it’s mid-flight.”
GC: What do you understand about how your competition was approaching this challenge? In the end, the Romney camp’s approach to technology didn’t serve them in the same way. What do you understand about what they did vs. what you did, and where do you think they went wrong?
DR: I honestly don’t know much about what their strategic decisions were in terms of technology. But I’m sure that they worked incredibly hard, and they put in a lot of time and they got to where they thought they needed to be. I can say that it seems that there wasn’t as much infrastructure being brought in-house in the same way for the Romney campaign.
GC: Contextualize the mindset for your team and the related teams going into this. Why did you choose one technology over another, or one vendor over another? What were the criteria that they had to match, specifically with regard to the cloud?
DR: There was not really much question. The shape of our curve means that we essentially can’t use anything but cloud. We could theoretically do a baseline that is not cloud and use cloud for spikes and do some sort of hybrid arrangement. But the kind of scale that we’re talking about, there really weren’t many options, other than Amazon Web Services (AWS), who have been amazing. So, the choice was actually fairly self-evident, in the end.
GC: Why AWS?
DR: AWS fits our curves in many ways. They provide the ability to have a small footprint when we’re small and to grow to an enormous footprint very quickly – to be fair, we had an enormous footprint. I think that there was an infrastructure diagram that was up at the AWS re:Invent conference that showed an idea of the scale that we were talking about. So, the ability to grow that big is one of the chief benefits Amazon offers.
The other thing it provided was the ability to scale our costs. Eighteen months ago, people were not as interested in giving to the Obama campaign as they were three months ago. The nature of the campaign was one of building up to an event that has a curve for not just how much traffic was flowing to the website, but for really everything that we were doing to achieve that crescendo at the end. So being able to have our costs stay low at the beginning and only pay for what we needed was an enormous benefit, especially [on] a budget based in contributions.
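The cost argument can be made concrete with a little arithmetic. The sketch below compares provisioning for peak capacity over the whole period with paying only for what each month needs. Every number in it (the monthly server counts and the per-server price) is hypothetical, chosen only to mimic a flat baseline followed by a sharp spike; none of it is campaign data.

```python
# Hypothetical demand curve: servers needed in each of 18 months.
# These figures are illustrative only, not real campaign data.
monthly_servers_needed = [10] * 12 + [20, 40, 80, 200, 600, 2000]

COST_PER_SERVER_MONTH = 100  # hypothetical flat price, in dollars


def fixed_capacity_cost(demand, unit_cost):
    """Cost when peak capacity is provisioned for the whole period."""
    return max(demand) * len(demand) * unit_cost


def pay_as_you_go_cost(demand, unit_cost):
    """Cost when each month is billed only for the servers it used."""
    return sum(demand) * unit_cost


fixed = fixed_capacity_cost(monthly_servers_needed, COST_PER_SERVER_MONTH)
elastic = pay_as_you_go_cost(monthly_servers_needed, COST_PER_SERVER_MONTH)
```

With a curve shaped like this, almost all of the fixed-capacity spend goes to servers that sit idle until the final weeks, which is the point Richard makes about a budget based on contributions.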
GC: Beyond that, the benefits you’re talking about are really like a poster-child example of the benefits of cloud overall, but what else did the platform enable you to do? What other benefits were you able to reap?
DR: We were able to automate a great deal of how we do things. To break it down a little bit, our infrastructure had three different environments. We had our production environment, which was in Amazon and was to actual scale. We had a staging environment that was also in Amazon to do all of our testing and validation of builds, etc. but at a smaller scale; and then we had an internal testing environment. We had the ability to automate all of the releases using Puppet and we had our own AppRepo and did releases that way.
GC: How did you account for redundancy at that scale? Amazon has a track record of breaking down from time to time, just like other cloud vendors. You need to plan for failure to a degree. So, at something of that scale, how do you plan your business continuity? How do you plan your disaster recovery?
DR: We were in multiple Availability Zones (AZs) because it’s silly not to be. It’s very easy to be in multiple Availability Zones and paying for two servers as your minimum instead of one. It’s a tiny price to pay for that redundancy. Towards the end of the campaign, we were split across three AZs, all in U.S. East. We had it, on an infrastructure level, set up so that if any of those AZs goes out, we would channel over to the others, and nobody would notice because everything was functioning as it was supposed to. If all of U.S. East were to go out – which would be catastrophic on many levels, not just for us but for the Internet in general – we essentially set up a warm fail-over in U.S. West, particularly as we were getting close to Sandy, as the storm was coming through a week before Election Day. It was “the perfect storm,” in so many ways. Though, if it had knocked us out for two hours, we would have been in a very bad place. So we set up a read-only infrastructure in U.S. West so that if all of U.S. East were to go out, we could very easily switch over to West. That way, should our worst-case scenario come to pass, we could have something up so that all of the people that are working on the campaign (which is, at that point, largely having data to consume and having information that people can use) would be able to do what they need to in support of the effort uninterrupted.
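The routing decision described here can be sketched as a small function: serve from whichever primary Availability Zones are still healthy, and fall back to the read-only standby region only when none are. This is a minimal illustration, not the campaign's actual tooling; the `Route` type, the `choose_route` function, and the zone and region names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    region: str
    zones: tuple
    read_only: bool


def choose_route(primary_region, primary_zones, healthy_zones, standby_region):
    """Pick where traffic goes given the set of currently healthy zones."""
    live = tuple(z for z in primary_zones if z in healthy_zones)
    if live:
        # At least one primary AZ is up: keep full read/write service there.
        return Route(primary_region, live, read_only=False)
    # The whole primary region is down: use the warm read-only standby.
    return Route(standby_region, (), read_only=True)


# Hypothetical zone names standing in for three AZs in one region.
EAST_ZONES = ("east-a", "east-b", "east-c")

# One AZ lost: traffic stays in the primary region on the two survivors.
one_zone_down = choose_route("us-east", EAST_ZONES, {"east-a", "east-c"}, "us-west")

# Entire region lost: traffic moves to the read-only standby.
region_down = choose_route("us-east", EAST_ZONES, set(), "us-west")
```

In practice the health signal and the traffic switch would come from load balancers and DNS rather than application code; the sketch only captures the decision rule.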
Within each of the applications that we wrote, we had different failover levels – not only for the applications, but for functions within applications so we could define the core features that were needed for the applications to be functional. We had all of the other features on top of that so if we lost a whole bunch of stuff, we would be okay. Let’s say we lost all of our read slaves and we just had the master, and it was only in one AZ because terrible, terrible things have happened at this point (like other AZ failures). We could cut off certain functionalities so that our risk would be exposed to just the most important things at that time. This would give us time to then put all of the application resources in our teams’ hands to be able to make the necessary changes.
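One way to picture these failover levels is to tag each feature with a tier and let the state of the infrastructure decide which tiers stay on. The sketch below assumes three tiers and a simple rule based on replica and zone counts; the feature names, the tiers, and the thresholds are hypothetical, since the interview does not describe how the levels were actually defined.

```python
# Hypothetical feature names and tiers; the interview does not list them.
CORE, STANDARD, OPTIONAL = 0, 1, 2

FEATURE_TIERS = {
    "view_call_list": CORE,
    "record_contact": CORE,
    "search_history": STANDARD,
    "leaderboard": OPTIONAL,
}


def allowed_tier(read_replicas_up, zones_up):
    """Map infrastructure health to the highest feature tier still served."""
    if read_replicas_up == 0 and zones_up <= 1:
        return CORE      # primary database only, single AZ: essentials only
    if read_replicas_up == 0 or zones_up <= 1:
        return STANDARD  # degraded: drop the nice-to-have features
    return OPTIONAL      # healthy: everything on


def enabled_features(read_replicas_up, zones_up):
    """Return the sorted names of features that remain switched on."""
    limit = allowed_tier(read_replicas_up, zones_up)
    return sorted(name for name, tier in FEATURE_TIERS.items() if tier <= limit)
```

Calling `enabled_features(0, 1)` models the worst case in the answer above (no read replicas, one AZ) and leaves only the core features running.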
President Barack Obama being sworn in for his second term on January 21, 2013.
GC: Did you have incidents that your business continuity plans effectively worked for?
DR: There were a lot of cases where, for instance, we had an API endpoint in our own application that was far more expensive than it needed to be, and it was putting far too much pressure on the database for something we didn’t necessarily need. So the ability to turn off that endpoint was key. If one CPU is pegged at 100 percent on the database box, all of a sudden everything is slow or failing, and we need to mitigate that quickly. We did make use of that where needed, but I don’t think that there was any significant downtime of anything.
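Turning off a single endpoint without a redeploy is often done with a kill switch checked at request time. The following is a minimal sketch of that pattern, assuming an in-memory set of disabled names; the `kill_switch` decorator, the `EndpointDisabled` exception, and the `expensive_report` handler are hypothetical and are not taken from the campaign's code.

```python
import functools

# Hypothetical in-memory switchboard; a real system would likely read this
# from shared configuration so every server sees the change.
DISABLED_ENDPOINTS = set()


class EndpointDisabled(Exception):
    """Raised instead of running a handler that has been switched off."""


def kill_switch(name):
    """Decorator that short-circuits a handler while `name` is disabled."""
    def decorate(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            if name in DISABLED_ENDPOINTS:
                raise EndpointDisabled(name)
            return handler(*args, **kwargs)
        return wrapper
    return decorate


@kill_switch("expensive_report")
def expensive_report(user_id):
    # Stand-in for a query that puts heavy load on the database.
    return {"user_id": user_id, "rows": []}
```

Adding `"expensive_report"` to `DISABLED_ENDPOINTS` makes every later call raise `EndpointDisabled` before the handler touches the database, and removing it restores normal service.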
GC: What other tools were you bringing in to complement AWS, and how were those beneficial to the framing of a successful IT strategy, and for the campaign overall, to achieve your goals?
DR: We work with a couple of external vendors that just do political technology. NGP VAN offers a service called VoteBuilder, which is used extensively. It is what all of the volunteers in the field use to interact with data, largely both to input it and to consume it. That covers walk packets, call lists and that sort of stuff. A significant portion of the online stuff came from Blue State Digital. They did a large part of our website forms for petitions, etc. and a bunch of our email and fundraising as well. They had been deeply involved in Democratic technology, most notably from the last election. So we had them in place and integrated their solutions along with all of the things that we were building.
So we built a lot of our functional infrastructure and systems between the two organizations and our own internal teams, as well as developing tools to serve our analytics team, which ended up being a rather large Vertica cluster. We had that in-house (actually in colocation) at the DNC on physical hardware, and we had a fallback cluster for it in AWS. So what we tried to do was to build a bridge between everything, and AWS worked really well for that.
This segment is part 1 in the series: How the President Used the Cloud to Beat Romney