Todd Goldman is the vice president and general manager for enterprise data integration at Informatica, a company that provides data integration software and services. Todd holds a BSEE from Northwestern University and an MBA from Northwestern's Kellogg School of Management, and worked for IBM, ScaleMP, and nlyte Software prior to Informatica. In this interview he talks about Informatica's unique value proposition in the big data space and highlights what entrepreneurs should be aware of when building a startup in this space.
Sramana Mitra: Todd, let’s start with setting some context for our audience, involving your personal background as well as Informatica’s expertise and work in the realm of big data.
Todd Goldman: I have been in high-tech for the past 25 years. I started off in the network management space back when that was new. I started with HP, then hopped around the industry a bit, and ended up at America Online, providing online services. Looking back, that was big data in its own way before the term became popular. I held executive roles at companies ranging from Netscape and AOL down to a couple of startups focused on data analytics across large data sets. I was with a company called Exeros, which was acquired by IBM.
Then I joined Informatica about one year ago as vice president and general manager of our data integration business unit. Informatica is a company that has been around for 20 years. The company has always been focused on data, and specifically on helping our customers put their information potential to work. What we are really all about is that customers have data in raw form. Think of data as the new oil – which is an interesting analogy, because you can't really do much with oil in its raw form. If you go up to the Canadian tar sands, it is particularly dirty oil. So you pipe it somewhere and you have to refine it. Then you have to refine it some more and put it some place where people can actually use it. What Informatica does is help our customers take their raw data, move it to where they need to use it, refine it, combine it, integrate it with other sources of data, and turn it into information they can use as intellectual property to differentiate their offerings and be competitive in the marketplace.
SM: We cover Informatica regularly. Our audience does know the company on a broader scale. Would you talk to us about what has changed in the realm of big data? Informatica has always been in the data space. What is the impact of big data specifically in your business and what are you doing and seeing?
TG: Informatica has always done big data. The amount of data that counts as "big" has just gotten bigger. Let me take Informatica out of the equation for a second to illustrate what "big" really means. What has changed in recent years is the advent of Hadoop, which has allowed companies to store and process large amounts of data very cheaply. In the past, if you had a lot of data and were doing some analytics – let's say you had clinical trial data with 1,000 attributes you wanted to study – a smart person would go and say: "Why can't I analyze all those attributes? I have 1,000 and I can only analyze 100." So they would decide which 100 were important, cut off the long tail, and do their analytics.
Today you don't have to do that anymore; you can dump it all onto a Hadoop cluster and analyze the whole thing. You no longer have to cut off that long tail of data to get value from your analytics – you can use the whole data set, which means you can find new insights. Informatica's role in all this is that the same data analytics problems that existed in the past still exist; we have just moved them to a new technology called Hadoop. Twenty years ago people wrote SQL scripts and hand code. Informatica solved those problems by creating a graphical 4GL and a metadata-based development environment that gave people lineage, so they understood where data came from. It made it easy to make changes as the environment changed, and because everything was graphical, it made it easy to trace problems back to their source. Now we are starting all over again with those same problems. We solved them for 20 years. Back then Informatica built its own engine. Then Hadoop came along and everybody said, "Hey, we can do it all on Hadoop."
What is old is new. The same problems that existed with hand coding back in the day now exist with Hadoop. We have a core engine called PowerCenter, a virtual data machine that can connect to data sources, transform data according to a set of instructions a developer creates in our graphical development environment, and then move that data somewhere else. Now that engine runs on Hadoop as well. Our value proposition to the big data world is that all the people who already know that environment are effectively Hadoop developers – they don't have to learn Hadoop. The reason that is important is that Hadoop is really hard. It is a distributed data analysis and storage environment; it is not easy. The data scientists who can work in it are very hard to find. What we have found is that one person using our development environment is as effective as five people hand coding. In addition, the skill set you need to do data integration using PowerCenter is much less specialized than the one you need to hand code in MapReduce.
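To give a feel for the hand-coding effort Goldman is contrasting with a graphical tool, here is a minimal local sketch of the MapReduce pattern – a toy word count, not Informatica's engine or real Hadoop code. Function names (`mapper`, `shuffle`, `reducer`, `run_job`) are illustrative; a real Hadoop job would additionally require job configuration, serialization, and cluster deployment, which is where much of the hand-coding effort goes.

```python
from collections import defaultdict

def mapper(record):
    """Map phase: emit (word, 1) pairs for each word in a line of text."""
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(mapped_pairs):
    """Group values by key, as the Hadoop framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce phase: sum the counts for one word."""
    return (key, sum(values))

def run_job(records):
    """Run map, shuffle, and reduce over an in-memory list of records."""
    mapped = (pair for record in records for pair in mapper(record))
    grouped = shuffle(mapped)
    return dict(reducer(k, v) for k, v in grouped.items())

counts = run_job(["big data is big", "data is the new oil"])
print(counts["big"])   # 2
print(counts["data"])  # 2
```

Even this trivial aggregation takes several functions of boilerplate in code, whereas in a graphical, metadata-driven tool the same logic is a couple of configured transformation steps with lineage tracked automatically.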