Josh Rogers is the senior vice president of the Data Integration Business at Syncsort. Josh holds an M.B.A. from Harvard Business School and a B.A. in economics from Davidson College. He previously worked for Bank of America, Endeca, and IBM and has more than six years of experience in data management. In this interview he talks about Syncsort’s solutions for providing improved sort capabilities for big data analytics and gives insights into open problems in this space.
Sramana Mitra: Josh, let’s start with setting some context about Syncsort. What do you do as a company? Who are the customers? What is the primary value proposition?
Josh Rogers: Syncsort is a data integration company that was founded in 1968. We were originally founded around mainframe technologies. Our original solution in the marketplace was a sort utility for the mainframe operating system. The reason I start describing the company at that point is that the architectural approach we applied to processing data at that time is relevant to what we do today.
In the company’s original solution, we figured out that sort workload was taking up a huge amount of the processing power for mainframe workloads. What we found was that, to allow customers to analyze the data they wanted to analyze, there needed to be a way to increase the performance of the sort workload while also using less CPU time to accomplish the processing requirements of that workload. So the company set about defining an innovative architecture to meet those two requirements.
SM: What segments of data sets were/are you working with?
JR: Originally we were working with mainframe data sets. It was largely in financial services, healthcare, and telecommunications. Those are still three of the top four verticals that we operate in.
SM: How has that evolved?
JR: What we have focused on over the past 40 years is the ability to do more with less, as people analyze larger and larger data sets. We have participated in every stage of computing as customers have tried to break those boundaries. Today we run on half the mainframes [we did before], and it has been a very successful business for us. When Windows and UNIX came into the data center in the early 1990s, we brought that same technology into open systems. In the early 2000s we expanded our sort capabilities to create a high-performance ETL technology. We are used in about 2,000 organizations today. Mostly those are large enterprises, but we also deal with a lot of medium-sized businesses in information-intensive industries, such as data services or e-commerce organizations that want to do a lot of analysis.
SM: What is the products and services ecosystem around you like? Whom do you sit on top of? Whom do you connect with via APIs? What do you need around you for this to work?
JR: We operate largely in the data warehousing space, which is a component of analytics. Generally, our job in the architecture is to grab data from where it sits and move it into these analytic environments. We have partnerships with companies like Teradata, Netezza, Vertica, Greenplum, and a lot of the MPP data warehouses that provide large-scale analytic environments. We also have partnerships with people in the systems integration space. The goal is to make sure that we can get data from any source and bring it into the analytic environment in a performant way.