Sramana Mitra: Is there another use case you would like to discuss?
David Gibson: The one thing we talked about is the archiving and migration. All these platforms are a bit different in terms of how they are set up. SharePoint has SharePoint groups, Windows has NTFS permissions and share permissions, etc. What we are able to do with our framework is distill it down to user groups and data and what people can do with those. So for an IT person, you don’t necessarily need a subject matter expert to manage access anymore. That is what our execution engine does with the metadata.
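To make the idea of distilling platform-specific permissions down to "user groups and data and what people can do with those" more concrete, here is a minimal sketch. It is not the framework described in the interview. The mapping table, the `normalize` function, and the coarse action names (`read`, `write`, `delete`) are all assumptions made for illustration, and the mapping deliberately simplifies what the real NTFS and SharePoint permission levels grant.

```python
# Hypothetical, simplified mapping from platform-specific permission names
# to a shared vocabulary of actions. Real permission levels are finer-grained.
COMMON_ACTIONS = {
    ("ntfs", "Read"): {"read"},
    ("ntfs", "Modify"): {"read", "write", "delete"},
    ("sharepoint", "Read"): {"read"},
    ("sharepoint", "Contribute"): {"read", "write", "delete"},
}


def normalize(platform, entries):
    """Convert (group, resource, permission) entries from one platform
    into a platform-neutral dict: {(group, resource): set of actions}."""
    result = {}
    for group, resource, permission in entries:
        actions = COMMON_ACTIONS[(platform, permission)]
        # Merge actions if the same group appears more than once on a resource.
        result.setdefault((group, resource), set()).update(actions)
    return result


ntfs_view = normalize("ntfs", [("Finance", "budget", "Modify")])
sharepoint_view = normalize("sharepoint", [("Finance", "budget", "Contribute")])
```

Under these assumptions, the two platforms produce the same normalized entry for the Finance group, which is the property that lets someone reason about access without knowing each platform's own terminology.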
Now we are also able to move data around between platforms. Most companies go through at least one migration per year. Sometimes you are consolidating a lot of distributed devices into a few large ones. Sometimes you are moving from file shares to SharePoint or vice versa. Sometimes you go from one domain to another. All of these require a lot of manual effort. It is also very hard to move a live data set. You can’t tell everybody to stop working for a week. What our data transport engine does is allow you to move data from one platform to another or one domain to another, and we will rewrite the permissions or even make them better on the way there. So, you have a metadata layer now to move data around between platforms, to abstract some of the complexity, to see where your critical data is, where it is exposed, where to reduce access safely, etc.
We can also simulate changes before you make them. This is a big deal for the business. We have one large insurance company that came to us because they found a Unix directory that was world-readable. This was obviously a problem. They shut down the directory, and they broke an application that was revenue generating. One of the things we find is that IT is very hesitant to change access controls, because they are afraid they might cut off an application or a high-level executive. This is a job-threatening maneuver. Because we have the access activity, we can play “what if.”
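The "what if" idea can be sketched roughly as follows: replay recorded access activity against a proposed permission change and list the accounts that would be cut off. This is only an illustration under stated assumptions, not the product's actual simulation. The `AccessEvent` type, the `simulate_revocation` function, and all of the sample accounts, groups, and paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One recorded access: which account touched which path (hypothetical schema)."""
    user: str
    path: str


def simulate_revocation(events, memberships, acl, path, group):
    """Return accounts that recently accessed `path` and would lose access
    if `group` were removed from that path's access list."""
    remaining = acl[path] - {group}
    affected = set()
    for event in events:
        if event.path != path:
            continue
        user_groups = memberships.get(event.user, set())
        # Affected only if access came via `group` and no other group still grants it.
        if group in user_groups and not (user_groups & remaining):
            affected.add(event.user)
    return affected


# Invented sample data: a world-readable directory used by an application account.
events = [
    AccessEvent("billing_app", "/data/reports"),
    AccessEvent("alice", "/data/reports"),
    AccessEvent("bob", "/data/other"),
]
memberships = {
    "billing_app": {"everyone"},
    "alice": {"everyone", "finance"},
    "bob": {"everyone"},
}
acl = {"/data/reports": {"everyone", "finance"}, "/data/other": {"everyone"}}

affected = simulate_revocation(events, memberships, acl, "/data/reports", "everyone")
```

With this sample data, removing the `everyone` group from `/data/reports` would cut off `billing_app` but not `alice`, who still has access through `finance`. That is the kind of warning that would have flagged the revenue-generating application before the directory was locked down.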
The last use case I would like to talk about is “creating your own private cloud.” Eighty percent of organizations don’t allow Dropbox, but one out of every five employees is already using it. End users have gotten used to file sync services and mobile devices. A couple of years ago we said, “We love to share files with this style of collaboration. It is great, because we have a lot of remote users, we have mobile devices, we want to access the data, but we want to do it in a new way.” There wasn’t a product around us to help us do it, so we built one – DatAnywhere. This allows people to sync files, access files from mobile clients and often share with other parties. But the data is often stored on the file servers.
SM: Which are some of the broad trends you are monitoring?
DG: From a governance perspective, one of the trends is that the value of metadata is becoming more apparent. If you take a look at our blog, you can read about how you can start to use metadata to re-identify what was previously thought to be de-identified data. That is something people are realizing now. With metadata and big data analytics, you can put a lot more pieces together than people previously thought. With the case of [Edward] Snowden, metadata has come to the forefront, and I think people are starting to realize that just having a record of activity can lead to a lot of very important information and it can be used for a lot of different things.
That is very similar to how we see our audit trails. You have a file – that is one thing. If you have a record of who is accessing the file, there is a whole new dimension of use cases. We see the importance of metadata and also the ability to combine multiple sources of metadata. When I talked about the re-identification or de-identification of data, often you are taking data from one data source and matching it up with data from another data source. There was a Netflix example where they were reviewing an algorithm – it was about predicting which movies people would like based on their preferences. Somebody took that data set, which was de-identified, matched it up with another data set from IMDb (Internet Movie Database), and was able to re-identify a bunch of users from that. We have written about how somebody was able to take medical records and combine them with census data to re-identify some patients. I am seeing that our consciousness is raised about the importance of metadata and the ability to combine multiple metadata streams to yield results that are surprising.
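The matching step described above, lining up a de-identified data set with a public one, can be illustrated with a toy linkage sketch. This is not the method used in the Netflix or medical-records cases. The `link_records` function, the `min_matches` threshold, and every record below are invented for illustration, and the sketch assumes the two data sets happen to share exact (title, date) pairs.

```python
from collections import defaultdict


def link_records(anonymous, public, min_matches=2):
    """Link anonymous IDs to public names by counting shared (title, date) pairs.

    anonymous: iterable of (anon_id, title, date)
    public:    iterable of (name, title, date)
    Returns {anon_id: name} for IDs whose best candidate shares at least
    `min_matches` pairs.
    """
    # Index the public records by the quasi-identifier (title, date).
    public_index = defaultdict(set)
    for name, title, date in public:
        public_index[(title, date)].add(name)

    # Count, per anonymous ID, how many records each public name matches.
    counts = defaultdict(lambda: defaultdict(int))
    for anon_id, title, date in anonymous:
        for name in public_index.get((title, date), ()):
            counts[anon_id][name] += 1

    links = {}
    for anon_id, by_name in counts.items():
        name, matches = max(by_name.items(), key=lambda item: item[1])
        if matches >= min_matches:
            links[anon_id] = name
    return links


# Invented sample data.
anonymous = [
    ("u1", "Film A", "2005-03-01"),
    ("u1", "Film B", "2005-03-04"),
    ("u2", "Film A", "2005-03-01"),
]
public = [
    ("jane_doe", "Film A", "2005-03-01"),
    ("jane_doe", "Film B", "2005-03-04"),
    ("sam", "Film C", "2005-05-05"),
]

links = link_records(anonymous, public)
```

In this toy example only `u1` is linked, because it shares two records with `jane_doe`; `u2` shares a single record and stays below the threshold. Neither data set contains a name next to the anonymous ID, yet combining them recovers one, which is the surprising result the interview describes.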