When one of the largest technology businesses in the world chooses to white-label your software as part of its cloud platform play - the first time it has ever used an OEM approach rather than engineering a solution for itself - then you have every right to feel gratified. When it follows on from being ranked in first place for your category of software by not one, but three separate firms of tech analysts, then you are more than justified in feeling thrilled, as Trifacta’s CEO Adam Wilson described himself in a phone interview last week.
The interview followed the news that Google has embedded Trifacta’s data wrangling software under the name Google Cloud Dataprep, while reports from Forrester and two other analyst firms placed it first for self-service data preparation solutions. “It means we are becoming front of mind for people,” said Wilson. “Ten years ago, this was an academic project - the prototype was called Stanford Data Wrangler - but it got 30,000 users in six months. Our investors saw us as a commercial venture, but we have only been selling in the last three years.”
Trifacta’s strength has been the ability to tap into any data source to support data science, predictive analytics, machine learning and artificial intelligence projects, saving data scientists and data analysts massive amounts of time in data preparation. For Google, the solution fits neatly into the ecosystem it is building around data management and analytics in the cloud, with additional support for data from DoubleClick, telematics and even Excel, all flowing into BigQuery and out to its own visualisation tool (Data Studio) or other leading applications.
“People who have data will now be able to do this work without the data preparation bottleneck,” said Wilson. “There are 200,000 data scientists, 600,000 data engineers, but 100 million knowledge workers globally. They are the ones who need data to make better decisions and to solve problems. If they don’t figure out how to do that, they will be overrun with data because there are not enough people to structure, integrate and cleanse that data for them.”
He notes that one reason for Trifacta’s selection by Google is that it doesn’t just support data practitioners who are using machine learning; it also has ML built in to automate many of the data wrangling problems users face. “It can learn from data and how users interact with it - Google loved that idea,” said Wilson.
As a result of the 12 months of software engineering involved in getting Trifacta ready to launch as Google Cloud Dataprep, the vendor now has its own cloud-based solution to sit alongside on-premise deployments. “A big part of this for us was wanting to be in a position where our solution is ready for whatever decision users make about their infrastructure - on-premise, in Google’s cloud or AWS, in a Hadoop cluster. Trifacta will work with data wherever it is stored.”
Given the forthcoming explosion of internet of things data, that future-proofing is essential both for the vendor and its client base. Google clearly has this in mind to ensure raw IoT data logs can be captured and modelled in TensorFlow. Keeping up with new data types and supporting a changing IT environment is what Trifacta was created to do and its vision has just been significantly validated. Not that things will stop here. As Wilson said: “Data now is not the same as it was five years ago and it will be different again in five years’ time.”