Forget trying to bring all of your data sets together into a new wardrobe of information. Instead, a new generation of data blending tools can give you a fast-fashion update that answers your real-time insight needs. David Reed takes a front row seat to find out how.
Fashion is not something usually associated with the realms of data integration or ETL. This most back-room of activities within the data industry attracts little attention from those focused on the task of using that data to solve business problems.
At least, that was the case until several years ago when a raft of new applications emerged to take this relatively stagnant market by storm. Alteryx, Datawatch, Pentaho, Trifacta and others have moved into the space opened up by the demand to put advanced analytics and business intelligence into the hands of business users.
Achieving that goal of data democratisation so managers can get business intelligence in near real-time was unlikely to be possible within the context of conventional data warehousing and BI tools. With the expansion of data sources and their velocity, there is little tolerance for the cycle times to which ETL historically worked, but greater acceptance of an incomplete picture as long as it is provided when and where it is needed.
After all, how many analysts want to spend 80 per cent of their time on data preparation when a data blending or data wrangling tool (pick your preferred term) can get them to an 80 per cent complete view? Sashaying down the catwalk behind data visualisation tools like Tableau and Qlik, Alteryx was one of the first to win acclaim in this newly-fashionable world.
“There is real momentum in the market for our positioning as a self-service tool for data analysts,” notes Stuart Wilson, managing director at Alteryx. “Most people when they buy into Alteryx first of all want its data preparation and blending abilities and that continues to be the case. That momentum is really picking up from the whole self-service proposition.”
Before its arrival, analysts who wanted to support downstream demand from business users for BI found themselves squeezed between those customers’ short-term patience and the IT department’s lengthier schedule. “They are now being fed the information they need and we are seeing a much wider acceptance that people need access to data. That is not new for us, of course, but it is still gaining wider adoption,” says Wilson.
For this to happen, IT departments have needed to be less protective about the calls being made on their systems by these new solutions, recognising instead that they can be viewed as a service provider by supporting the necessary APIs. Given the scale of the big data challenges they face across the enterprise, one of the reasons for their acceptance is that they would not be able to match and integrate those sources within their existing processes.
“If they try to control that, they will be unsuccessful. This is a situation which is only going in one direction because of those data volumes and velocities,” says Wilson. Experian Marketing Services is an example of an Alteryx user which has deployed the solution across its wide range of data sets in order to reduce the processing time required to create client-ready outputs. Compared to extracting reports from its legacy system with terabytes of data ranging in scale from 2 million to 28 million each, it achieved a 55 per cent reduction in time-to-insight.
Wilson is clear that data blending might be fashionable for now, but the overriding trend is for the new style of predictive analytics. “When people start with Alteryx, it is with an eye on that, but they have to start with data preparation because you can’t do advanced analytics if your data is not right. Even for smaller decisions, executives don’t want to wait one week for an answer while the data is prepared. The organisation may only make that decision once and its impact may be small, so you don’t want to build a permanent process for it, but all those small decisions add up and can give you a competitive advantage,” he says.
If you want to get some idea of just how fashionable data blending or data wrangling has become, then consider the fact that Trifacta has seen ten-fold revenue growth in Europe in the last year, building to more than 500 companies across 35 countries, including Royal Bank of Scotland (RBS), Luxembourg Stock Exchange, UniCredit and Sanofi. Like Alteryx, it has been tailcoating the trend for data visualisations in BI and also partners with Tableau.
Trifacta originated at Stanford University at the start of the big data explosion four years ago, and its growth maps closely to that transformation in data sources. “The big data paradigm has accelerated our growth with a lot of disparate types of data to put together and use,” says Jeremy Perlman, EMEA sales VP at Trifacta.
“Our solution takes data preparation out of the domain of IT with its technical processes. If you believe data will transform companies, they cannot rely on a small number of IT people to serve data into the business. They need to pull out what the business needs into a self-service environment and reap the rewards of big data,” says Perlman.
Further evidence that data blending is in fashion can be found in the way data practitioners are looking for solutions on the technology catwalk. “We first met Trifacta through our innovations team on the ground in Silicon Valley and were impressed from the outset with their ability to quickly derive value from diverse, unstructured data sets,” said Christian Nelissen, head of data and analytics at the Royal Bank of Scotland.
“When we took a closer look and evaluated them against a range of large and small competitors, they stood out for their commitment to simplifying complex processes, something which is now really helping us to deliver great solutions for our customers. As such, we’re excited about the benefits we’ll see and are on our way to building a world-class data capability that will help us better understand and better serve our customers.”
One point that needs to be borne in mind, and which Perlman emphasises, is that data wrangling software “is not competing with classic ETL.” The catwalk it occupies only goes in one direction, taking data from whatever rail it is hanging on backstage and putting it together as a new look for the front-of-house audience involved (principally analysts or line of business managers).
What it does not do is transform the underlying data sources and create a new, integrated and unified data set which is held permanently in a warehouse. This is not a fix for the multi-dimensional challenges of managing big data. But it is definitely a solution for how to make managers look better.
* Definition: Perfect, on point, A+, flawless (Source - urbandictionary.com)