“Organisations have this vision of what they want to do with data, which is increasingly to combine it with other external sources to get a bigger picture for their marketing, manufacturing, analysis and portfolio management, as well as risk, regulatory and pricing functions. Customer data is still the largest domain of what we see from a data quality perspective, but the objective is evolving from the traditional idea of the golden record and single customer view towards bringing in those other sources.”
So said Ed Wrazen, VP of product management, big data, at Trillium Software, in an interview with DataIQ shortly after the company’s customer conference in Stuttgart. Hosted by Porsche at its state-of-the-art headquarters and museum, it was a fitting venue to underline Trillium’s theme of continual investment and innovation in technology.
In February, the vendor launched Trillium Refine, a big data preparation solution that combines its deep heritage in data quality management with the new Hadoop and Spark-based environment in which data is being accessed, prepared, improved and linked, all wrapped in a data governance layer to support close monitoring of access and usage.
“A lot of those queries and the data sources they draw on are not structured, so they don’t fit the data model that exists within the data warehouse,” noted Wrazen. “There is also the increasing demand for complex data visualisation which is more dynamic and ad-hoc. So what we are seeing is the analytics and data science organisation wanting access to information in more complex, quicker and more dynamic ways than ever before.”
Decision makers are also increasingly looking to deploy in a more dynamic way using cloud, software-as-a-service or web-based solutions. Said Wrazen: “Different companies have different needs, but increasingly companies do not have the time to invest in a permanent infrastructure - they need an approach that can deliver what they need more flexibly, and that is easier to achieve through a cloud-based service.”
Trillium Refine was developed partly in response to feedback from the company’s quarterly BI/analyst user forum. Those users are struggling with data preparation issues in big data - access, consolidation, cleansing, integration - before they can get on to the work they actually want to do, such as segmentation or predictive modelling. With its launch last quarter, Refine gives them the ability to view data sources, then search and pull data across into a common environment.
“Users told us they often can’t access data when they want it; they rely on IT for extracts. When they do get it, it might be missing key variables or not be in the right format. So users have to spend a lot of time getting their data fit before they can use it. We are helping to put data into the hands of users who can use web-based tools to validate, cleanse and integrate it,” he said.
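As a rough illustration of the kind of validate-and-cleanse step Wrazen describes, the sketch below uses PySpark on a hypothetical customer extract. The file paths, column names and rules are invented for the example; this is not Trillium’s own engine or API.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-prep").getOrCreate()

# Hypothetical extract of customer records landed in Hadoop/HDFS.
customers = spark.read.csv("hdfs:///landing/customers.csv", header=True)

cleansed = (
    customers
    # Trim and normalise the fields most analyses depend on.
    .withColumn("email", F.lower(F.trim(F.col("email"))))
    .withColumn("postcode", F.upper(F.regexp_replace("postcode", r"\s+", "")))
    # Flag records that fail simple validation rules rather than silently dropping them.
    .withColumn("valid_email", F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"))
    .withColumn("missing_name", F.col("full_name").isNull() | (F.trim("full_name") == ""))
)

# Write back a prepared copy that BI users can query directly.
cleansed.write.mode("overwrite").parquet("hdfs:///prepared/customers")
```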
Trillium’s users can now pull in data from Twitter or Facebook straight to the Refine platform and parse it out into a format that can be used and analysed. That would be very difficult using conventional ETL tools. At the same time, the solution generates dashboard reports on these data flows which give an insight into compliance and governance.
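To show why a raw social feed is awkward for conventional ETL, here is a minimal sketch along the same lines, flattening a Twitter-style JSON payload into columns an analyst can query. The field names follow Twitter’s public tweet format and the paths are, again, placeholders rather than anything specific to Refine.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("social-parse").getOrCreate()

# Hypothetical dump of tweet objects, one JSON document per line.
tweets = spark.read.json("hdfs:///landing/twitter/*.json")

flat = tweets.select(
    F.col("id_str").alias("tweet_id"),
    F.col("created_at"),
    F.col("user.screen_name").alias("author"),
    F.col("text"),
    # Hashtags arrive as a nested array; explode them into one row per tag
    # (or a null row when a tweet has none).
    F.explode_outer("entities.hashtags.text").alias("hashtag"),
)

flat.write.mode("overwrite").parquet("hdfs:///prepared/tweets")
```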
“Chief data officers have been pushing for monitoring and control of access with privileges to view and use, as well as ensuring that users have complete and trusted sources,” explained Wrazen. This also helps to fix another problem that poor data quality or inconsistent integration can cause. “From the business side, they have lacked trust in data. When they get a report and compare it to an output from the data warehouse, they don’t correlate. We have seen that many times with self-service analytics - the numbers don’t stack up, they lack reliability and accuracy, so users lose trust.”
Trillium has partnerships with Qlik and Tableau to support data quality processes behind those data visualisation tools. Refine is also built to store, run and process all of its integration and quality procedures natively on Hadoop distributions such as Cloudera and Hortonworks.
Despite responding to demands from lines of business, analysts and business intelligence functions for better data preparation in big data environments, Wrazen believes the case for this new world may have been overstated. “Big data has been slower than most analysts predicted. Organisations have huge investments in their IT and some are still not convinced of the benefit of moving off their existing infrastructure into a new environment like Hadoop. Often, it is because they do not have the hardcore knowledge of how to operate that in a large-scale system,” he said.
Any solution aimed at supporting fit-for-purpose data - whatever that purpose may be - needs to be flexible enough to cope with both the emerging big data sources and legacy systems. As Wrazen pointed out: “Companies have still got their traditional management information drawing from the data warehouse, working to pre-defined data specifications. Those are run and supported by the IT department, with ETL bringing in data feeds in a very static way.”
Trillium has responded to where the cutting edge of analytics and insight is now leading business, but it has a careful eye on its existing core business. Wrazen noted: “We see big data as just another platform. It has great potential, but not for everybody.”