Privacy and data protection are daunting issues for businesses and consumers alike, bringing together legislation, ethics, operating practices and technology competence. So can data science help to resolve the risk assessment challenges and identify problem areas? Dr Maurice Coyle, chief data scientist at Truata, spoke to DataIQ about how the company is tackling the issue.
Maurice Coyle (MC): I would actually say data science and data governance make a perfect team - in our company, at least. Our data governance team decides “what” needs to be done with our data and our customers’ data in order to achieve compliance with data protection regulations, and the data science team decides “how” that is done.
The role of data science in Truata is really interesting because it touches on all areas of the company. We work with privacy, legal and product teams to define what our products need to do, then we build the algorithms and prototypes that form the basis of these products. When preparing prototypes, we work closely with our world-class data engineers to make them production-ready at scale. We also work alongside our sales team, participating in customer meetings to help solve customers’ problems.
MC: In short, everyone! Our legal and privacy teams, data science, product, engineering and infrastructure teams, sales, marketing and finance teams - they all play a part in our product development efforts. We have great people in all areas of the company who are at the top of their respective fields, and when building products like Truata Calibrate, our new privacy risk assessment software solution, there’s a huge amount of collaboration across the different functions. The result is products that we are not only incredibly proud of, but that solve real problems better than anything else available on the market.
MC: Machine learning is certainly a key area of focus for us and we use it alongside techniques drawn from mathematics, statistics and information theory to create the algorithms and prototypes that form the basis of our products.
A lot of what we do involves deconstructing data processing techniques to understand where they may result in privacy risks. We then produce privacy-enhanced versions of these techniques to enable the processing while minimising privacy risk, or at the very least to equip organisations to make well-informed decisions around the use of their data.
A core part of any of these offerings is the ability to objectively quantify the privacy risks a dataset or analytical output contains. This is where Truata Calibrate comes in: it measures singling-out, linkability and inference risks, while also taking an attack-based view to quantify the likelihood of a breach.
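Truata does not publish the maths behind Calibrate, but the idea of turning the singling-out concept into a number can be illustrated with a simple, commonly used proxy: the proportion of records that are unique on a chosen set of quasi-identifiers. A minimal sketch, with purely hypothetical column names and data, might look like this:

```python
import pandas as pd

# Purely hypothetical dataset - column names and values are illustrative only.
records = pd.DataFrame({
    "age_band":  ["30-39", "30-39", "40-49", "40-49", "50-59"],
    "post_area": ["D01",   "D01",   "D02",   "D02",   "D03"],
    "gender":    ["F",     "M",     "F",     "F",     "M"],
})

def singling_out_risk(df: pd.DataFrame, quasi_identifiers: list) -> float:
    """Share of records that are unique on the chosen quasi-identifiers.

    A record that is the only one with its combination of quasi-identifier
    values can be singled out, so a higher share indicates higher risk.
    """
    class_sizes = df.groupby(quasi_identifiers).size()
    return (class_sizes == 1).sum() / len(df)

print(singling_out_risk(records, ["age_band", "post_area", "gender"]))
# 0.6 - three of the five records are unique on these attributes
```

Linkability and inference risks call for richer models - for example, matching against auxiliary datasets or measuring how well sensitive attributes can be predicted - but the principle of converting each risk concept into a measurable score is the same.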
Embedding privacy-by-design and privacy-by-default principles can mean changing how analytics are performed, for example, by enabling data to be processed or analysed without granting access to the underlying data. As above, this first requires the ability to quantify the privacy risks within the data and outputs, which is what our solution provides.
MC: We seek to automate as much of the process of quantifying and mitigating privacy risks as possible. Truata Calibrate can automatically quantify all sources of privacy risk and make recommendations for privacy-enhancing transformations that can be applied. Human intervention is required to ensure that organisational standards and domain knowledge are applied correctly. We always seek to provide options for full interactivity or automation, so our products can be configured to suit the particular needs of each customer.
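Calibrate’s recommendation engine is proprietary, but the general shape of an automated measure-transform-re-measure loop can be sketched with the same hypothetical singling-out proxy as above. Here, one candidate privacy-enhancing transformation - generalising a quasi-identifier - is applied and the result checked against an illustrative risk threshold:

```python
import pandas as pd

# Hypothetical data and risk proxy, as in the earlier sketch.
records = pd.DataFrame({
    "age_band":  ["30-39", "30-39", "40-49", "40-49", "50-59"],
    "post_area": ["D01",   "D02",   "D03",   "D04",   "D05"],
    "gender":    ["F",     "F",     "M",     "M",     "F"],
})
QUASI_IDENTIFIERS = ["age_band", "post_area", "gender"]
RISK_THRESHOLD = 0.5  # illustrative organisational threshold, not a real standard

def singling_out_risk(df, quasi_identifiers):
    class_sizes = df.groupby(quasi_identifiers).size()
    return (class_sizes == 1).sum() / len(df)

print(singling_out_risk(records, QUASI_IDENTIFIERS))  # 1.0 - every record is unique

# Candidate privacy-enhancing transformation: generalise the postal area
# to a coarser region, then re-measure the risk.
records["post_area"] = records["post_area"].str[0]
risk_after = singling_out_risk(records, QUASI_IDENTIFIERS)
print(risk_after)  # 0.2 - only one record remains unique
print("acceptable" if risk_after <= RISK_THRESHOLD else "needs further transformation")
```

In practice, the candidate transformations, thresholds and domain constraints would come from the organisation’s own policies and expertise, which is where the human intervention Coyle describes fits in.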
MC: Absolutely. We’re seeing a huge shift towards using sophisticated data science methodologies in the field of regtech. In the first phase of this, manual or low-tech solutions were being digitised and made available in centralised locations such as digital data catalogues.
The next phase, which Truata is driving, sees regulatory technology becoming a lot more automated and proactive in supporting the human experts within an organisation to handle the masses of data they control and the increasing number of data projects they’re being asked to review and approve.
There’s also a movement within chief data officer (CDO) circles away from being a cost centre and towards becoming a profit centre. By developing a deep understanding of the risks and uses of all the data silos within their organisation, the CDO can unlock lucrative analytics opportunities that drive business growth, operational efficiencies and data monetisation. These can all be measured in terms of their impact on a company’s P&L, so the CDO function makes the transition from cost centre to profit centre.
MC: We have phenomenal people in our data science, engineering and product functions, all with deep expertise in their respective fields. We also maintain links with academic researchers and keep up to date with developments in the relevant academic fields. We draw on state-of-the-art research and commercial developments, and apply our own creativity to develop our novel suite of products. We build them from the ground up using new algorithms, informed by what’s happening in the market and by the experience of our team.
MC: We provide very detailed risk reports in ways that different stakeholders can understand. We seek to provide high-level risk metrics so that executive-level stakeholders can quickly review the overall risk profile of data analytics programmes. These top-level metrics give way to several levels of drill-down, where technical and legal stakeholders gain insights related to their specific areas of expertise.
We always seek to use familiar language in our risk reports, drawing on official regulations and guidance from data protection authorities. Privacy practitioners are familiar with terms such as “singling out”, “linkability” and “inference”, and this is how Truata Calibrate presents the risks it finds. The underlying mathematical equations and proprietary algorithms are surfaced through this familiar vocabulary, so stakeholders at all levels can quickly interpret the implications of the solution’s findings.
MC: In the same way that we enable organisations to take control of their privacy risk management and make informed decisions about how they store and process data, these techniques can equip individuals with an understanding of how the data held on their behalf may compromise their privacy.
A lot of the mistrust around how companies use personal data arises from a lack of transparency and understanding of how that data is being stored and used. Communicating these aspects can help individuals to better understand how to control the use of their data, and can increase their level of comfort and trust in the companies they engage with.
In the future, individuals will be more empowered to control their own destiny with respect to which of their data brands can use, and we will see data-driven products created to provide this level of control. Since this can be incredibly complex to do, such products would need to intelligently capture, and make recommendations on, how data should be collected and used.