Andy McPhee, data engineering director at AstraZeneca, has 40 engineers in hub sites as well as a team of 15 offshore.
Those in the hubs sit alongside the end users of the data in Sweden, in Cambridge and Macclesfield in the UK, and in Gaithersburg in the US, all working to the agile methodology. The hub and offshore groups form virtual engineering teams that are responsible for everything from data collection through to data visualisation and getting data into the hands of the end users.
McPhee’s data engineers focus on either early science or late science, in addition to assisting the enabling units of the organisation.
He explained that early science refers to drug discovery, while late science is about clinical trials and is, therefore, more operational. The enabling units are the back-office functions such as finance, HR, legal, compliance and insurance.
The company faced a serious data challenge in the form of data silos, with people creating their own versions of the truth because they did not trust the data in the first place. McPhee said this would happen with clinical data. “Clinical operations would have almost 50 different ways of looking at the same data; so much so that it was easier to go out and find statistics around a clinical trial or study by going to external data sources, than our own internal data. This was a trust piece.”
Furthermore, there was very little transparency in the data or in the lineage of how it was being used across the organisation. As a result, more time was spent discussing the quality of the data than discussing business strategy.
Back in 2014, AstraZeneca began to implement a strategic initiative of returning to growth to drive the organisation forward, galvanised around data transformation and endorsed by the CEO and the CFO.
Around the same time, AstraZeneca began to use Talend, and since McPhee joined the pharmaceutical company in 2016, its use of Talend has grown fivefold. “One of the main things is its integration on cloud. We were trying to get multiple systems of data into a single data model so we’d be using Talend to integrate and conform the data, before then applying business rules and business context, again using Talend to get into that state.”
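The "integrate and conform, then apply business rules" pattern McPhee describes can be illustrated with a minimal Python sketch. This is not Talend and not AstraZeneca's implementation: the two source systems, their field names, the `StudyRecord` model and the enrolment threshold are all hypothetical, chosen only to show two differently shaped sources being mapped into one data model before a rule is applied.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical single data model that every source is conformed to.
@dataclass(frozen=True)
class StudyRecord:
    study_id: str
    site_country: str
    enrolled: int
    start_date: date


def conform_system_a(row: dict) -> StudyRecord:
    """Map a row from hypothetical system A (ISO dates, string counts)."""
    return StudyRecord(
        study_id=row["StudyID"].strip().upper(),
        site_country=row["Country"].strip().upper(),
        enrolled=int(row["Enrolled"]),
        start_date=date.fromisoformat(row["StartDate"]),
    )


def conform_system_b(row: dict) -> StudyRecord:
    """Map a row from hypothetical system B (DD/MM/YYYY dates)."""
    day, month, year = (int(part) for part in row["start"].split("/"))
    return StudyRecord(
        study_id=row["study"].strip().upper(),
        site_country=row["iso_country"].strip().upper(),
        enrolled=int(row["patients"]),
        start_date=date(year, month, day),
    )


def is_under_enrolled(record: StudyRecord, target: int = 100) -> bool:
    """Illustrative business rule: flag studies below an assumed target."""
    return record.enrolled < target


# Step 1: integrate and conform rows from both sources into one model.
system_a_rows = [
    {"StudyID": " st-001 ", "Country": "se", "Enrolled": "120", "StartDate": "2019-03-01"},
]
system_b_rows = [
    {"study": "ST-002", "iso_country": "GB", "patients": 45, "start": "15/04/2019"},
]
conformed = [conform_system_a(r) for r in system_a_rows]
conformed += [conform_system_b(r) for r in system_b_rows]

# Step 2: apply the business rule to the conformed records only.
flagged = [r for r in conformed if is_under_enrolled(r)]
```

Because the rule runs against the conformed model rather than against each source, adding a further source system only requires another mapping function, which is the kind of reuse the article goes on to describe.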
He said that he and his team are also using AWS, with Amazon ECS and AWS Fargate running containerised Talend ingestion jobs that bring data into their data lake. McPhee said that the benefit is that the process is replicable. “We’ve done that for late science and we’re just doing that for early science now and then we’ll be reapplying that to our enabling units. It’s a reuse concept that we’ve been building all the way through and we reapply it to another business problem.”
The return to growth initiative took a five-year view and is now morphing into a ‘growth through innovation’ initiative. This current objective is about making use of innovative technologies to facilitate growth and thinking about patients from a digital perspective.
This would involve taking a digital and data approach to making patients’ experiences better, including by putting more emphasis on IoT technologies. McPhee said: “We want to put things out into the hands of the patients and gather data that way. We want to imagine data so we start to use deep learning on top of full body CT scans to determine whether someone’s got lung cancer and we’re starting to use more and more of those innovative technologies to accelerate our growth.”