In March 2017, the UK's national statistics institute, the Office for National Statistics (ONS), established the Data Science Campus as its data and analytics hub. The Campus is based at the ONS headquarters in Newport, Wales, and was set up as part of a £17m investment in statistics announced the previous year.
The Campus's ethos is "data science for the public good". Managing director Tom Smith is very excited about the projects it has undertaken or supported, and the partners it has worked with, in its first year.
One project that Smith mentioned was carried out by HMRC to optimise its efforts and focus scarce resources. HMRC wanted a better understanding of which companies were most at risk of non-compliance with their tax obligations.
Smith said: “If you haven’t got a trillion people to look at every single tax return, you want to focus. It’s about prioritising. You want to maximise your tax intake while causing minimum harm or damage to people who are quite happily and perfectly putting in tax returns in a great way.”
Needing a predictive model, HMRC built a “fairly simple” decision tree. This combined information about taxpayers, such as their past compliance, with characteristics from their current tax returns to calculate a probability of non-compliance. Starting from a group of 1,000 people with a 30% non-compliance rate, HMRC split the group in two, then split it again, and found a smaller group of 400 people with a 60% non-compliance rate. The tax and customs agency was then able to focus its interventions more accurately.
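The arithmetic behind that example can be worked through in a few lines. This is only an illustrative sketch of the figures quoted above, not HMRC's model; the function name `split_summary` is a hypothetical helper introduced here.

```python
def split_summary(total, base_rate, subgroup_size, subgroup_rate):
    """Summarise how one subgroup concentrates the non-compliant cases.

    Hypothetical helper: it only reproduces the arithmetic of the
    figures quoted in the article, not any real HMRC model.
    """
    total_cases = total * base_rate               # non-compliant people overall
    subgroup_cases = subgroup_size * subgroup_rate
    remainder_size = total - subgroup_size
    remainder_cases = total_cases - subgroup_cases
    return {
        "subgroup_cases": subgroup_cases,
        "share_of_all_cases": subgroup_cases / total_cases,
        "remainder_rate": remainder_cases / remainder_size,
    }


# Figures from the article: 1,000 people at 30%, a subgroup of 400 at 60%.
summary = split_summary(1000, 0.30, 400, 0.60)
print(summary)
```

On these figures, the 400-person group contains about 240 of the roughly 300 non-compliant cases (around 80%), while the remaining 600 people have a non-compliance rate of about 10%. That is why concentrating interventions on the smaller group is more efficient.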
Smith said HMRC estimates that using analytical models based on the decision tree approach will bring in £1 billion over four years through error and fraud detection.
Another project Smith mentioned was carried out by the data scientists of the Food Standards Agency. It involved evaluating early warning systems for outbreaks of the norovirus, of which there are an estimated 2.8 million cases in the UK per year, costing the country an estimated £120 million.
When people go down with it, many will tweet about it. Smith’s team crowd-sourced tweets containing key words relating to the virus, such as "queasy", "bug" and "vomcan". They ruled out tweets that included key words not related to the norovirus, such as "hangover", "smashed" and "drugs", and then correlated the frequency of the remaining tweets with lab reports of officially identified norovirus.
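The filter-then-correlate approach can be sketched as follows. The key words come from the article, but everything else is an assumption made for illustration: the sample tweets, the weekly counts, the function names and the choice of a Pearson correlation are all hypothetical and are not the team's actual data or method.

```python
import math
import re

# Key words quoted in the article.
INCLUDE = {"queasy", "bug", "vomcan"}
EXCLUDE = {"hangover", "smashed", "drugs"}


def is_candidate(tweet):
    """Keep a tweet with a symptom key word and no excluding key word."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    return bool(words & INCLUDE) and not (words & EXCLUDE)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up tweets, for illustration only.
tweets = [
    "Feeling queasy all day",
    "Stomach bug or just a hangover?",
    "Lovely weather today",
]
kept = [t for t in tweets if is_candidate(t)]

# Made-up weekly counts: filtered tweets vs. lab-confirmed reports.
weekly_tweets = [12, 18, 40, 75, 60]
weekly_lab_reports = [3, 5, 11, 20, 17]
r = pearson(weekly_tweets, weekly_lab_reports)
print(kept, round(r, 3))
```

To use such a signal as an early warning, one would presumably compare tweet counts with lab reports from later weeks rather than the same week; that lag is omitted here to keep the sketch short.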
“What you get is a way of predicting or identifying outbreaks in advance, far enough that you can do something about it. If you get it three weeks in advance, you can put up posters warning GPs, warning families, so you actually can do something about it,” said Smith.
As well as carrying out projects and actually doing data science, the Data Science Campus has the responsibility for skills and capability building. It has set up an apprentice scheme so that school-leavers and career-changers can be trained in the basic skills of data and analytics. “The first cohort of apprentices is already going around the ONS and getting embedded in some of the teams. They are working on the Census, economic statistics and so on,” said Smith.
Working in partnership with the ONS Learning Academy, the Data Science Campus also offers an MSc in Data Analytics for Government. The post-graduate course is delivered by University College London, Oxford Brookes University and the University of Southampton. Smith said: “We’re currently in discussion about extending this so that we’ve got geographical spread across the country, because a lot of civil servants in government want to go through this sort of training and upskill.”
Smith said that the Data Science Campus has also teamed up with the Turing Institute to offer a Data Science for Public Good PhD programme and will be announcing the details of that in the near future.
The Data Science Campus came into being following the publication of the Independent Review of UK Economic Statistics by Professor Sir Charles Bean in March 2016, which came to be known colloquially as the Bean Review. From Smith’s perspective, this review is the “founding document” of the Data Science Campus.
He said: “It talked very much about collaboration and the building of capability in government through the Campus.” For Smith, working collaboratively with government partners as well as external partners means working together and not taking over the project, disappearing for a couple of years and reappearing having invented something completely radical.
Smith stressed that the Data Science Campus is an objective, impartial processor of data, with no political leanings one way or the other. He said: “We don’t have a policy interest, we don’t have a view on trade or Brexit. What we must do is publish data to support and underpin the discussions and decisions that the rest of government is working on.”
Tom Smith was speaking at the Turing Institute.