Academics and commercial data scientists have two things in common - both need large volumes of data and the computing power to process and analyse them. Whichever side of the fence you sit on, there is always competition for access to both and limits to what is available. Which is why the creation of the Cambridge Centre for Data-Driven Discovery (C2D3) earlier this year was a sensible move, combining the high standards of research at the University with a unified resource centre that also pulls in up to ten research partners from across industry, led by foundational partner Aviva.
“It’s an initiative by the University of Cambridge to bring together different disciplines so they can benefit from data science and new methodologies using big data,” C2D3 co-chair Professor Anna Vignoles told DataIQ in an interview. “It supports research across the University and acts as a central resource. There is a huge opportunity in this space to do big projects, particularly around training the next generation of data scientists. That is the single biggest skill set needed and there are not enough of them.”
As a cross-disciplinary centre, it is hoped that knowledge transfer will stimulate research ideas and accelerate breakthrough solutions. At the same time, commercial partners will bring real-world problems into the picture that condition how cutting-edge approaches, like machine learning, can actually be deployed. “Blue skies research in academia is what commercial organisations want and is a strength we have,” said Vignoles.
C2D3 is supported by the Cambridge Service for Data-Driven Discovery (CSD3), which is intended to act as a national capability for data-intensive simulation and advanced analytics. It is the result of a co-investment of £14 million by the University of Cambridge Research Computing Service, STFC and EPSRC, itself a consortium of the universities of Cambridge, Bristol, Leicester, Southampton and Oxford, King’s College London, UCL and Imperial College. It has a petascale computing platform that has been built around two systems supported by Dell EMC, Intel and NVIDIA.
The research centre has a steering committee of ten to oversee the 50 researchers from across faculties. The University has a distributed model, so C2D3 acts as a broker, putting teams of people together to work on projects. It is an important task that needs to be done right, as researchers may spend two or three years working together.
Vignoles’ own area of research is around resilience and what makes some individuals better at coping with difficult situations. “You can’t just do that in the psychology department, you need data from a wide variety of sources. Cambridge University doesn’t have a huge number of them - it is a privilege to have one,” she said.
By creating C2D3, it becomes possible to run projects at scale, drawing in other collaborators and leveraging a high-performance computing tech stack. As the founding partner, Aviva is able to bring its commercial data science skills to the table, not least from practitioners who have chosen not to do a Masters, but are practising in the real world. Another benefit of collaboration is to put in place strict data sharing protocols and ethical controls which mean data does not leave industry partners’ premises. Said Vignoles: “We are getting cleverer about what we have to do.”