Data science and analytics professionals save private companies a lot of time and money by identifying trends that lead to the optimisation of resources and the generation of efficiencies. However, those who want to put their skills to use in the not-for-profit sector face obstacles. Among them are organisations' concerns about the legal status of their data, their ability to accept assistance only during office hours, and their not knowing exactly what kind of help they need.
To address this lack of support and the absence of a mechanism to connect the two sides, Lisa Green, an advisor to Domino Data Lab, and Rayid Ghani, a research associate professor in the Department of Computer Science at the University of Chicago, have spent the last few months developing DSSG (data science for social good) Solve.
It is an online platform, inspired by the way open source software is developed, where social good organisations can post their projects and volunteers can offer their skills. According to Green: “From anywhere you can access the web you can get involved, in your own time zone and in your own schedule.”
"You can search for a real-world problem with real-world data."
She went on to say: “If you are good at Python but want to get better at R, you can search for a real-world problem with real-world data and collaborate with other people on work that really matters.” Projects can be filtered not only by the programming skill required, but also by the number of hours needed to complete them, or by issue area, such as public health or criminal justice.
"The little projects can get done, too."
Green said that a benefit of the SOLVE platform is that it is not just the most urgent and most resource-intensive projects that get worked on. “When you have this scale and anyone with web access can come and work on it, all the little projects can get done, too.”
Ghani gave an example of a social good project on public health that he has worked on with the Chicago city government. It involved predicting which children would be at risk of lead poisoning, as lead was used in paint in the US until it was banned in 1977. Houses built before that year may have lead in the paintwork, which puts infants at risk from the time they start to crawl.
Ghani said that most local authorities wait until a child tests positive for lead poisoning before taking any action, which is all the more heart-breaking because the effects of lead poisoning, including reduced IQ and reduced attention spans, are irreversible.
"We worked with the Chicago city government to reduce lead poisoning."
Four years ago, the Chicago city government approached the DSSG Summer Program at the University of Chicago and said it did not know how to detect at-risk homes efficiently, and therefore which homes to prioritise. “We worked with them, we got data from them for the kids that had been tested in the past 15 years, all the homes that were inspected and used that to build a machine learning model that would predict the likelihood of a kid getting lead poisoning when they are two to four months old, before they start crawling. Then we can send inspectors before that happens and fix the problem.”
After creating the first model, Ghani went back with his colleagues, built a more robust model and then ran a randomised controlled trial to test its effectiveness. They then helped to deploy it in the city's IT department so that it could be used proactively.
The model is currently being implemented in the hospital system. Now, if a pregnant woman comes in for a check-up, the doctor will be alerted if the child is at risk of lead poisoning in the home. Public health officials who can intervene and reduce the risk are alerted before the child is born.
In the past, government agencies did not offer meaningful projects to work on.
Ghani said that, in the past, when he enquired about assisting government agencies, he was told he could only merge some spreadsheets or generate some reports, “but it wasn’t a meaningful project.” As director of the Center for Data Science and Public Policy at the University of Chicago, he knows there is a demand for a platform such as SOLVE: he said he receives “thousands of offers” from data professionals to help with data science projects.
Green added that it is important for volunteers to be able to create output that is reusable and easy for the stakeholders to understand. That way, the people giving up their time will know that their efforts are going toward effecting change in a meaningful way. “Once you realise it’s about predicting the subject and the variables might change a little with the underlying system, you can reuse [the code],” she said.
The Center for Data Science and Public Policy applies data science to research and practice in public policy. One public safety project involved creating an early intervention system for police departments to identify officers at risk of being involved in an adverse incident, such as unnecessary use of force.
"It is uplifting to think about all the good work that is being done."
The Center also partnered with the White House on the Data-Driven Justice Initiative, which developed machine learning techniques to identify people who cycle through health and mental health institutions and prisons, and to inform efforts to put measures in place to break the cycle.
“It’s depressing, but it is also uplifting to think about all this good work that is being done,” said Lisa Green. Once SOLVE goes live, there will be many more data professionals with the facilities to do that good work.
Lisa Green and Rayid Ghani were speaking at an R-Ladies London event.