If you could use data to solve a global healthcare problem, what would stop you? If the data you needed was available on an open basis - and you could find the human resources - then the answer should be nothing. But if there was a high cost involved, the chance to stop the spread of a disease might get missed.
That is one very real example of how the future of data depends on multiple, interconnected forces which are often pulling in opposing directions. It emerged during the second day of the all-star, sell-out DataFest17 event run by The Data Lab in Edinburgh in March. With a focus on data for good, innovation and the ethics of data and technology, it offered some thought-provoking challenges to the prevailing mood that data will simply conquer all before it.
One of the engines of that optimism is the new access available to tools which, even five years ago, were inaccessible to most organisations. As Hilary Mason, founder of New York-based Fast Forward Labs, put it: “Innovation happens when something that was high value becomes cheap enough to play with.” Falling costs for analytics have created the conditions for the current explosion of interest in data and its potential for society and business.
But if this momentum is to be maintained, there are three major questions that need to be answered and which emerged as themes across the day.
1. Mo data, mo money - or free for all?
Estimates for the economic benefit which data and analytics could deliver vary wildly, but generally start in the billions. Yet achieving that uplift requires the basic commodity of data to be available for analysis in order to add value. DataFest17 revealed a tension at the very heart of the new move towards data-enablement - whether monetising data will also lock it down and keep it in the hands of a few major players.
That was evident in the presentations given by UNICEF and Transport for London. For non-governmental organisations (NGOs), mobile phone data showing where refugees are leaving from and the direction they are heading in would allow aid efforts to be better co-ordinated and resources to be sent to the most effective locations. “It can actively save lives,” said Natalia Adler, data, research and policy planning specialist at UNICEF.
But it ran straight into the monetisation barrier when trying to tackle the spread of Aids in Sierra Leone. “We were asked for $3 million per week to get mobile data. The more data is monetised, the harder it is to get hold of for social good,” said Adler. That is a shocking demand which seems hard to justify, not only on a cost basis but also because mobile networks already make anonymised data available for research purposes.
To encourage the release of data for good, UNICEF is a partner in a data collaborative which offers a framework with six different ways for organisations of all types to share data. “The data doesn’t have to come to us,” noted Adler. “You may have data that you don’t know is critical, so come and talk.”
One example she gave of how this works was a collaboration with Intel to use satellite imagery of the Sierra Nevada to understand water supply forecasts for California. Another was analysing the demand among young women for C-sections in Brazil, which has the highest level in the world. She said: “Give us the data and we will work out the solution.”
By contrast, London’s not-for-profit travel operation, Transport for London, adopted an open data strategy nearly ten years ago. “We took another look at what TfL is there to do. People see us as moving units of transport from A to B. We came up with the vision to keep London moving, growing and making life better. Through that lens, we approach our assets in a different way,” said Phil Young, head of online, Transport for London.
TfL takes a 10- to 20-year view of how to achieve those goals - that is how long a major infrastructure project like Crossrail takes to plan and deliver. It also needs to keep an eye on the city’s growing population and the rising expectations of its customers.
“We realised we need our information to be on other websites, not just our own. We needed to push it out to football clubs, theatres, etc. We had created a service update app, then developers came to us with other ideas about how to present the information. That was when we moved from pre-packed to open data,” said Young.
TfL was inspired by the BBC’s approach to TV listings. It identified bus arrivals, journey planners, cycle docks, tube services and parking sensors which were delivered via APIs from 2009. “That led to hundreds of apps growing up around our service.” Historical tube station crowding data has recently been released and TfL is working on live crowding data. “We also know the time of a journey from any place to any other place in London,” he said.
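To make the developer side of this concrete, here is a minimal sketch of how an app might consume an arrivals-style feed of the kind TfL exposes. The payload shape and field names below are purely illustrative assumptions for the example, not TfL’s actual API schema; a real integration would fetch JSON from TfL’s documented endpoints instead.

```python
import json

# Illustrative payload only - the field names here are hypothetical,
# not TfL's actual schema.
sample_feed = json.loads("""
[
  {"line": "victoria", "station": "Oxford Circus", "seconds_to_arrival": 95},
  {"line": "victoria", "station": "Oxford Circus", "seconds_to_arrival": 240},
  {"line": "central",  "station": "Oxford Circus", "seconds_to_arrival": 30}
]
""")

def next_arrivals(feed, line):
    """Return minutes-to-arrival for one line, rounded up, soonest first."""
    waits = [entry["seconds_to_arrival"] for entry in feed
             if entry["line"] == line]
    return sorted((s + 59) // 60 for s in waits)

print(next_arrivals(sample_feed, "victoria"))  # -> [2, 4]
```

The point of the open-data model is that TfL publishes the raw predictions once, and hundreds of third-party apps layer their own presentation - countdown boards, journey planners, accessibility views - on top of the same feed.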
To help drive new products to market, TfL runs hackathons, incubators and accelerators and now has 11,000 registered developers, with 300 new sign-ups each month and 600 live apps. When Deloitte ran a calculation of the benefit and cost-savings which had been achieved from TfL’s open data strategy as part of the Shakespeare Review in 2013, it came up with a range between £15 million and £58 million.
These examples demonstrate why the data explosion is sending organisations in opposing directions when deciding whether to monetise or open their data. For proponents of the first view, predictable cash in hand to deliver a return on high-cost investments is appealing. For open data advocates, it is the unexpected solutions which an eco-system of developers creates that has value. You pays your money (or not) and takes your choice.
2. Building ethics into technology and data
“What type of society do we want and how should technology support us?” That question was posed by Mandy Chessell, distinguished engineer and master inventor, IBM in the context of the data explosion. “The more we share data, the more we give permission,” she noted. “You can say that technology is ethics-agnostic, but it does affect lives and choices.”
Humans have circles of trust where friends and family are closest, while acquaintances and strangers are furthest from us. New devices and digital services start as strangers, but as they deliver value over time, it creates trust and they move into the inner circle.
But Chessell pointed out that, “those circles are getting very blurred because every aspect of life is on the same platform - work, family, self. Data from those different angles is being brought together to make decisions.”
“How do we compartmentalise so information from one place doesn’t interact with another? If a number of those services are under the same roof, it will build trust and the individual will perceive the value to have increased. But if the service providers overdo it, they will lose value,” said Chessell.
She put up a set of questions which technology developers should consider in order to keep on the right side of the ethical boundary. She also offered a prime example of the way something which looks like a neutral approach to help collect data and improve a service can rapidly come to be seen as creepy. Microsoft built a keystroke logger into Windows 10 with the intention of improving its autocorrection, spelling and language tools. But users see it as intrusive since it effectively records everything they are doing.
That led Richard Marshall, futurist at Gartner, to note: “Be careful - apply ethics before you make the technology available.” Once something is released, the market will decide how it is perceived and it may be too late to shift the argument.
Gartner uses the term “digital humanism” to describe its approach. A company blog from 2015 explains: “Digital humanism stands in contrast to digital machinism - a view that sees the minimisation of human involvement through automation as the central focus of technology. This perspective is driven by the belief that technology is valuable when it allows people to spend less time on mundane, repetitive and inefficient tasks.”
Mason provided a live example of the way technology can release humans from these burdens, leaving them able to pursue more value-adding tasks. She explained: “At one New York City bank, 80% of Sarbanes Oxley filings are being prepared using artificial intelligence and the regulator is parsing them using natural language processing, so there are no humans involved in four out of five transactions.”
But the risk is always that technology becomes a binary option with ethical considerations left to one side. When US senators were asked about their attitude towards a proposed bombing mission, 30 Republicans voted yes and 36 Democrats said no. The problem for both political parties was that the country mentioned did not actually exist. “People won’t hesitate to answer complex questions with yes or no answers. Humans hate complexity,” said Adler. When it comes to ethics, things rapidly get complex which is why technology developers so often avoid confronting them. Whether that is sustainable was left as an open question.
3. Getting the people right
At UNICEF, as in many other organisations, “we are always looking for data scientists.” But as Vicky Byrom, senior analytics consultant at Aquila Insight, said: “A growing number of people are entering data science, so why am I worried?”
For her, part of the problem is that the jobs which data scientists are being asked to tackle come with built-in problems that should have been resolved before the scientists are brought in. Data quality is a perennial issue - one that costs the global economy $3 trillion annually according to Harvard Business Review - and which can see those big brains get bogged down and lose their focus and energy.
“Projects are starting with data, not with a question. There is a lack of connection between data people and the business,” said Byrom. According to an O’Reilly data science survey in 2016, 2% of data scientists spend no time at all in meetings with the business, 24% spend one to three hours, 42% spend four to eight hours, 26% spend nine to 20 hours, and 5% spend more than 20 hours.
Some are clearly much luckier than others, while those at either end of the spectrum will not be delivering real value, if for different reasons. The paradox is that the one environment where such issues should not exist - the start-up, a natural home for innovative people - has problems of its own. “Start-ups have no data - enterprise is where the data is,” said Mason.
She offered some tips for ensuring that data science teams get to their optimum performance level. “Four things are needed when you are trying to innovate. You need a broad team, so insights from one domain can be applied to another. As you build your team, every new member should bring a new skill. Look for a change in the economics that constrain the use of technology, such as when analytics becomes a commodity. And when new data becomes available, that allows you to add machine learning,” she said.
With ever more investment heading into data and analytics, both at enterprise level and start-ups, the opportunities for growing the economy and doing good are expanding. But so, too, are the pressures on resources and people which could constrain those very opportunities as they arise, either through demanding too much of a return or by crossing an ethical boundary. DataFest17 gave a useful insight into how those considerations are being evaluated and resolved.