The money is not in the machines. It’s in the data

David Reed, director of research and editor-in-chief, DataIQ

As investors continue to bet on the future value of machine learning and artificial intelligence start-ups, they should bear in mind one question: where will the data come from to keep those machines running?

It is easy to assume that every business is now data rich and just needs the right tools to harness this asset. In fact, most organisations are data poor, at least when viewed from the perspective of data science. If you want to train a model to understand the differences between correct and incorrect outputs, even at a basic level where the options are binary, you need a lot of data for it to sift. That includes accurate data, inaccurate data, ambiguous data, plus sets where the output is unknown as well as those where it is clear.

As I have written before, companies can struggle with this issue even at the level of building a recommendation engine for a start-up in ecommerce. That is why data giants like Google and Facebook command the market capitalisations they have - the biggest barrier to new entrants in their space is the huge volume of data which allows the search and social media giants to predict behaviour and invent or optimise services.

By contrast, many large organisations with a real depth of legacy data fail to recognise its true value. Nowhere is this more true than in the realm of healthcare. The accuracy of diagnostic tools is where big money is to be made, and it is the reason for the controversial exchange of 1.6 million patient records between the Royal Free London and Google DeepMind.

Leaving aside the data governance issues which led to a rebuke from the ICO, consider this response: “I heard that story and thought ‘Hang on a minute, who’s going to profit from that?’” That came not from a venture capitalist or seed fund manager, but from Sir John Bell, a professor of medicine at Oxford University who led the government-commissioned review into the UK’s life sciences industry.

His views make for interesting - and eye-opening - reading. “What you don’t want is somebody rocking up and using NHS data as a learning set for the generation of algorithms and then moving the algorithm to San Francisco and selling it so all the profits come back to another jurisdiction,” he said. “People have played up the need to have great machine learning. It turns out that’s all baloney.”

He’s right. You may well need some big computers and the scientists to run them, but the former are becoming a commodity and the NHS is awash with the latter. Most of the maths and science already exist; they are just waiting for innovative applications based around deep data sets. NHS data is a diamond mine in this sense. Yet those making decisions about how to monetise it are often dazzled by the shiny new machines being put in front of them.

Other organisations may be in the same position, especially those with national reach and long-term legacies. Usually, they are viewed as dinosaurs in need of a digital transformation. As true as that is, their data sets represent a significant asset which could power up new machines to drive that change. It would be good if investors recognised that, as well as banking on the promises being made by ML and AI.


Please note that blogs are the sole view of the author and that they are not necessarily the view of IQ ddg Ltd and should not be interpreted as advice. Please read our full disclaimer

An expert commentator on all things data, David has been editor of DataIQ since its inception in 2011.