At the annual meeting of the World Economic Forum in Davos, Switzerland, earlier in the year, there was a panel called ‘Mapping data dominance’. The panellists included an academic, a data scientist and entrepreneur, and the director general of a telecoms trade body. It was during this panel that I heard the phrase ‘the data divide’ for the first time.
This refers to the proportion of the world’s population who are not connected to the internet compared with the proportion who are. As of the end of March 2019, 56.8% of the people in the world were connected. That leaves 3.33 billion people without internet access.
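The arithmetic behind those two figures can be checked with a minimal sketch in Python. It uses only the 56.8% and the 3.33 billion quoted above; the variable names are illustrative, and the world population it prints is simply what those two figures imply, not a separately sourced statistic.

```python
# Figures quoted in the text (end of March 2019).
connected_share = 0.568        # share of people connected to the internet
unconnected_people = 3.33e9    # people without internet access

# Share of the world on the other side of the divide.
unconnected_share = 1 - connected_share

# World population implied by the two quoted figures
# (an inference from the text, not an independent statistic).
implied_population = unconnected_people / unconnected_share
implied_connected = implied_population * connected_share

print(f"Unconnected share: {unconnected_share:.1%}")
print(f"Implied world population: {implied_population / 1e9:.2f} billion")
print(f"Implied connected people: {implied_connected / 1e9:.2f} billion")
```

On these figures, the unconnected make up 43.2% of an implied world population of roughly 7.7 billion.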
That got me thinking about the deluge or the drought of data a person experiences depending on which side of the divide they are on. We ‘onliners’ are in constant receipt of data and information whilst simultaneously leaving data trails everywhere we go.
It is also important to think about the type of person who is on each side of the divide. According to panellist Ngaire Woods, dean of the Blavatnik School of Government at the University of Oxford, twice as many women as men are unconnected, and four times as many of the people who are not connected have not been through education.
Internet penetration varies widely by continent as well. North America has the highest connection rate, with almost nine out of ten people able to log on. Africa has the lowest, with just 37% of people able to get online. Furthermore, within countries there is a marked difference in the access to and speed of internet connections in rural versus urban areas, with major towns and cities having much better connectivity.
From this, we can deduce that a typical internet user is likely to be an urban-dwelling, educated male living somewhere in North America. Having come to understand the term ‘sample bias’, I think it is reasonable to suggest that the internet and its many services are designed for and cater to that typical user. It is important to add that this typical user is happy to give away their data if they feel they are getting a good product or service in return, with the average US adult having seven social media accounts.
Sample bias is defined as the collection of a sample in such a way that some members of the intended population are less likely to be included than others. Women, people with less education and people living on the African continent are, at the moment, less likely to be connected, but they are the ‘intended population’ of the future.
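To see how that definition plays out, here is a minimal sketch in Python. The group names, population shares and inclusion rates are invented purely for illustration and are not real statistics, and `expected_sample_shares` is a hypothetical helper, not part of any library. It shows how groups that are less likely to be included end up under-represented in the data that gets collected.

```python
def expected_sample_shares(population_shares, inclusion_rates):
    """Return each group's expected share of a sample.

    population_shares: group -> share of the intended population (sums to 1)
    inclusion_rates:   group -> probability that a member ends up in the sample
    """
    # Expected fraction of the whole population sampled from each group.
    weighted = {
        group: population_shares[group] * inclusion_rates[group]
        for group in population_shares
    }
    total = sum(weighted.values())
    # Normalise so that the shares of the sample sum to 1.
    return {group: value / total for group, value in weighted.items()}


# Hypothetical numbers, chosen only to illustrate the effect.
population_shares = {"group_a": 0.5, "group_b": 0.5}
inclusion_rates = {"group_a": 0.9, "group_b": 0.3}

sample_shares = expected_sample_shares(population_shares, inclusion_rates)
for group, share in sample_shares.items():
    print(f"{group}: {population_shares[group]:.0%} of population, "
          f"{share:.0%} of sample")
```

With these made-up rates, two groups of equal size make up about 75% and 25% of the sample, so anything built on that sample sees three times as much data from the first group as from the second.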
They are less likely to have their data collected. Their data is not going to be used when training and prototyping new technologies and internet services that are yet to be invented.
We are now hearing about the devastating consequences of women’s data not being considered when developing safety features for cars. Vehicular seat belts are more likely to help men but harm women. We are seeing the problems of the data of black people being disproportionately underused when training AI facial recognition models. There is a woeful lack of accuracy in the recognition of black faces.
That got me wondering. How do we build an internet-dependent, data-driven society that meets the needs of all its users and citizens, no matter when they first logged on or what region they are in? How will we accommodate and provide for those who have never given up their data, and those who wish to stop doing so? What if the new users are more cautious with their personally identifiable information and have less disposable income? How will we ‘pay’ for online services if data ceases to be ‘digital legal tender’?