The data industry, like every other, needs diversity. The best insights and answers will come about as a result of many different people suggesting solutions. So said several data scientists who are new to the sector. But what are the benefits of diversity and what can be done to bring people from a variety of backgrounds into the industry?
Machines aren’t clever. They only know as much as we teach them. The data set that a machine learns from needs to be contextualised to explain and counter the biases that inevitably surface. According to Mike Bugembe, chief analytics officer at social giving platform JustGiving, who has worked in analytics and insight for over a decade, it is best if a variety of people do that contextualising.
He explained that machine learning, in its rawest sense, is learning from history. Historical data, known as training data, teaches a machine what to do. However, because training data is imperfect, the resulting algorithm contains many flaws and biases. He cited criminal profiling as an example. Allegedly biased software used in the US judicial system was found to often wrongly predict that black defendants would reoffend, while frequently wrongly forecasting that white defendants would not. Bugembe said that domain knowledge, the data term for contextual background information, is essential to counteract these biases. In the case of the judicial software, someone with domain knowledge would have recognised that past injustice had skewed the training data, and with it the results.
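The kind of disparity described here can be checked by comparing error rates per group. The sketch below is only an illustration: the records, the group labels "A" and "B", and the function name `error_rates_by_group` are all invented, and it is not the method used in any real audit of judicial software.

```python
from collections import defaultdict

# Hypothetical records: (group, predicted_to_reoffend, actually_reoffended).
# Every value here is made up for illustration.
records = (
    [("A", True, False)] * 2 + [("A", False, False)] * 2
    + [("A", True, True)] * 3 + [("A", False, True)] * 1
    + [("B", True, False)] * 1 + [("B", False, False)] * 3
    + [("B", True, True)] * 2 + [("B", False, True)] * 2
)

def error_rates_by_group(rows):
    """Return false positive and false negative rates for each group.

    Assumes every group contains at least one person who reoffended
    and at least one who did not (otherwise a rate is undefined).
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, predicted, actual in rows:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # predicted safe, but reoffended
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # predicted to reoffend, but did not
    return {
        group: {
            "false_positive_rate": c["fp"] / c["neg"],
            "false_negative_rate": c["fn"] / c["pos"],
        }
        for group, c in counts.items()
    }

rates = error_rates_by_group(records)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this toy data, group A is wrongly flagged more often while group B is wrongly cleared more often, which is the shape of the problem Bugembe describes. The numbers alone do not say why; that is where domain knowledge comes in.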
"Domain knowledge in diversity is where you really get the sweet spot."
“Domain knowledge in one type of person doesn’t work. Domain knowledge in diversity is where you really get the sweet spot,” said Bugembe. He gave an example from his own workplace, when he and his team asked the machine if there was a correlation between the images fundraisers used on their pages and the amount of money they raised. It found that pages with an image of a bicycle tended to raise more than others. “The thing is, if you advised everyone to put a bicycle on their page, it doesn’t mean they will raise the same amount,” he warned.
Bugembe said that domain knowledge is needed to interpret this result. Cyclists tend to raise more than most other fundraisers because of their socio-economic demographic: most cyclists who fundraise are men over 50 with a wealthy network. Knowing this, the JustGiving team can control for event type when training the algorithm, because cyclists would otherwise skew the distribution of the data.
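One simple way to picture "controlling for event type" is to compare pages with and without a bicycle image inside each event type, rather than across all pages at once. The sketch below assumes invented page data and hypothetical function names; it is not JustGiving's actual approach, which the article does not describe in detail.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fundraising pages: (event_type, has_bicycle_image, amount_raised).
# All values are invented for illustration only.
pages = [
    ("cycling", True, 900),
    ("cycling", True, 1100),
    ("cycling", False, 1000),
    ("running", True, 300),
    ("running", False, 290),
    ("running", False, 300),
    ("running", False, 310),
]

def mean_by_image(rows):
    """Average amount raised, split by whether the page shows a bicycle."""
    groups = defaultdict(list)
    for _, has_bike, amount in rows:
        groups[has_bike].append(amount)
    return {flag: mean(values) for flag, values in groups.items()}

def mean_by_image_within_event(rows):
    """Same comparison, but made separately inside each event type."""
    by_event = defaultdict(list)
    for row in rows:
        by_event[row[0]].append(row)
    return {event: mean_by_image(event_rows)
            for event, event_rows in by_event.items()}

naive = mean_by_image(pages)                    # ignores event type
stratified = mean_by_image_within_event(pages)  # controls for event type

print("All pages:", naive)
print("Within each event type:", stratified)
```

Across all pages in this toy data, bicycle images appear to go with higher totals. Within each event type the gap disappears, because the image was standing in for "this is a cycling event" rather than causing anything itself.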
According to several new entrants to the industry, another issue is that a lack of diversity leads to a smaller variety of solutions and a narrower view of how problems can be solved. For Darshna Shah, a junior data scientist at Elastacloud, different people bring distinct ideas based on their experiences. According to her, if only one type of person works with data, they will probably all tackle problems in a similar way. The result is likely to be a limited set of solutions rather than the broad scope a diverse group of people could produce.
Jonathan Brooks-Bartlett, a data scientist at News UK, echoed that sentiment. He said that if just one type of person works on something, they will all have the same perspective and won’t be challenged on their beliefs, their thoughts or the way they approach problems. He added that this is true for many areas, not just data science. He benefits from communicating almost daily with the other half of his team, which is based in Bangalore, India. He said: “Some of the ways that they approach problems differ. When I start a project, I like being able to talk to them and say: ‘These are the issues and this is how I’m going to approach it’. They quite often respond: ‘I think it might be cool if you do it this way.’ That’s the most useful thing - the perspectives from a diverse range of people.”
"It is like fishing for swordfish, koi carp and clownfish in the pond of the closest park."
Paula Gonzalez, a data scientist and former data science consultant, looked at the issue from a skills angle. She said: “In data science you need a lot of skills that are rarely found in one person. The more diversity you have, the more diverse setting you have in terms of skills.” Essentially, it is unlikely that many different skills will come from a small pool of candidates. Recruiting from such a pool is like fishing for swordfish, koi carp and clownfish in the pond of the closest park. Working with a diverse group of people allows the net to be cast far wider.
The new entrants have had positive experiences of the industry they now work in. For Laila Alabidi, a data scientist at Mudano, working in data has been a better experience than her previous career in theoretical cosmology. At a recent deep learning conference in Bilbao, Spain, Alabidi noticed that so many women were attending that there was a queue for the ladies’ bathroom, an occurrence as rare as a blue moon at most tech conferences. She also saw people from a wide range of ethnicities and religions. In contrast, she was one of three women in a department of 100 people at one university where she taught cosmology.
Alabidi has also found the culture in data to be very collaborative. At the same deep learning conference, she found it easy to engage with other people. As a result, she no longer has the sense that she is the odd one out as she did in cosmology. “I don’t have to prove myself. I don’t have to represent whatever minority I’m being cast as,” she said.
"When you are completely true to yourself, it helps a lot in team work.”
Gonzalez recognises data is a very male-dominated field, but she has had only positive experiences so far. “I am Mexican, a woman, a lesbian, but I have never had any negative experience in any setting. I am completely open about who I am and it has always been for the better. When you are completely true to yourself, it helps a lot in team work.”
Brooks-Bartlett has also not been fazed by not fitting the stereotype, which was already the case when he was studying for a DPhil at the University of Oxford. He said he became comfortable in his own skin and in what he does, so he wasn’t daunted by being slightly different when he went into data science. He said: “I said to myself, ‘I’m not the stereotypical data scientist. I know what I’m doing. I know what my skills are. I’m confident in what I can do.’ So, I’ve taken it in my stride. I’m happy enough basically to approach any challenge no matter if I am fairly different in who I am.”
Brooks-Bartlett is well aware of the low numbers of women and ethnic minorities in the data industry. He said the problem needs to be tackled at a very early stage to increase diversity in the industry’s pipeline. He remembered that there were no girls in his maths or physics A-level classes at school. “It’s hard to say, let’s increase diversity in a workplace, in IT or wherever if, even before you get to the point of employing someone, the women or people of a different ethnic background aren’t there,” he said. He explained that it needs to start with schools and parents helping children to understand what opportunities are out there for them. “It’s about being able to say to someone, it doesn’t matter if you are Black, Asian, female, whatever, from a young age, you can do whatever you want. You can do tech. Don’t let anyone tell you, you can’t.”