Is the data science industry asking too much of its new recruits? Mike Bugembe, chief analytics officer at social giving platform Just Giving, thinks it is. He tells DataIQ's Toni Sekinah about the different boots the perfect data scientist would need to fill.
Despite being the "sexiest job of the 21st Century" (Hal Varian, chief economist, Google), those sexy data scientist posts are hard to fill. The acute shortage of data science professionals has been well documented for a number of years now and the problem shows no signs of abating. Mike Bugembe, chief analytics officer at social giving platform Just Giving, told DataIQ that he thinks the data sector itself might be its own worst enemy. He said that the words used to describe the industry can lead to confusion that deters possible recruits: terms like data science, big data, AI and Internet of Things are all completely different, but get used interchangeably.
It seems that this conflation also happens the other way around. Christopher Brooks, a research assistant professor at the University of Michigan, said in an FT Capgemini briefing: “There are so many job adverts for positions that aren’t called data scientists – they’re called business intelligence, customer intelligence and so forth – but what they’re looking for is a data scientist.” Bugembe said that using different job titles as synonyms for data scientist makes it difficult for prospective data professionals to know what they need to do to join the ranks.
But what do people need to know or understand to become a data scientist? Some say they should have a perfect blend of computer science, maths and statistics, and subject matter expertise. Others say that they should be able to blend four key skills: communication, statistics, programming and business.
Bugembe said he thinks that, when companies advertise to hire data scientists, the role can be broken down into five distinct positions:
1. Data architect
2. Data engineer
3. Machine learner
4. Big data engineer or data science engineer
5. Analyst
The architect deals with the problem of storing and wrangling the data. He said: “Most data science problems are dealing with large sets of data. Where are you going to put that large set of data in order to be able to analyse it? How do you store it? You need somebody who knows how to wrangle large sets of data and move them around and make them available to be computed over a distributed system. You need an architect who understands that.”
Bugembe said that data engineers are fantastic at grabbing the data and moving it around, while the machine learner is the person who “knows how to build algorithms. They are just saying ‘give me the data. I will train the machine’.”
He said that the big data engineer or data science engineer is able to write the code to put the algorithm on a production system that can be visited by hundreds of millions of users every month without it crashing or slowing down. “Someone needs to be able to have the skills to be able to write your algorithm in a way that returns results in milliseconds. Those are your big data engineers, or data science engineers,” Bugembe explained.
The final role, the analyst, is also very useful, according to Bugembe, though perhaps not always necessary. “This is the person who can bridge the gap between the commercial and technical sides of the business.” He said the analyst needs to be able to speak the language of the business units and executives and communicate to them how the activities of the data team are valuable.
“Most people describe a data scientist as all five of those things in one person. It’s a lot to ask. I would almost say impossible, to find someone who can do all of those things,” stated Bugembe. He also said that a company able to find a person with the skills for the first three roles alone would have found “the unicorn.”
The CAO has worked in data, analytics and business intelligence for over a decade and has seen the negative consequences of companies not making the right hires due to confusion on both sides. “You have organisations who don’t really know what they’re looking for, so they’ll hire people that don’t really fit the mould then it goes through that cycle where you’re getting a lot of organisations hiring people and being unsuccessful. Then [the industry] begins to get a bad name,” he explained.
Back in 2014, over 75% of posts in big data were considered “fairly” or “very” difficult to fill. In 2015, innovation foundation Nesta said that 80% of the data-driven companies it spoke to were struggling to find the skilled workers needed to meet their demand. During the first half of 2016, the number of data scientist positions across Europe grew by 45% and, at the start of 2017, the Future Today Institute predicted a 50% gap between supply and demand for those with data science skills.
If Bugembe is right and an industry with such a great need for skilled data professionals is muddying the waters with regard to job roles and responsibilities, it’s shooting itself in the foot.