We have a saying in our business, “more tin, less skin,” which bluntly means that, if a computer can do the job of a human, then nine times out of ten we will opt for the computer.
There are certain disciplines which lend themselves particularly well to the vast, objective processing capabilities of computers. These are the new areas of machine learning and predictive modelling which are designed to distil the massive volumes of data that we leave behind us and use it to identify behaviour, often before the behaviour has even occurred.
The trouble is that too many companies are still stuck in a human-led approach to solving big data problems, a process which is fast becoming out of date. They capture and store huge volumes of data from customer transactions in platforms such as Hadoop and then task data scientists with the job of searching for the nuggets of information that will enable the business to do its job more efficiently.
This over-reliance on data scientists has led in recent years to a marked increase in demand for PhDs and others with the statistical and mathematical skills to carry out the job. It is reflected in the huge upward pressure on salaries as companies fight over what little talent and experience there is. Our question is whether the role of the data scientist is still as relevant.
If you think about it, there are very few data scientists who have the perfect fusion of skills to be able to bring together the understanding of the business issues with the statistical insights and software knowledge required. As a result, data science teams tend to be made up of people with different components of these skills, which naturally introduces weaknesses into the overall process.
Further to that, one of the biggest problems with data scientists is the inevitable bias they bring to the table. Unlike algorithms, humans come with their own preconceptions or hypotheses, which can cause them to filter out data that could actually be helpful. Scientists with deep experience in a particular data set may come to rely too heavily on pre-existing algorithms without re-examining their validity for a particular use case. Hidden biases in both the collection and analysis stages present considerable risks, leading to skewed conclusions and poor business decisions.
The solution then lies in creating platforms that can abstract away as much of the bias and technical complexity as possible, putting the data into the hands of those who need it most: the decision-makers.
Significant recent advances in machine-learning technology will ultimately remove the need for the role of the data scientist as we know it, through the ability to understand and react to changes in behaviour in real time. As such, machine learning is set to become one of the most transformational and disruptive technology waves we have seen in recent years.