DNA of a data scientist

David Reed, director of research and editor-in-chief, DataIQ

Are you looking to hire a data scientist? Do you understand exactly what you are looking for, or do you get confused about exactly what they are supposed to do? You are not alone - many in the data industry are struggling to embrace this new job description, as David Reed discovers.

Data scientist - how do you define the term? As a relatively new job title in the data industry, there is no standard description against which to assess an employment brief or a candidate’s suitability. Wage inflation in the analytics sector is such that using the term instantly adds over £20,000 to the salary ask.

That is one of the reasons why many argue that there is no such job, only an extension of what data analysts have long carried out. This can even translate into anger that a perfectly acceptable job description has been hijacked and its practice overstated. Feelings run so high that at a recent roundtable during DataIQ Link, the first half of a panel debate sponsored by Triggar had to focus on what a data scientist really does, before moving on to why they might be needed in a commercial organisation.

Ben Stapley, data scientist at Triggar, defends his job title on the basis that he came from a science background: he holds a PhD and lectured on bioinformatics at the University of Manchester, then was principal data scientist at first:utility and senior consultant at BAE Systems Applied Intelligence, before joining the predictive analytics firm.

One of the reasons he perceives for the lack of understanding about (or even hostility towards) the concept of a data scientist is that, “in the US, there has always been a very close relationship between universities and IT companies so practitioners regularly cross over. In the UK, that is less mature.”

His own move into commercial practice was the result of frustration with the pace of academic research. “Commercial projects have timelines which are very fast compared to academic projects. In commerce, data science is also more applied,” he notes. The task is to translate a business challenge into a requirement expressed in scientific terms, select the most appropriate techniques and data sets to tackle it, consider the outputs from that analysis and present them back to the business as actionable proposals.

In order to do that, data scientists need a library of knowledge about scientific approaches. Unlike the academic researcher, the commercial practitioner is unlikely to be innovating a new approach or theory, but rather will take an existing solution out of the library and put it to work. That is why data models for business often reference virology, anthropology, astrophysics or any number of other disciplines. Within the academic realm, by contrast, a scientist will stay firmly within one sphere of study.

“It is about the toolbox which the data scientist has. A lot of effort goes into deciding which techniques and options to apply. What I do is apply those well-established techniques to the problem of customer churn - that is a diagnostic problem about the effect on a particular audience which is very similar to certain medical challenges,” says Stapley.

Existing data analysts might argue that they are capable of the same work, finding models that fit data patterns in order to predict future events. While that is certainly true, data scientists are generally also required to have hard science capabilities covering coding, developing algorithms and deploying new technologies such as machine learning (especially the emergent cognition platforms).

As a result, some data scientists draw very clear distinctions between the stages of practice that lead up to and support their activities. Daniel Hulme is the founder and CEO of Satalia, a spin-out from UCL that is applying those algorithms and technologies to commercial challenges. With a master's and a doctorate in artificial intelligence, his credentials are unarguable and he has also co-founded the ASI - a post-doctoral fellowship to help scientists become industry-friendly data scientists.

He identifies five stages of data-driven decision making - the data phase, information, knowledge, use and wisdom. “Data science sits in the sweet spot in that process because it is the capability that can do everything,” he argues. Data engineers are a necessary resource to work out how to aggregate, store and link data, especially of the unstructured kind.

“If somebody tweets that, ‘my new laptop is sick’, that description could be good or bad. That is where data science comes in, using natural language processing to understand that. Once you have contextualised that data, it gains meaning within the database. But then you need to do something with that,” he says.

On top of this descriptive analytics, data scientists develop predictive analytics to forecast what might happen next in a defined scenario. Hulme notes, “that is where a lot of data scientists work, using scientific techniques and working with business analysts to work out the right questions to ask of data and then interpret those patterns.”

It is this point of fusion between data and business understanding that most stretches the job description of a data scientist. “It is often bad for the data scientist to create an operational solution based on the insight they have gained. They are good at spotting patterns, but it is for the business to decide because they are the real domain experts,” he says.

In financial services, traders have a knowledge of their market, but rely on their data scientists (known as “quants”) to identify and interpret patterns and propose models. Hulme adds: “It is very hard to get a data scientist who has that commercial knowledge.” Even harder is to find one capable of the final step in the practice, which is to develop prescriptive analytics for the business.

That is one reason for the emergence of data science agencies that bring together multiple practitioners into teams which can take on the end-to-end business challenge. Profusion is one example of this new generation of business partner, focusing on the interactions between people and organisations with a line-up of data scientists and PhDs to help.

“The science part comes in when you have a hypothesis, test it to the point where it becomes a viable theory, then follow its impact in the real world,” says CEO Mike Weston. But he also makes an important distinction between data science and analytics. “Science doesn’t deal in probabilities - that is the scope of mathematics. True science is unproven - it remains a theory. In scientific language, it doesn’t deal in proofs - it presents a theory that has been tested and is open to scrutiny.”

That could sound like support for the anti-data scientists and a worry for firms hoping to get an operational solution to their problems. But it also underlines part of the reason for scepticism around this practice: the challenges to which it is being applied are often ill-defined (and could be solved by regular analysts), and companies are over-inflating their recruitment needs.

Weston argues this is why marketing, for example, needs data science. “Consumers don’t behave in a rational fashion - they are highly volatile. That is where we specialise using the human element in the data we look at,” he says.

To help close the gap in understanding between what data scientists really do and how commercial organisations should deploy them, SAS recently launched its Academy in Data Science with an initial cohort of 40 students in New York. It offers training in big data management, advanced analytics, machine learning and data visualisation, along with communication skills. A twin-track course combines classroom instruction, hands-on case studies or team projects, certification exams and coaching in a six-week immersive experience.

Laurie Miles, head of analytics for SAS UK and Ireland, notes: “What we are saying is there is a distinction between data science and data analytics. It is around business acumen and communications skills - it is easy to find people with communications skills or people with hard mathematical skills, but it is really hard to find both in one person.”

That is why the Academy emphasises both on its course and targets managers as much as practitioners. “We expect to get people from both cohorts - analysts who need to refine their communications skills and business people who need to hone their maths skills. But when it comes to the real world, people specialise. That is why you need a team which combines specialists in data, modelling, machine learning and business sense,” he adds.

If you want to get to grips with the DNA of data science, this is the moment you discover it is shared across multiple individuals, rather than being present in just one. Or you may equally conclude that what you really wanted all along was a conventional analyst. Miles has sympathy with that view: “Terminology is a challenge. I thought I worked in the IT industry. I don’t - it is the fashion industry.”

Knowledge and strategy director, DataIQ
David is developing the framework for soft skills and career development among data and analytics practitioners. He continues to be editor-in-chief and research director for DataIQ.