A chief analytics officer recently told DataIQ that the term "data scientist" is impossibly nebulous. Mike Bugembe said that those recruiting for the role often expect data scientists to do five jobs at once: those of a data architect, data engineer, machine learner, data science engineer and analyst.
But what exactly is a data scientist? Let’s start from the fresh-eyed perspective of someone who is new to the industry. According to a junior data scientist with big data consultancy Elastacloud, it is someone who has very good statistical skills, but is just as good at thinking outside the box. They also have solid mathematical skills. This is the point of view of Darshna Shah who has been in the industry for seven months.
Shah loves working in her new industry, but as a relative newbie what does she need to do to become a great data scientist? Thankfully, countless words have been written on the essential traits by leaders in the field.
According to Michael E. Driscoll, CEO of interactive analytics company Metamarkets, the three “sexy” skills of data geeks are statistics, “data munging” and visualisation. He elaborated by saying that statistics is a “deep and rigorous discipline” and munging is the “painful” process of cleaning and proofing one’s own data in preparation for analysis because “real world data is messy.” Visualisation, he said, is all about storytelling, both in the sense of the data scientist deepening their own understanding of the data, and communicating the findings to a wider audience.
Beyond these basics, certain traits come up again and again. It seems that to be a great data scientist, Shah will have to make sure she is a curious, collaborative, communicative problem-solver.
1) Curious
Andy Peloe, a concept manager at CallCredit, said that a great data scientist is curious about new data, new techniques, new problems and new ways to solve them, while Thomas C. Redman, who refers to himself as “the Data Doc,” called this curiosity “a sense of wonder.” Redman added that great data scientists are happiest when they are discovering how something works and why it works that way.
2) Collaborative
Great data scientists must be collaborative because, according to astrophysicist Kirk Borne PhD, they have to be part of an effective team rather than work in isolation. In the same vein, Hui Wang, senior director of global risk sciences at PayPal, said this ability to collaborate is essential because the best data science teams have a mix of people: some who have strong business knowledge and others with a broad view of what’s new in academia and industry.
3) Communicative
Communication is key for data scientists. Karolis Urbonas, head of data science at Amazon, said that a great data scientist is the “ultimate communicator” who will ask a lot of questions even if they are 99.9% sure they know the answer. Borne added that great data scientists can tell others the story that the data is telling them.
4) A problem-solver
Making sense of thousands upon thousands of rows of data could be seen as the ultimate riddle. Urbonas stated that a great data scientist is obsessed with solving problems, while Wang said they must be a “passionate problem-solver” with an innate drive to find solutions.
Junior data scientists who take the extra steps to embody these four characteristics should be well on their way to greatness.