Taking data science from Capitol Hill to Boot Camp

David Reed, director of research and editor-in-chief, DataIQ

A series of fortunate events could be one way to explain how Washington DC-based data scientist Ariel M’ndange-Pfupfu came to be leading the Boot Camp - a data science training course that starts in Edinburgh on 12th September and is a joint venture between US-based The Data Incubator and Scotland’s The Data Lab.

The first event was his personal progress away from the theoretical domain of mathematics and towards its more practical application. The second was his discovery of The Data Incubator’s Boot Camp and, having been part of its early cohort, deciding to join the business. But the third was when founder Michael Li discovered that his wife was pregnant and would be giving birth right when the programme was scheduled to run. Li needed to be bedside with her in America, not coaching scientists in Scotland, so M’ndange-Pfupfu stepped in to take over.

“I am coming from the perspective of somebody who wanted to transfer to data science himself. I have a PhD in materials science and engineering, having studied physics at Stanford University,” he explained to DataIQ. “So I have been getting more and more applied. I realised I didn’t want to pursue life in academia, I wanted to work on practical projects with problems that could be solved and have an impact in the real world.”

When he discovered The Data Incubator school created by Li, it was exactly what he had been looking for - a programme that combined training in the advanced techniques of data science and coding with real-world projects that build softer skills, like communication and teamwork. “It was offering everything I was looking for so I did the Boot Camp in 2015. I have been loving it ever since,” he says.

While data science may seem like a recent discipline that bundles together all of the latest techniques analysts need to learn, it has itself undergone a rapid evolution. Using feedback from its 250-plus partners in industry, as well as from the scientists who take the course, The Data Incubator has developed and adapted the syllabus accordingly.

Explains M’ndange-Pfupfu: “A lot of data scientists five or ten years ago were writing cool algorithms to get insights from data that were based in maths and statistics, or were using machine learning for recommendation engines. It has become more important not just to develop models, but to get them into the production environment, to get those recommendation engines out onto a web site. That means getting away from programming languages and into ones suitable for writing production code.”

One of the major new elements which he has brought to the course is an entire module dedicated to Python. “It is ideal because it is a very flexible, established language. It is not only easy to read and write, but also to integrate into systems, like web sites or back-end databases. If you want to send your data to Amazon Web Services for cloud computing, you can do that in Python, too,” he says.

While other analytical tools, such as R, may be more fashionable, M’ndange-Pfupfu believes that Python is the more practical and useful option for data scientists to learn. “I think of Python as the minivan of programming languages. Some others are more like sports cars because they have been built for performance, but they are tricky to handle. Some are like a stick-shift [manual] gearbox which is hard to learn, but more fun. With Python, you can put a lot of stuff into it, use it for every task. It may not be the flashiest, but it gets you there,” he says.

Scientists considering migrating into data science need to hone their Python skills because the language is typically not taught in academic institutions. M’ndange-Pfupfu notes that his own experience in graduate school was of working with specialist maths and statistical applications, which were ideal for theoretical tasks and for generating the data plots needed for publishing academic papers. Python has more typically been found in the world of software development.

He points out: “One of the main things I hope candidates will take from Boot Camp is the ability to combine data science techniques with software development engines to come up with something that can be built into a product, like an interactive web app.”

Two aspects of the Boot Camp should lead to its graduates emerging well equipped for taking up a day job as a data scientist. The first is that it is built around real-world business problems and data - sourced in this case through The Data Lab’s industry connections. “We give real projects for people to work on with all of the questions that a business might ask, such as whether they can predict if a subscriber will stay on a site for a year. The data we use on the course is real and the questions are real,” he says.

The second is that the programme has its students working hands-on, rather than being classroom-based. Says M’ndange-Pfupfu: “If you are learning from someone who is very experienced, you learn at their pace. If you are learning by yourself, you learn at your own pace. That can be more effective.”

If his own progress is a guide, the result should be a cohort of highly skilled, enthusiastic data scientists keen to apply their abilities to business challenges. For the companies that have partnered with The Data Lab to bring Boot Camp to Scotland, that is a very attractive outcome.


Knowledge and strategy director, DataIQ
David is developing the framework for soft skills and career development among data and analytics practitioners. He continues to be editor-in-chief and research director for DataIQ.