With data science often described as an intangible and nebulous subject, those who practice it can find the descriptions and definitions of what they do just as hard to pin down. Asitang Mishra, not a rocket scientist but a data scientist at NASA’s Jet Propulsion Lab, explained how he sees the work that he and his data science colleagues do.
Most people try to explain their jobs in words that their grandmother could understand. In Mishra’s case, the audience was his Uber driver. During the ride, Mishra told him: “I’m a data scientist, not a rocket scientist or a geologist kind of scientist or like a computer scientist but not a software engineer, but I do do a lot of software but with focus on data. I’m more like a data analyst but with a lot of data, not always.”
I am not sure his description passed the test of grandma-level explainability because the driver was left confused. To the TED Talk audience, Mishra said he has to deal with the preconception that data scientists are data wizards or really smart magicians who can solve any problem as long as enough data is thrown at them.
To dispel the myth that he is some kind of alchemist, he gave an example of one of the things he and his colleagues do: detecting anomalies on space hardware. The satellites orbiting the Earth have numerous components and instruments that generate massive amounts of data. This data is presented on graphs that look like the training load charts from fitness trackers, but on steroids. Mishra’s job is to detect the anomalies in these graphs.
He said: “For a human it is not possible for someone to monitor this 24/7 so we’ve built algorithms that can automatically detect anomalies in these graphs and once they do that, they tell an operator, ‘come look at this,’ making their lives much easier.”
To make their own lives easier, rather than writing a new algorithm for every task, data scientists look one up in a library of algorithms “by writing just one line of code.”
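The talk did not say which algorithms the team uses, so the following is only a minimal sketch of the general idea of automated anomaly detection, assuming a simple rolling z-score: each reading is compared with the handful of readings before it and flagged if it sits far outside their spread. The function name `find_anomalies`, the window size, the threshold and the sample telemetry are all hypothetical and chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate strongly from the preceding window.

    Illustrative only: a rolling z-score, not the method described in the talk.
    """
    anomalies = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu = mean(recent)
        sigma = stdev(recent)
        # A perfectly flat window has no spread; skip it to avoid dividing by zero.
        if sigma == 0:
            continue
        # Flag the reading if it lies more than `threshold` deviations from the mean.
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical temperature-like telemetry with one obvious spike at index 6.
telemetry = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0, 20.2, 19.8]
flagged = find_anomalies(telemetry)
print(flagged)
```

In a monitoring setting, the flagged indices would be what prompts the “come look at this” message to an operator. A library routine would normally replace the hand-written loop here, which is the point Mishra makes about looking algorithms up.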
But this isn’t the main part of his job. “It is only 20% of what goes into data science for solving problems and in fact there are meta-algorithms these days that are trying to also master the art of finding the right algorithm for you and then tuning it for you.” He also said that because data scientists are so good at automation, eventually they will automate themselves out of their jobs.
What matters more, Mishra stated, is the ability to communicate and the willingness to adapt communication style to the audience. Once, when pitching an idea to clients, he used phrases like “crawl graphs” and “law of delayed returns”, leaving the clients with blank faces. Mishra changed the language he used and said instead that he wanted to build a tool that emulates how proficient humans are at browsing the internet. It worked and the clients closed the deal.
In addition, Mishra said it is essential that data scientists carry out ‘data care’, a term he came up with that encapsulates finding the right data, merging it from different sources if necessary, and then cleaning it. Acknowledging that it is a time-intensive process, Mishra said it is essential because, to quote his computing teacher, “garbage in means garbage out”. Or in acronym form, GIGO.
“Computers are GIGO. Algorithms are also GIGO. If you give it bad data, it will give you bad predictions,” he said.
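As a rough illustration of what ‘data care’ can involve, the sketch below merges readings from two sources and discards the ones that cannot be used before any algorithm sees them. Everything in it is an assumption made for the example: the function name `merge_and_clean`, the record layout of (date, value) pairs, the rule of keeping the first valid reading per date, and the sample values.

```python
def merge_and_clean(source_a, source_b):
    """Merge two lists of (date, value) records and drop unusable readings."""
    merged = {}
    for date, value in source_a + source_b:
        # Missing readings are dropped outright.
        if value is None:
            continue
        # Readings that cannot be read as numbers (e.g. "n/a") are dropped too.
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue
        # NaN is the only float that is not equal to itself.
        if number != number:
            continue
        # Keep the first valid reading seen for each date.
        merged.setdefault(date, number)
    return sorted(merged.items())


# Hypothetical soil-moisture readings from two sources, with gaps and junk.
ground = [("2020-01-01", "0.31"), ("2020-01-02", None), ("2020-01-03", "n/a")]
orbit = [("2020-01-02", 0.29), ("2020-01-01", 0.35), ("2020-01-04", "0.27")]
clean = merge_and_clean(ground, orbit)
print(clean)
```

The choice of which duplicate to keep is itself a judgment call that depends on which source is trusted more, which is part of why this step takes time.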
A large quantity of data is great, but it is not much use if those of us who are not data scientists cannot understand what the numbers mean. This is why good storytelling is so important: data scientists need to be able to translate their findings or predictions into concepts that others will understand and care about. Mishra said it is all well and good to have measured soil moisture from space to a high degree of accuracy. However, the way to make it relevant to people was to say that these findings could help to save millions of lives by predicting floods and droughts.
Mishra said the reason data scientists like to use pre-written algorithms is that they are lazy. It could be for the same reason that they do not like to duplicate effort by attempting to solve the same problems as other data scientists. He said: “It is also encouraged to show people your answers. We don’t want to solve the same problems, we want to solve new problems and to share our work with the community, in general, we do something called open sourcing of our code. Our code, free for everyone to use modify and share.”
His final point was that data scientists are an eclectic bunch, in his experience at least, as his colleagues have educational backgrounds and degrees in physics, mechanical engineering, economics and psychology.
“These are all people solving problems using data and computers in their own field and at some point realised that they can use this expertise to solve problems in general. In essence, anyone could be a data scientist.”
This speech is a reminder of how broad a church data science is, and of how similar data scientists’ tasks and responsibilities are across a host of industries and sectors.
Asitang Mishra was speaking at TEDxOak Lawn.