Gary Richardson describes himself as a “dichotomy of a technologist”: focused on cutting-edge uses of data technology driven by market-leading consumer products, yet not a heavy user himself – he only signed up to Instagram and WhatsApp a month ago.
With over two decades of experience as a data engineer, Richardson has worked in data since before it was cool, and he is as dedicated to the profession as ever. Having worked at a number of companies, from QBE to Capgemini, he is now head of data engineering at KPMG, where he has been for over four years.
However, becoming a data engineer was not his initial choice of career. Richardson returned to the UK from South Africa in 1999 having been an extractive metallurgist. He soon found there was little demand for someone who specialised in the extractive metallurgy of gold. “I had to do something else with my life, so going from mining ore to mining data seemed to be a good thing to do,” he says.
He started out as a programmer in the development team at QBE and spent ten years working his way up through the ranks to enterprise architect, until his boss at the time asked him to build a data warehouse. That is how his new career really began: building data warehouses, document management systems and large data processing pipelines. All told, it has given him over 15 years of programming experience, followed by many years as a technical, application and enterprise architect.
In Richardson’s own words, he went from “doing the hardcore programming into more data pipelines and eventually data engineering.” Nowadays at KPMG, on any given day, his team finds itself dealing with mountains of email for trader surveillance, billions of accounting transactions in the hunt for fraud, or petabytes of machine logs to detect cyber attacks.
Richardson describes these tasks as “the really tough problems which come with the need to scale out data processing”. He says that the biggest challenge is understanding the essence of the problem that needs to be solved and then determining the best technology in the kitbag to achieve the desired outcome.
That consistently used toolkit extends from data processing on Hadoop, Spark and Elastic to stitching it all together in Java, Python, R and, increasingly, Scala. KPMG’s data engineering team also uses a lot of open source machine learning libraries, all underpinned by a virtual private cloud to scale out the clusters.
“The biggest benefit is when we are able to prove we can solve a problem and by adding more nodes to our clusters we scale easily,” he says. On his wishlist would be “a really nice open source visualisation capability to put in the hands of end users,” he says. “That’s one thing the open source community have yet to spend a lot of time on.”
Within KPMG, Richardson says that his data engineering team fits between infrastructure (DevOps) teams and data science communities, thinking of itself as the interface between hardware and algorithms. He adds: “We are increasingly responsible for content strategy, the processing and ingestion of large datasets, data curation and then exposing the data to the data science and analytics teams. We like to think we are evolving into DataOps.”
For Richardson, his biggest achievement over the last 12 months was implementing a number of machine learning projects for big clients. “I’ve slowly but surely been introducing machine learning into enterprises. Traditionally, their focus has been on the data warehousing and data analytics side,” he says. These enterprises have been collecting huge amounts of data over the years which, fortunately for him, they have been labelling, as this “makes machine learning easier”.
Coupled with enterprises embracing cloud technology and the rise of open source technology, he says, “it makes it a lot easier and more accessible for data scientists to combine the data and compute together to do the machine learning.”
Projects he has been involved in are typically focused on the optimisation of business processes. He says that he and his team have been able to differentiate products and services and deliver “vast” reductions in the cost and time-to-value of running a standard process in a bank or an insurance company.
For Richardson, the best thing about working in the data industry is that it now has a lot more visibility. “We seem to have become flavour of the month and we can make a dramatic impact if we are given the autonomy, the data and the budget,” he says.
To keep his skills up to date, Richardson will spend a lot of his time outside of work listening to podcasts, reading blogs and technical papers, and talking to industry leaders. He’ll discuss what they are working on and how it is helping to solve their business problems. He also encourages his data engineers to do the same.
In the coming 12 to 18 months, Richardson expects a lot more open source data technology to move into production in mainstream enterprises. In addition, he says there will be a strong focus on cognitive automation – solving problems through deep learning and machine learning.
“This will solve a lot of problems and unlock a lot of value right across different industries from banking and financial services to energy companies with smart metering and the internet of things. There’ll be a lot of data streams, a lot of machine learning and innovative solutions that will help businesses provide better customer experiences,” says Richardson.
Although he works for one of the Big Four accounting firms, Richardson still keeps an eye on the smaller players that use data in innovative ways. One of his favourites is a startup that is building a digital personal assistant that can book meetings and manage a calendar.
“It takes an inordinate amount of machine learning to be able to have a cognitive assistant sit in the loop and read your emails. If you get a message inviting you to lunch and you respond and cc your cognitive assistant, it’s able to find a venue, book it, book the calendar invite and make sure that everybody turns up,” explains Richardson.
He is also impressed by InsurTech startup companies like Trōv and Slice as he thinks they are redefining the insurance industry. “Built deep into their business model is the use of a lot of analytics and a lot of artificial intelligence,” he notes.
According to Richardson, being a data engineer is definitely a career with a future as it is more relevant than ever. “Great data science requires great data and it is the engineers that deliver that,” he says.
He explains that this “great data science” is coming from the evolution of data engineering to enable the processing, labelling and exposing of curated data at scale for machine learning. Furthermore, he says, the ability to industrialise and scale machine learning means it can be delivered to the edge where businesses can embed the machine learning into their business processes.
For anyone who wants to become a data engineer, Richardson says a high level of tenacity is needed. He cautions: “There is no substitute for many, many hours at the keyboard – learning, experimenting and delivering code. There are no shortcuts.”