Machine learning - focusing on code misses the big picture

According to Rangarajan Vasudevan, CEO of data strategy company The Data Team, machine learning has received a great deal of hype and media coverage in recent times. However, he feels it is important to point out that machine learning systems do not operate in a silo, and that other processes need to take place for them to be effective.

His central point was that “only a small fraction of real-world machine learning systems is composed of machine learning code” and that the surrounding infrastructure required is vast and complex.

That surrounding infrastructure, or pipeline, is composed of nine components according to Vasudevan: configuration, data collection, feature extraction, data verification, machine resource management, analysis tools, process management tools, serving infrastructure and monitoring.

Configuration, Vasudevan said, is not as easy as it sounds. “Being able to configure software, hardware, applications and getting that right is an extremely draining task,” he said. In a financial services institution, for example, configuration settings include how often to run fraud checks or A/B tests in conjunction with an ecommerce partner.
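To make the idea of configuration settings concrete, here is a minimal Python sketch of how such settings might be declared and validated before a pipeline runs. The class name, field names and example values are hypothetical and chosen purely for illustration; they do not come from Vasudevan or The Data Team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical pipeline settings; names and ranges are assumptions."""

    fraud_check_interval_minutes: int  # how often fraud checks run
    ab_test_traffic_share: float       # fraction of partner traffic in the test variant

    def __post_init__(self) -> None:
        # Reject obviously invalid settings before anything else starts.
        if self.fraud_check_interval_minutes <= 0:
            raise ValueError("fraud_check_interval_minutes must be positive")
        if not 0.0 <= self.ab_test_traffic_share <= 1.0:
            raise ValueError("ab_test_traffic_share must be between 0 and 1")


# Example values only; real settings depend on the business.
config = PipelineConfig(fraud_check_interval_minutes=15, ab_test_traffic_share=0.1)
```

Validating settings at the point of declaration is one way to catch a misconfiguration early rather than after a model has been trained or deployed against it.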

Vasudevan said that feature extraction, which involves extracting signals from the data, is the most time-consuming task in the pipeline. Using fraud detection once again as an example, feature extraction would be about identifying indicators of fraud. He said: “In many cases, those who write the algorithm need to understand the business to be able to say what feature is important.”
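As an illustration of what extracting signals can look like in the fraud example, the following Python sketch turns one raw transaction into a handful of candidate indicators. The field names (`amount`, `timestamp`, `country`) and the three features are assumptions made for this sketch, not features described in the article; choosing the right ones is exactly the business knowledge Vasudevan refers to.

```python
from datetime import datetime
from statistics import mean


def extract_features(transaction: dict, history: list[dict]) -> dict:
    """Turn one raw transaction into candidate fraud signals (illustrative only)."""
    past_amounts = [t["amount"] for t in history]
    # With no history, fall back to the transaction's own amount (ratio of 1).
    typical = mean(past_amounts) if past_amounts else transaction["amount"]
    seen_countries = {t["country"] for t in history}
    return {
        # How large is this payment compared with the customer's usual spend?
        "amount_vs_typical": transaction["amount"] / typical if typical else 0.0,
        # Was it made in the early hours (before 06:00)?
        "is_night": transaction["timestamp"].hour < 6,
        # Is it from a country this customer has not used before?
        "new_country": transaction["country"] not in seen_countries,
    }


history = [
    {"amount": 20.0, "timestamp": datetime(2024, 4, 1, 12, 0), "country": "GB"},
    {"amount": 30.0, "timestamp": datetime(2024, 4, 2, 18, 30), "country": "GB"},
]
suspicious = {"amount": 250.0, "timestamp": datetime(2024, 4, 3, 3, 15), "country": "BR"}
features = extract_features(suspicious, history)
```

Here the example transaction is ten times the customer's typical spend, made at night and from a new country, so all three signals fire. Whether those signals matter is a business question, not a coding one.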

Data verification and data cleansing are vitally important parts of the machine learning pipeline, according to Vasudevan. This process can be made more challenging if those tasked with creating the algorithm are not given a comprehensive dataset at the start.

Vasudevan recalled a time when he and his team were asked to create a defect prediction algorithm for products coming off an assembly line. However, they were given only a dataset of defective products. It took a lot of persuasion before the business leader finally allowed access to the full dataset so he could "see what good looked like".

“That's data verification. You actually have to understand what the data is about,” he said.

Vasudevan also underlined the critical importance of analysis tools such as visualisations, as well as monitoring, to be able to detect and rectify any problems. The CEO stated that a heavy focus has been placed on machine learning, to the detriment of other components of the data science pipeline.
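The defect prediction anecdote suggests one simple verification check: before any modelling, confirm that the dataset contains enough examples of every outcome the model is meant to distinguish. The Python sketch below is one way to do that under stated assumptions; the function name, the label values and the 5% minimum share are all hypothetical.

```python
from collections import Counter


def check_label_coverage(labels, required=("defective", "good"), min_share=0.05):
    """Return a list of problems if any required label is missing or too rare.

    The required labels and the 5% threshold are illustrative assumptions.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    problems = []
    for label in required:
        share = counts[label] / total if total else 0.0
        if share < min_share:
            problems.append(f"label '{label}' makes up {share:.0%} of rows")
    return problems


# A dataset containing only defective products fails the check.
only_defects = ["defective"] * 10
print(check_label_coverage(only_defects))

# A dataset that also shows "what good looks like" passes.
full_dataset = ["defective"] * 2 + ["good"] * 8
print(check_label_coverage(full_dataset))
```

A check like this does not replace understanding what the data is about, but it flags the kind of one-sided dataset described above before time is spent on an algorithm.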

However, does his list of nine components give a representative picture of the landscape? Or are there other neglected aspects of the pipeline whose value needs more recognition? If and when the lustre of machine learning and artificial intelligence begins to dim, the necessity of the other parts of the machine learning pipeline may be revealed.

DataIQ is a trading name of IQ Data Group Limited
10 York Road, London, SE1 7ND
