Neo: Making a point with graph theory

David Reed, director of research and editor-in-chief, DataIQ

If you have come across the concept of graph as a way of understanding the connections between data points, then it is a fair bet that Facebook founder Mark Zuckerberg had something to do with your awareness. Graph theory is a cornerstone of the way the social network operates and it has given rise to a new world of graph database technology, analytics and practitioners.

But for some, the path towards mastery of graph was more difficult. “There have been dual processes that led to the category of graph databases. Graph theory is 300 years old, so it is very mature and well understood. In the early days of thinking about Neo, it was not the theory, but the data model that appealed,” explained Dr Jim Webber, chief scientist at Neo Technology, in a recent interview with DataIQ.

The origin story of the developer behind Neo4j, one of the leading graph database solutions, starts all the way back in 2000 with some heavy lifting and data engineering by its eventual founder and the CTO. Using graph, they were able to achieve their goal within their existing IT environment, but recognised its limitations.

Drawing the links between entities in a relational database required considerable human effort and compute power because an RDBMS is designed on set theory, whereas graph is based on nodes (points) and the edges that connect them. By focusing on creating a processing environment which allows those connections to be examined directly, the founders eventually came up with the first version of their system in 2002.
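The difference between the two models can be sketched in a few lines of Python. This is an illustration only, not Neo4j code: the `edges` data and the function names are hypothetical. The first function answers "friends of friends" the relational way, by joining a table of rows to itself; the second walks outwards from a node along its edges, which extends to any number of hops without adding another join.

```python
from collections import defaultdict, deque

# Hypothetical "knows" relationships, stored as rows in an edge table.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve")]


def friends_of_friends_by_join(rows, start):
    """Relational style: self-join the edge table to go exactly two hops."""
    return sorted({
        second_target
        for (first_source, first_target) in rows
        for (second_source, second_target) in rows
        if first_source == start
        and first_target == second_source
        and second_target != start
    })


def build_adjacency(rows):
    """Graph style: index each node's outgoing edges once."""
    adjacency = defaultdict(list)
    for source, target in rows:
        adjacency[source].append(target)
    return adjacency


def reachable_within(adjacency, start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]
    return hops


adjacency = build_adjacency(edges)
print(friends_of_friends_by_join(edges, "ann"))
print(reachable_within(adjacency, "ann", 3))
```

Going one hop further in the join version means nesting another pass over the whole table, while the traversal only needs a larger `max_hops`.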

“Once they had got the engine in place, it opened up the world of graph theory. It provided a platform into which they could pour 300 years of thinking,” said Webber. That said, the system was essentially a Swedish offering up to 2009. Seed funding that year started the progression which moved the business out of its homeland to its present headquarters in Silicon Valley and took the solution through three editions. Neo4j is now available both as an open source application, whose users “only pop up when they need advice”, according to Webber, and in a commercial variation.

At the same time, its market has been expanding way beyond the association with social media. Webber encountered the technology while working at a telco and recognised its potential to go beyond solving network-oriented problems. “It excels on ‘unstructured’ data - which is not really unstructured, you just have to infer its structure,” said Webber. By contrast, structured data, like that found in relational databases, “doesn’t reflect the world we live in,” because, unlike that world, it is not messy, complex and connected. As a result, there has been a growing desire for solutions both to broader business problems and to the technological limitations of the RDBMS. As Webber put it: “Many organisations are now realising for themselves that the relational database model is too limited for large-scale jobs - it is not good enough for graph.”

The technology has helped to transform and embed some business activities, such as measuring influence, which were very hard to do just five years ago. eBay has plugged the solution into its marketplace to support near real-time information about delivery options, for example. “Before, it could only offer one delivery slot in the next three days, whereas now it is able to look at hundreds and capture revenue it couldn’t get before,” he said.

While recognising the importance of social media in allowing graph to gain traction in data and analytics, Webber also noted that the practice has moved on. He said: “For a business leader like Zuckerberg to popularise the concept of graph is phenomenal for us, but it had also become a silo. Now, other domains are graph-based, like financial risk or routes to work. So we have spent time creating a strategy to address those markets. Neo4j is now being used in financial services for fraud detection and intra-day position monitoring. Ad tech is using it for cross-device tracking to optimise the graph.” One “submarine adtech” user is tracking online activities by 90% of the US population via one billion ad transactions every day.

Users are now running graph queries at enterprise scale in a way that could only be achieved with enormous effort by Neo’s founders when they first tackled the challenge. “When you look at an issue like identity and access management, that is not a tree, it is a graph,” Webber pointed out.
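Webber's point about identity and access management can be illustrated with a small Python sketch. The users, groups and permissions below are hypothetical and are not drawn from any Neo4j deployment. In a tree, every node has at most one parent; here a user belongs to two groups that both roll up into a third, so permissions have to be collected by walking a graph and remembering which nodes have already been visited.

```python
# Hypothetical membership edges: each principal points to the groups it belongs to.
member_of = {
    "alice": ["engineering", "on_call"],
    "engineering": ["staff"],
    "on_call": ["staff"],
    "staff": [],
}

# Hypothetical permissions granted directly to each group.
grants = {
    "engineering": {"repo:write"},
    "on_call": {"pager:ack"},
    "staff": {"wiki:read"},
}


def nodes_with_multiple_parents(membership):
    """A tree allows one parent per node; list the nodes that break that rule."""
    return sorted(node for node, parents in membership.items() if len(parents) > 1)


def effective_permissions(principal):
    """Depth-first walk over memberships, visiting each group only once."""
    visited = set()
    stack = [principal]
    permissions = set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # "staff" is reachable by two paths; count it once
        visited.add(node)
        permissions |= grants.get(node, set())
        stack.extend(member_of.get(node, []))
    return permissions


print(nodes_with_multiple_parents(member_of))
print(sorted(effective_permissions("alice")))
```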

In that respect, Neo4j has been both solution and catalyst, enabling whole new generations of users to explore data in different ways, thereby opening up an expanding range of business issues to optimisation and improvement. 

The remarkable thing is that not only has a 300-year-old mathematical theory proven robust enough for modern business, but also that the core model which Neo Technology’s founders came up with continues to scale. Said Webber: “As hardware evolves, we can continue to evolve the software so it always meets and exceeds the expectations of our users. We have already moved from single to multi-core and now multi-CPU environments. On the flip side, the Cypher query language is being standardised. I am confident that we have got solid underpinning to meet the next challenge.”


Director of research and editor-in-chief, DataIQ
An expert commentator on all things data, David has been editor of DataIQ since its inception in 2011.