The Panama Papers - the unprecedented leak of 11.5 million files from the database of the global law firm, Mossack Fonseca - opened up the offshore tax accounts of the rich, famous and powerful, laying bare how they have exploited secretive offshore tax regimes.
At 2.6 terabytes of data, the Panama Papers is one of the biggest leaks in history, towering over the US diplomatic cables released by WikiLeaks in 2010 and, more recently, the intelligence documents handed to journalists by Edward Snowden.
The investigation into the Panama law firm’s dealings and those of its elite clients was the direct result of work carried out by journalists at Süddeutsche Zeitung and the International Consortium of Investigative Journalists (ICIJ). Such was the scale of the data that more than 370 reporters from 80 countries worked on it for a year. As part of its endeavours, the ICIJ also released a searchable database of 300,000 entities harvested from the Panama Papers and its Offshore Leaks investigation.
The Panama Papers displayed the murky side of offshore accounts, identifying high-ranking government and public officials and pushing some out of office. But another major aspect that stands out is the power of the data itself and how it was sifted. It may surprise you that it wasn’t searched and manipulated by experienced data scientists, but by a team of journalists, many of whom would not identify themselves as technical.
How did the journalists manage to pick out meaningful data from such huge, unstructured files? The answer is graph database technology, which enabled journalists to surface connections between the data, much like joining the dots to form a picture. Mar Cabra, head of the data and research unit at the ICIJ, has described graph database technology as “a revolutionary discovery tool that’s transformed our investigative journalism process.”
The unique skill of graph databases is in spotting relationships between data and enabling them to be understood at huge scale. Graph databases store data as nodes (the entities), edges (the relationships between them) and properties (attributes attached to either), unlike relational databases, which store information in rigid tables. A query then follows the edges that link the entities of interest.
This is a boon for investigative journalists, but it is also a powerful tool for any business looking to tackle large volumes of connected data.
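To make the node, edge and property structure concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not how any particular graph database or the ICIJ's tooling is implemented; the class names (`Node`, `Edge`, `PropertyGraph`), the relationship types and all of the sample records are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, with a label and free-form properties."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship between two nodes, with its own properties."""
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes by id, plus outgoing edges per node."""

    def __init__(self):
        self.nodes = {}
        self.outgoing = {}

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)
        self.outgoing.setdefault(node_id, [])

    def add_edge(self, source, target, rel_type, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.outgoing[source].append(Edge(source, target, rel_type, properties))

    def neighbours(self, node_id, rel_type=None):
        """Follow outgoing edges from a node, optionally filtered by type."""
        return [
            self.nodes[edge.target]
            for edge in self.outgoing[node_id]
            if rel_type is None or edge.rel_type == rel_type
        ]


# Hypothetical sample data, invented purely for illustration.
graph = PropertyGraph()
graph.add_node("p1", "Person", name="Example Person")
graph.add_node("c1", "Company", name="Example Holdings Ltd", jurisdiction="Exampleland")
graph.add_node("a1", "Address", city="Example City")
graph.add_edge("p1", "c1", "SHAREHOLDER_OF", since=2010)
graph.add_edge("c1", "a1", "REGISTERED_AT")

owned = graph.neighbours("p1", "SHAREHOLDER_OF")
print([node.properties["name"] for node in owned])  # ['Example Holdings Ltd']
```

The relationship is stored as data in its own right, so asking "what is this person linked to?" is a lookup along edges rather than a join across tables.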
Graph connections outshine RDBMS
Graph databases are the only real way to make sense efficiently of the terabytes of connected data we are seeing today. Why? Because unlike relational databases, which break data down into tables, graph databases use a notational structure which mimics the way we humans intuitively look at information.
Once the data model is coded in a scalable architecture, a graph database is unbeatable at analysing the connections in large, complex datasets. This enables any business to build and manipulate big data structures easily.
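As a rough illustration of what analysing connections means in practice, the sketch below uses a breadth-first search to find the shortest chain of links between two entities. It is a simplified stand-in written in plain Python, not the query engine of any real graph database; the function name `shortest_connection` and every entity in the sample data are hypothetical.

```python
from collections import deque


def shortest_connection(links, start, goal):
    """Return the shortest chain of entities linking start to goal, or None.

    `links` is an iterable of (entity, entity) pairs, treated as undirected.
    """
    # Build an adjacency map so each entity knows its direct connections.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    if start not in adjacency or goal not in adjacency:
        return None

    # Breadth-first search: explore connections one hop at a time, so the
    # first path that reaches the goal is also a shortest one.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == goal:
            return path
        for neighbour in sorted(adjacency[current]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical links, invented purely for illustration.
links = [
    ("Official A", "Shell Company 1"),
    ("Shell Company 1", "Intermediary X"),
    ("Intermediary X", "Shell Company 2"),
    ("Shell Company 2", "Official B"),
    ("Official C", "Shell Company 3"),
]

print(shortest_connection(links, "Official A", "Official B"))
print(shortest_connection(links, "Official A", "Official C"))
```

The first call prints the five-step chain from Official A to Official B through the two shell companies and the intermediary; the second prints `None`, because nothing in the sample data links Official A to Official C. Expressing the same question over relational tables would typically need one self-join per hop.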
Tech giants such as Google, Facebook and LinkedIn have recognised the power of graph databases for some time now. For example, the tools Facebook and LinkedIn use to map networks and connections in real time, which let us walk through our social networks, are founded on graph technology.
Now that graph database technology has started to go mainstream, this highly scalable connected data analysis is available to all organisations, from start-ups to blue chips and governments.
Graphs give big data flexibility
Graph databases are set to come into their own with the internet of things, where billions of connected devices mean we will be dealing with petabytes of data. Graph databases will enable enterprises to mine data in ways that just aren’t possible using data warehouses and relational database technology.
Increasingly, graph technology is becoming the tool of choice for international agencies, governments and financial services, as well as enterprises, looking to make real-time connections between data and discover the patterns that make up their relationships.
Business is finally waking up to the fact that graph databases are the only tool capable of making sense of complex data connections that will not fit into tables. We will undoubtedly be hearing more about the power of graph databases in the business world as more and more organisations latch on to the unique capabilities they offer.