For an evening and a day in early summer, I had the opportunity to be a data journalist. Usually I am a journalist who writes about data, so I jumped at the chance to spend some time at the coal face and see exactly what is involved in a data hack.
I heard that Open Contracting Partnership was organising a data hack on "Digging public procurement in the UK". It sounded intriguing, so I signed up. Thankfully, attendees didn’t need prior coding experience, which was great for me: instead of saying "angle brackets", I usually make a sideways V sign with my fingers. I turned up to an office in North London one evening armed with my laptop and an inquisitive mind.
The evening began with several short presentations. Open Contracting Partnership is a non-profit that advocates for governments to open up the data they hold on public contracts. Its senior advocacy manager, Hera Hussein, explained that governments spend $9.5 trillion a year on public contracts, and that with big money comes big corruption risk. Open contracting, she said, is a step towards efficiency, modernisation, business integrity and avoiding corruption.
It was interesting to learn that the non-profit already works with 30 governments around the world, including the UK, Colombia, Mexico and Nigeria. The UK has often been heralded as a world leader in open data, and Hussein said that the UK was an early adopter of open procurement practices. However, there is always room for improvement: the Edinburgh tram saga is a case where more transparency would have helped. It was heartening to see that it is not only countries in the global North that are taking steps towards being more open, transparent and fair, as this example from Colombia illustrates.
When the mayor and education secretary of the Colombian capital, Bogota, adopted an open contracting approach, they uncovered inefficiency in the procurement of food for the city’s schoolchildren: they discovered they were spending 8p per banana instead of the budgeted 4p. As a result, they were able to push through reforms and implement a smarter tendering process. They have since cut out the intermediaries and now buy directly from specialist producers, whose number has risen from 14 to 46.
We were then introduced to the resources we would be using. We met the head of transparency in procurement at Crown Commercial Service, as well as the partnerships and engagement manager at 360Giving, which aims to make UK grant-making more open. We were also introduced to the founder of Spend Network, which has a database of how much central government departments and local authorities spend with particular contractors, and to the project co-ordinator of OpenOwnership, a project that is building an open register of global beneficial ownership. By opening up this data, you can see who is pulling the strings of an organisation.
Having been given context on the sources we could use in our search for interesting information and stories, we brainstormed possible story topics. The following day, the 16 participants split into three groups, and I decided to join the group that would look for a story in the Panama Papers, a leak of 11.5 million documents mostly relating to the clients of the law firm Mossack Fonseca. The documents revealed offshore trusts and business interests that some of the rich and famous would have preferred to keep under wraps. The question we sought to answer was: do any of the companies listed in the Panama Papers receive money through UK government contracts?
The first step was to download the Panama Papers dataset from the ICIJ website and extract a list of UK-based companies by writing a Python script. We found that between 650 and 660 companies in the dataset were registered in Great Britain.
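The script itself isn't reproduced here, but a filter of this kind might look roughly like the minimal Python sketch below. It assumes the ICIJ download includes an entities CSV with `name`, `country_codes` and `sourceID` columns, where `country_codes` holds semicolon-separated ISO codes such as `GBR`; those column names are an assumption worth checking against the real files. The sample rows and company names are made up.

```python
import csv
import io

# Assumption: the entities CSV has "name", "country_codes" and "sourceID"
# columns, with country codes like "GBR" separated by semicolons.
def uk_companies(csv_file, source="Panama Papers"):
    """Return sorted, de-duplicated names of entities linked to the UK."""
    names = set()
    for row in csv.DictReader(csv_file):
        # Skip entities that came from other leaks in the same database.
        if row.get("sourceID") != source:
            continue
        codes = (row.get("country_codes") or "").split(";")
        if "GBR" in (code.strip() for code in codes):
            names.add(row["name"].strip())
    return sorted(names)

# Made-up sample data standing in for the real download.
sample = io.StringIO(
    "name,country_codes,sourceID\n"
    "ACME TRADING LIMITED,GBR,Panama Papers\n"
    "BLUE SKY HOLDINGS,VGB;GBR,Panama Papers\n"
    "SEA VIEW INC.,PAN,Panama Papers\n"
    "ACME TRADING LIMITED,GBR,Panama Papers\n"
)
result = uk_companies(sample)
print(result)  # ['ACME TRADING LIMITED', 'BLUE SKY HOLDINGS']
```

A country code only shows that an entity is linked to the UK; confirming that it is actually registered there would probably need a further check against a company register.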
We then used Google BigQuery, a tool in which you can write SQL queries to search the data. Some say that SQL is easier and more intuitive than Python. In BigQuery, a single query can pull matching columns from several different tables. Python can also be used, but that language looks more computer science-y.
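Running BigQuery itself requires a Google Cloud project, so this sketch swaps in Python's built-in `sqlite3` module to show the same idea: a SQL join that pulls matching columns from two tables. The table names, column names and figures are all hypothetical; in BigQuery the tables would sit in a dataset, but the `JOIN ... ON` syntax is much the same.

```python
import sqlite3

# Stand-in for BigQuery: an in-memory SQLite database with two
# hypothetical tables (names, columns and values are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leaked_companies (company_name TEXT);
    CREATE TABLE spend_recipients (company_name TEXT, total_spend REAL);
    INSERT INTO leaked_companies VALUES ('ACME TRADING LIMITED'), ('BLUE SKY HOLDINGS');
    INSERT INTO spend_recipients VALUES ('ACME TRADING LIMITED', 125000.0),
                                        ('RIVERSIDE SERVICES LTD', 98000.0);
""")

# An inner join keeps only companies whose name appears in both tables.
rows = conn.execute("""
    SELECT l.company_name, s.total_spend
    FROM leaked_companies AS l
    JOIN spend_recipients AS s ON s.company_name = l.company_name
    ORDER BY l.company_name
""").fetchall()
print(rows)  # [('ACME TRADING LIMITED', 125000.0)]
conn.close()
```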
Using Python, we looked for companies that appeared both in the Panama Papers dataset and among the top 1,000 recipients of government spending. We found four. Sense-checking the results, for example by excluding public corporations, left three.
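Names rarely match exactly across two datasets: one may say "Limited" where another says "Ltd". A minimal sketch of one way to cope, assuming both inputs are plain lists of company names, is to normalise the names before intersecting them. The suffix list, the `excluded` parameter (standing in for the manual sense-check) and every company name here are hypothetical.

```python
import re

# Hypothetical set of legal suffixes to ignore when comparing names.
SUFFIXES = {"limited", "ltd", "plc", "llp"}

def normalise(name):
    """Lower-case a name, replace punctuation with spaces, drop trailing suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)

def matching_companies(leaked, recipients, excluded=()):
    """Return recipient names whose normalised form also appears in leaked."""
    leaked_keys = {normalise(n) for n in leaked}
    skip = {normalise(n) for n in excluded}
    return sorted(
        n for n in recipients
        if normalise(n) in leaked_keys and normalise(n) not in skip
    )

# Invented example: one genuine match, one match removed by the sense-check.
leaked = ["Acme Trading Limited", "Blue Sky Holdings", "Northgate Transport PLC"]
recipients = ["ACME TRADING LTD", "Riverside Services Ltd", "Northgate Transport plc"]
matches = matching_companies(leaked, recipients, excluded=["Northgate Transport PLC"])
print(matches)  # ['ACME TRADING LTD']
```

Normalising this aggressively can also produce false matches between different companies with similar names, which is one reason a human sense-check still matters.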
We used the data in Spend Network to get a list of organisations that were paying out to these companies and how much these transactions were worth. We then had to think about the line between suspicious and incriminating. Were we even able to make those judgements? Was this a case to be investigated further?
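Once the matched companies are known, totalling payments by buyer is a simple aggregation. This sketch assumes the transactions arrive as hypothetical `(payer, supplier, amount)` tuples; Spend Network's actual data format isn't described here, and the buyers and amounts are invented.

```python
from collections import defaultdict

def payments_by_payer(transactions, companies):
    """Sum amounts per (payer, supplier) pair for the given suppliers only."""
    wanted = set(companies)
    totals = defaultdict(float)
    for payer, supplier, amount in transactions:
        if supplier in wanted:
            totals[(payer, supplier)] += amount
    # Largest totals first, to show which buyers spend most with each company.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Invented transactions for illustration.
transactions = [
    ("Department A", "ACME TRADING LTD", 12000.0),
    ("Council B", "ACME TRADING LTD", 3500.0),
    ("Department A", "ACME TRADING LTD", 8000.0),
    ("Council B", "RIVERSIDE SERVICES LTD", 900.0),
]
totals = payments_by_payer(transactions, ["ACME TRADING LTD"])
print(totals)
```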
Looking at OpenOpps, which publishes tenders from around the world, we limited our search to transactions and revenue within the relevant time period. We then investigated the anomalous results by checking Contracts Finder data in BigQuery for any contracts that corresponded to them.
We also used AppGov to search for the specific payments and looked up the contextual data connected to them. We did find something curious. On AppGov, we saw that one of the three companies had a tenfold increase in the number of government transactions, and a steep increase in revenue, from one month to the next. Approximately a year later, cash was flowing in the opposite direction: instead of receiving money from the government, the company was paying it. The investigation could have continued and the story developed, had there been time to make a Freedom of Information request to understand the anomaly.
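Spotting a jump like that can be partly automated. This minimal sketch flags any month whose transaction count is at least `factor` times that of the previous month in the data; the threshold of 10 simply mirrors the tenfold increase described above, and the months and counts are invented.

```python
def flag_jumps(monthly_counts, factor=10):
    """Return (previous_month, month, ratio) wherever the count grew by >= factor.

    Keys are "YYYY-MM" strings, so sorting them puts months in date order.
    Each month is compared with the previous month present in the data.
    """
    months = sorted(monthly_counts)
    flagged = []
    for prev, curr in zip(months, months[1:]):
        before, after = monthly_counts[prev], monthly_counts[curr]
        # Skip months with no prior transactions to avoid dividing by zero.
        if before > 0 and after / before >= factor:
            flagged.append((prev, curr, after / before))
    return flagged

# Invented monthly transaction counts for one company.
counts = {"2015-01": 3, "2015-02": 4, "2015-03": 40, "2015-04": 38}
flagged = flag_jumps(counts)
print(flagged)  # [('2015-02', '2015-03', 10.0)]
```

A flag like this only points at where to look; explaining the jump still takes the kind of follow-up, such as a Freedom of Information request, that the story needed.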
What I learnt from the experience was that the ability to do data is only one aspect of the process. It takes different people with different ideas to suggest which paths to probe further, and everyone has something to contribute: one person in my group had a PhD in biomedical science. I also learnt that doing data appeals to my nosey side and can be very satisfying when you find out something juicy.