So now we know the truth about our Facebook friends - one in twelve may not be real friends. In fact, they may not even be real. The social network has revealed that 8.7 per cent of its 955 million accounts are not necessarily created and run by actual individuals.
As part of becoming a publicly listed company, Facebook has been forced to open itself up to scrutiny for the first time in its short, eight-year life. Investors already disappointed with the losses they have made on its shares will be keen to understand more about the active user base and how it will be monetised.
From that point of view, the news is not good for two reasons. The first is what Facebook reported about those user accounts: 4.8 per cent are duplicates (run in parallel to a main account), 2.4 per cent are misclassified (created on behalf of a pet or a business) and 1.5 per cent are “undesirable” (created for spamming purposes).
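To put those percentages into absolute terms, here is a minimal sketch of the arithmetic in Python. It uses only the figures quoted above; the variable and function names are my own. On these numbers the three categories together come to 8.7 per cent, or roughly 83 million accounts.

```python
# Figures as quoted in the article; names below are illustrative only.
TOTAL_ACCOUNTS = 955_000_000
SHARES_PCT = {"duplicate": 4.8, "misclassified": 2.4, "undesirable": 1.5}


def estimated_counts(total, shares_pct):
    """Convert percentage shares into approximate account counts."""
    return {kind: round(total * pct / 100) for kind, pct in shares_pct.items()}


counts = estimated_counts(TOTAL_ACCOUNTS, SHARES_PCT)
combined_pct = round(sum(SHARES_PCT.values()), 1)
combined_count = sum(counts.values())

for kind, count in counts.items():
    print(f"{kind:>13}: {count:,}")
print(f"{'combined':>13}: {combined_count:,} ({combined_pct} per cent)")
```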
Each of those types of account breaches Facebook’s terms of service and the company says that it makes efforts to identify and suppress such behaviour. In the case of pets and bots, it seems likely that big data is being analysed to spot behaviour that does not look like that of a real person.
Which brings us to the second reason why this revelation is not good news for investors. According to Facebook’s filing, “we are continually seeking to improve our ability to identify duplicate or false accounts and estimate the total number of such accounts, and such estimates may be affected by improvements or changes in our methodology.” Evidence of these improvements can be found in its announcement that a flaw was discovered in its geo-location attribution algorithm in June and that it now identifies users’ locations more accurately.
But while this points to a positive approach to ensuring all users are genuine, another statement gives less confidence: “These estimates are based on an internal review of a limited sample of accounts and we apply significant judgment in making this determination, such as identifying names that appear to be fake or other behavior that appears inauthentic to the reviewers. As such, our estimation of duplicate or false accounts may not accurately represent the actual number of such accounts.”
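To see why a “limited sample” matters, consider how much the headline number could move on sampling error alone. The sketch below is an illustration under stated assumptions, not Facebook’s method: the filing does not disclose the sample size, so the sizes tried here are hypothetical, and the calculation is the textbook normal-approximation interval for a proportion. It says nothing about the reviewers’ “significant judgment”, which adds uncertainty of its own.

```python
import math

TOTAL_ACCOUNTS = 955_000_000
P_HAT = 0.087  # 4.8 + 2.4 + 1.5 per cent, the breakdown quoted in the article
# Hypothetical sample sizes: the filing does not say how many accounts were reviewed.
HYPOTHETICAL_SAMPLE_SIZES = (1_000, 10_000, 100_000)


def wald_interval(p_hat, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


for n in HYPOTHETICAL_SAMPLE_SIZES:
    low, high = wald_interval(P_HAT, n)
    print(
        f"sample of {n:>7,}: "
        f"{low * TOTAL_ACCOUNTS / 1e6:.1f}m to {high * TOTAL_ACCOUNTS / 1e6:.1f}m accounts"
    )
```

The smaller the sample, the wider the range of account totals that is consistent with the same estimate.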
Take a gulp and read that again. The world’s largest social network is awash with data, yet it is still using sampling and estimates to deal with the most basic issue - user ID. In my opinion, the reason is simple - the site has grown from a walled-garden digital proposition to a Big Data giant without passing through any interim stages of data management and data quality.
Signing up for Facebook is easy and that is its problem. Without validation and matching routines right at the entry point, the network cannot hope to maintain the credibility of its user profiles. Internal data cleansing and deduplication are standard practices at almost every other business, but not here, apparently.
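As a rough idea of what matching at the entry point can look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class and function names are hypothetical, the normalisation rules (ignoring case, stray spaces, accents and “+tag” email suffixes) are just one plausible choice, and a production system would use far richer matching than an exact key lookup.

```python
import unicodedata


def normalise_email(email):
    """Lower-case, trim, and drop any '+tag' suffix (an assumed rule)."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def normalise_name(name):
    """Strip accents, fold case and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


class SignupRegistry:
    """Hypothetical sign-up check that rejects an email already on file."""

    def __init__(self):
        self._name_by_email = {}

    def register(self, name, email):
        """Return (accepted, existing_name); existing_name is set on a match."""
        key = normalise_email(email)
        if key in self._name_by_email:
            return False, self._name_by_email[key]
        self._name_by_email[key] = normalise_name(name)
        return True, None


registry = SignupRegistry()
first = registry.register("José Smith", "Jose.Smith+news@Example.com")
second = registry.register("Jose  SMITH", " jose.smith@example.com ")
print(first, second)
```

The second sign-up is flagged as a match for the first, even though the name and email were typed differently.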
With its IPO, Facebook raised $16 billion. It is a shame none of that appears to have been spent on installing data quality measures that are widespread and proven elsewhere. (And as I mentioned in a previous blog, other back office processes seem under-invested, too.)
Until it fixes that, returning some of that value to investors remains a distant prospect.