Separating factual tweets from the "alternatively factual"

At a time when we seem to be swamped with fake news, work is being done to make it easier to filter out the fiction disguised as fact online. On the micro-blogging site Twitter, information spreads like wildfire, whether it is true or not.

At a recent event, Axel Oehmichen, a research associate at the Data Science Institute and the Department of Computing at Imperial College London, walked through the data science pipeline that he and his colleagues Julio Amador Díaz López and Miguel Molina-Solana followed to build a model that identifies tweets containing fake news. Oehmichen also co-authored an academic article detailing the project.

"It is indeed possible to model and detect fake news."

Using sentiment analysis, they found that tweets without fake news are usually more positive, while fake news tweets tend to be much more negative. The project also revealed that viral tweets containing fake news appeared to include more URLs than viral tweets that did not. Tweets containing fake news most often contained one mention, whereas other tweets most often contained two. Furthermore, fake news was more likely to originate from an unverified Twitter account. They concluded that: “It is indeed possible to model and automatically detect fake news.”

In terms of process, Oehmichen said the first step was to collect data from Twitter, looking only at viral tweets in English, composed between November 2016 and March 2017, that contained the following hashtags and handles: #MyVote2016, #ElectionDay, #electionnight, @realDonaldTrump and @HillaryClinton. Viral tweets were defined as those with 1,000 or more retweets. Of the 57 million tweets composed during that period, 9,000 were selected. Collection took five months and labelling took a further month.
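
The selection criteria above can be expressed as a simple filter. This is a minimal sketch, not the authors' collection code: the `Tweet` record and its field names, the exact window boundaries (1 November 2016 up to, but not including, 1 April 2017) and the case-insensitive matching of any one of the listed hashtags and handles are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Hashtags and handles named in the article, lower-cased so matching is
# case-insensitive (an assumption; Twitter treats hashtags case-insensitively).
TRACKED_TERMS = {"#myvote2016", "#electionday", "#electionnight",
                 "@realdonaldtrump", "@hillaryclinton"}
VIRAL_THRESHOLD = 1_000            # "viral" = 1,000 or more retweets
# Assumed boundaries for "November 2016 to March 2017".
WINDOW_START = datetime(2016, 11, 1)
WINDOW_END = datetime(2017, 4, 1)  # exclusive upper bound


@dataclass
class Tweet:
    # Hypothetical minimal record; real Twitter API payloads differ.
    text: str
    lang: str
    created_at: datetime
    retweet_count: int


def mentions_tracked_term(text: str) -> bool:
    # Split on whitespace, lower-case, and drop trailing punctuation so that
    # "@HillaryClinton." or "#ElectionDay!" still match.
    tokens = {tok.lower().rstrip(".,!?:;") for tok in text.split()}
    return not TRACKED_TERMS.isdisjoint(tokens)


def is_candidate(tweet: Tweet) -> bool:
    return (tweet.lang == "en"
            and WINDOW_START <= tweet.created_at < WINDOW_END
            and tweet.retweet_count >= VIRAL_THRESHOLD
            and mentions_tracked_term(tweet.text))


def select_viral(tweets):
    """Keep only English, in-window, viral tweets that use a tracked term."""
    return [t for t in tweets if is_candidate(t)]
```

Keeping each criterion as a separate condition makes it easy to see how many tweets each one removes, which matters when 57 million candidates shrink to a few thousand.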

Manually labelling the tweets was a two-part process. Two groups tagged the tweets as fake news, not fake news or unknown: the first comprised students, friends and colleagues of the project authors, and the second comprised the authors themselves.

"People tweet more actively during election night and the day after."

One of the first things they noticed was a spike in the number of tweets on election night and the day after. He said: “It just turns out that people were a lot more active during election night and the day after than any other day before or after. Because of that, we have had to remove that feature from what we selected for building our model.”

The features they chose came from the tweet metadata: the location; the number of favourites, users, followers and mentions; whether the tweet came from a verified or unverified account; the number of friends the user has; and the media used in the tweet.
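
A sketch of turning one tweet's metadata into the kind of features listed above. It assumes tweets arrive as dictionaries shaped like Twitter's v1.1 JSON (`favorite_count`, `user.followers_count`, `user.friends_count`, `user.verified`, `user.location`, `entities.user_mentions`, `entities.urls`, `entities.media`). Encoding location as a simple has-location flag and adding a URL count (motivated by the finding about URLs in fake news tweets) are illustrative choices, not the authors' method; the ambiguous "users" feature is left out.

```python
def extract_features(tweet: dict) -> dict:
    """Flatten one tweet's metadata into numeric features.

    Field names follow Twitter's v1.1 JSON layout (an assumption);
    missing fields default to zero / False.
    """
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    return {
        # Illustrative encoding: 1 if the profile states any location.
        "has_location": int(bool(user.get("location"))),
        "favourites": tweet.get("favorite_count", 0),
        "followers": user.get("followers_count", 0),
        "friends": user.get("friends_count", 0),
        "mentions": len(entities.get("user_mentions", [])),
        "urls": len(entities.get("urls", [])),
        "media_items": len(entities.get("media", [])),
        "verified": int(bool(user.get("verified", False))),
    }
```

Returning a flat dictionary of numbers keeps each feature named, so the same keys can be used later when comparing feature distributions between fake and non-fake news tweets.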

They used the Kolmogorov-Smirnov (KS) goodness-of-fit test to see whether the features of fake news tweets differed statistically from those of non-fake news tweets. In its one-sample form, the test compares a set of data with a known distribution to determine whether the data follow that distribution; according to StatisticsHowTo, it is commonly used as a test for normality. Comparing two groups of tweets calls for the two-sample form, which tests whether two samples are drawn from the same distribution.
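
The two-sample KS statistic is the largest vertical gap between the empirical cumulative distribution functions (ECDFs) of the two samples, D = max over x of |F_a(x) - F_b(x)|. The pure-Python sketch below computes D, for example for the follower counts of fake versus non-fake news tweets. It illustrates the test rather than reproducing the authors' code, and it omits the p-value, which in practice would come from a library routine such as `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of the (sorted) sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_2sample_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    # Both ECDFs are right-continuous step functions that only jump at
    # observed values, so checking every observed value finds the maximum.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
```

A D close to 0 means the two feature distributions look alike; a D close to 1 means they barely overlap. Whether a given D is statistically significant depends on the sample sizes, which is what the p-value accounts for.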

Users that had capitals or weird characters...are more likely to propagate fake news.

Oehmichen and his colleagues performed natural language processing (NLP) analysis on the tweets. He said: “For sentiment analysis, we just extracted different bits from all the text we could find as part of the metadata. We realised that users that had capitals or weird characters like exclamation points in their username had a significant chance of being people propagating fake news.” He and his colleagues also looked at the core sentiments captured in the expressions of the tweets. They then moved on to “more elaborate techniques which are machine learning approaches.” Oehmichen weighted the words in a tweet to give a sentiment for the tweet, then "piled them up" into a final score. He described this in greater detail, acknowledging that some of the audience might not follow every step of such a complex, multi-stage process.
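
Two ideas from this paragraph can be sketched under stated assumptions: binary flags for capitals or unusual characters in a name, and a sentiment score that sums ("piles up") per-word weights into one score per tweet. The lexicon and its weights below are made up for illustration, and a plain lexicon sum is a simple stand-in, not the authors' actual sentiment or machine learning method. Because Twitter handles may only contain letters, digits and underscores, the exclamation-mark flag would in practice apply to display names.

```python
import re

# Hypothetical toy lexicon with invented weights; a real project would use a
# published sentiment lexicon or a trained model instead.
LEXICON = {"great": 1.0, "win": 0.8, "love": 0.9,
           "fake": -0.7, "corrupt": -1.0, "disaster": -0.9}


def username_flags(name: str) -> dict:
    """Binary flags for capitals and unusual characters in a name."""
    return {
        "has_capitals": int(any(c.isupper() for c in name)),
        "has_exclamation": int("!" in name),
        # Anything other than letters, digits, spaces or underscores.
        "has_special": int(any(not (c.isalnum() or c.isspace() or c == "_")
                               for c in name)),
    }


def tweet_sentiment(text: str) -> float:
    """Weight each word via the lexicon, then sum the weights into one score."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0.0) for w in words)
```

Words missing from the lexicon contribute zero, so a tweet's score is driven only by the words the lexicon knows about; this is the main weakness that the "more elaborate" machine learning approaches aim to address.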

"The data is saying something, but always go back to why?"

The presentation gave the audience an insight into the use of data science to solve a problem from start to finish, and highlighted some of the discoveries the authors made along the way. Oehmichen said it is imperative to keep asking the same question: why? “The data is saying something, but we always go back to why and every step of the way. We ask 'Why? Why? Why?' If there is no clear reason for this, we still keep it, but it is not very satisfying and we always try to do additional analysis to make sure that what we have is indeed ground truth.” Another important lesson was to always be aware of context when interpreting any set of results.

Axel Oehmichen was speaking at fintech hub Rise.
