Who will be the next president of the United States? According to an algorithm developed by a team of researchers from the University of Calabria for analyzing sentiment on social media, Joe Biden will win today's election.
The system uses neural networks to infer the voting orientation of Twitter users by analyzing the hashtags they use. The algorithm has already been tested successfully on the 2016 US presidential election and on the Italian general election. In both cases its prediction was very close to the actual results, and more accurate than traditional polls and other AI-based survey techniques.
For the 2020 US presidential election, 550,979 tweets were examined, published by 308,262 users between 18 and 27 October.
The forecast has Joe Biden leading with 53.1% of the vote, followed by Donald Trump with 41.7% (the remaining 5.2% is split among the other candidates). The study was carried out by Domenico Talia, full professor of Information Processing Systems, together with DIMES (Department of Computer Engineering, Modeling, Electronics and Systems) researchers Loris Belcastro, Riccardo Cantini, Fabrizio Marozzo and Paolo Trunfio. Giovanni Bruno, of the Unical spin-off DtoK Lab, also took part in the data analysis.
THE METHODOLOGY – The researchers provided the system with an initial set of hashtags for classifying tweets by the voting orientation they express. To start the analysis, they chose only ‘institutional’, easy-to-interpret hashtags:
- Pro-Biden: ‘democrats’, ‘votebiden’, ‘bidenharris2020’, ‘americaneedspennsylvania’, ‘bidentownhall’, ‘trumptaxreturns’, ‘bidenriots’, ‘trumpisaloser’, ‘trumpknew’, ‘trumpcoupplot’, ‘thepresidentisacrybabybabyhkin’, ‘trumpletinshkin’, ‘voteblue’, ‘lyingtrump’, ‘votebidenharris’, ‘militaryforbiden’.
- Pro-Trump: ‘americafirst’, ‘maga’, ‘trump2020’, ‘republicans’, ‘kag’, ‘kag2020’, ‘moscowhunter’, ‘maga2020’, ‘makeamericagreatagain’, ‘crookedjoebiden’, ‘draintheswamp’, ‘teamtrump2020’, ‘gop’, ‘votered’, ‘fourmoreyears’, ‘blacksfortrump’, ‘trumptrain’.
Starting from this first group of hashtags, the system added others that it learned to interpret by iterating the analysis and recognizing their association with the starting ‘keywords’.
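The idea of growing a hashtag list from a small seed set can be illustrated with a minimal Python sketch. It is not the researchers' system, which relies on neural networks: here a plain co-occurrence rule stands in for the learning step, and the function names, the thresholds `min_count` and `min_purity`, and the seed subsets are assumptions chosen for illustration.

```python
from collections import Counter

# Small subsets of the seed lists quoted above, for illustration only.
SEEDS = {
    "biden": {"votebiden", "bidenharris2020", "voteblue"},
    "trump": {"maga", "trump2020", "votered"},
}

def classify(hashtags, lexicon):
    """Return the faction whose hashtags appear in the tweet.

    Returns None when no faction matches or when several do (ambiguous tweet).
    """
    tags = {h.lower() for h in hashtags}
    matched = [faction for faction, words in lexicon.items() if tags & words]
    return matched[0] if len(matched) == 1 else None

def expand(tweets, lexicon, min_count=2, min_purity=0.9):
    """One expansion round (assumed rule, not the published method).

    An unknown hashtag is adopted by a faction when it co-occurs with that
    faction's hashtags at least `min_count` times and in at least
    `min_purity` of its classified occurrences.
    """
    known = set().union(*lexicon.values())
    counts = {}  # unknown hashtag -> Counter of factions it co-occurs with
    for hashtags in tweets:
        faction = classify(hashtags, lexicon)
        if faction is None:
            continue
        for h in {h.lower() for h in hashtags} - known:
            counts.setdefault(h, Counter())[faction] += 1

    new_lexicon = {faction: set(words) for faction, words in lexicon.items()}
    for h, c in counts.items():
        faction, n = c.most_common(1)[0]
        if n >= min_count and n / sum(c.values()) >= min_purity:
            new_lexicon[faction].add(h)
    return new_lexicon
```

Calling `expand` repeatedly, each time with the lexicon returned by the previous round, mimics the iteration described above: tweets that only carry newly learned hashtags become classifiable in later rounds.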
Filters were applied to ensure the validity of the sample and the authenticity of the accounts: tweets were selected by the language in which they were written and by the user's location in the country holding the vote, based on georeferencing or on statements in the profile bio. Bots were also identified and removed. No more than two tweets per user were taken into consideration, to prevent very prolific users from distorting the final result.
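The filtering steps just described can be sketched in a few lines of Python. The `Tweet` fields, the default values and the function name are hypothetical: the article does not say how location was resolved or how bots were detected, so both are assumed to be precomputed here.

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    user_id: str
    lang: str      # language code of the tweet text
    country: str   # hypothetical field: resolved from geotag or profile bio
    is_bot: bool   # hypothetical field: output of some bot detector
    text: str

def filter_sample(tweets, lang="en", country="US", max_per_user=2):
    """Keep tweets in the target language and country, drop bots,
    and retain at most `max_per_user` tweets from each account."""
    kept = []
    per_user = {}  # user_id -> number of tweets already kept
    for t in tweets:
        if t.lang != lang or t.country != country or t.is_bot:
            continue
        seen = per_user.get(t.user_id, 0)
        if seen >= max_per_user:
            continue  # cap prolific users so they cannot skew the result
        per_user[t.user_id] = seen + 1
        kept.append(t)
    return kept
```

The cap is applied in input order, so which two tweets of a prolific user survive depends on how the stream is sorted; the article does not specify the selection rule.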
SWING STATES – Thanks to the georeferencing of tweets, it was also possible to produce forecasts for some of the swing states: Biden is predicted to win in Georgia, North Carolina and Ohio.
News on University of Calabria web portal published on November 3, 2020: https://www2.unical.it/portale/portaltemplates/view/view.cfm?103993
News on YouTube: https://www.youtube.com/watch?v=XqDVzj0IiYs
News on University of Calabria web portal as a PDF file: USA2020_Unical_news