Belgian elections, June 13, 2010 - Twitter opinion mining
In the week before the Belgian 2010 elections, we analyzed approximately 7,600 tweets that mentioned the name of a Belgian politician. What makes this experiment interesting is that Belgium is divided into a Dutch-speaking half (Flanders, 60% of the population) and a French-speaking half (Wallonia, 40% of the population). Flemings can only vote for Flemish politicians; Walloons can only vote for Walloon politicians.
A follow-up to the experiment is politiekebarometer.be, which tracks the 2012 Belgian local elections.
To set up the experiment we used Pattern:
```python
from pattern.web import Twitter, plaintext
from pattern.db import Datasheet
from pattern.nl import sentiment as sentiment_nl
from pattern.fr import sentiment as sentiment_fr

csv = Datasheet()

for politician, party in (("bart de wever", "NV-A"), ("elio di rupo", "PS")):
    for tweet in Twitter().search(politician):
        # Only keep Dutch and French tweets.
        if tweet.language in ("nl", "fr"):
            s = plaintext(tweet.text)
            # sentiment() returns a (polarity, subjectivity) tuple;
            # keep the polarity, a value between -1.0 and +1.0.
            if tweet.language == "nl":
                w = sentiment_nl(s)[0]
            else:
                w = sentiment_fr(s)[0]
            csv.append([politician, party, tweet.date, s, w])

# Store the rows so they can be reloaded (and extended) later.
csv.save("data.csv")
```
The resulting Datasheet (i.e., Excel-like table) was updated daily and visualized using NodeBox.
The sentiment() functions rate Dutch and French texts for their subjective tone. Take the following tweet, chosen for its obvious (positive) sentiment: "Danny Pieters, sterke speech voor een gedurfde en degelijke sociale bescherming." (roughly: "Danny Pieters, strong speech for a bold and solid social protection."). Translated to English, the individual words would have the following scores:
For research purposes, the old project source code is available here.
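Word-level scoring of this kind can be illustrated with a minimal sketch: look up each word in a lexicon and average the scores of the words that are found. The lexicon below (`LEXICON`) and its scores are hypothetical and made up for illustration; they are not Pattern's values, and Pattern's own sentiment() does more (it also handles modifiers such as negation and intensifiers), so treat this only as the basic idea.

```python
# Hypothetical toy lexicon mapping English words to a polarity in [-1.0, +1.0].
# These scores are made up for illustration; they are not Pattern's values.
LEXICON = {
    "strong": 0.4,
    "bold": 0.3,
    "solid": 0.5,
    "weak": -0.4,
}

def polarity(text, lexicon=LEXICON):
    """Return the average polarity of the lexicon words found in text.

    Words not in the lexicon are ignored; 0.0 is returned if none match.
    """
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

tweet = "Danny Pieters, strong speech for a bold and solid social protection."
print(round(polarity(tweet), 2))  # 0.4
```

Averaging (rather than summing) keeps long and short tweets on the same -1.0 to +1.0 scale.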
Figs. 1-4 show an overview of our readings of Twitter data from May 26 to June 7, 2010. Liberal parties (VLD, MR) are marked in blue, center-right parties (NV-A, CD&V) in deep yellow and orange, the right-wing VB in bright yellow, social democrats (PS, SP-A, CDH) in red, and left-wing green parties in green. Each bar has a splitter that indicates the frequency of Dutch tweets (on the left) vs. French tweets (on the right).
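The Dutch/French splitter on each bar boils down to counting mentions per politician per language. A minimal sketch follows, assuming rows of (politician, language) pairs; note that the collection script above does not store `tweet.language` in the Datasheet, so a real run would need that extra column. The function name `language_split` is hypothetical.

```python
from collections import Counter, defaultdict

def language_split(rows):
    """Return {politician: (dutch_share, french_share)} of their mentions.

    rows: iterable of (politician, language) pairs. Languages other than
    "nl" and "fr" are ignored.
    """
    counts = defaultdict(Counter)
    for politician, language in rows:
        if language in ("nl", "fr"):
            counts[politician][language] += 1
    split = {}
    for politician, c in counts.items():
        total = c["nl"] + c["fr"]
        split[politician] = (c["nl"] / total, c["fr"] / total)
    return split

# Hypothetical sample rows, not actual data from the experiment.
rows = [
    ("bart de wever", "nl"),
    ("bart de wever", "nl"),
    ("bart de wever", "fr"),
    ("elio di rupo", "fr"),
]
print(language_split(rows))
```

Shares rather than raw counts make the splitter position comparable between a heavily mentioned politician and a rarely mentioned one.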
There is an apparent tendency for the Flemish part of Belgium to discuss Flemish politicians and for the Walloon part to discuss Walloon politicians. The darker bar indicates negative tweets, which we detected with a custom corpus of 800 adjectives. This corpus performed poorly, however, and we later replaced it with SentiWordNet (see below). The code examples included here instead use Pattern's built-in sentiment analysis, which is about 75% accurate for English, Dutch and French.
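The height of the darker (negative) part of a bar can be computed as the share of a politician's tweets that score below some polarity cut-off. The sketch below assumes (politician, polarity) pairs, such as the first and last columns of data.csv, and a cut-off of 0.0; both the cut-off and the function name `negative_share` are assumptions, not the settings used in the original experiment.

```python
from collections import defaultdict

def negative_share(rows, threshold=0.0):
    """Return {politician: fraction of tweets with polarity below threshold}.

    rows: iterable of (politician, polarity) pairs. Tweets scoring exactly
    at the threshold (e.g. neutral 0.0) are not counted as negative.
    """
    total = defaultdict(int)
    negative = defaultdict(int)
    for politician, score in rows:
        total[politician] += 1
        if score < threshold:
            negative[politician] += 1
    return {p: negative[p] / total[p] for p in total}

# Hypothetical polarity scores, not actual data from the experiment.
rows = [
    ("bart de wever", -0.3),
    ("bart de wever", 0.5),
    ("bart de wever", 0.0),
    ("elio di rupo", -0.1),
]
print(negative_share(rows))
```

Raising the threshold above 0.0 would also count neutral tweets as negative, so the choice of cut-off directly shifts the bars.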
Fig. 1 - Twitter mentions on June 7.
Fig. 2 - Twitter mentions on June 8.
Fig. 3 - Timeline of Dutch tweets, May 26 to June 7.
Fig. 4 - Timeline of French tweets, May 26 to June 7.
Figures 3 and 4 show the results per day. To produce them, we "bin" each politician's tweets per day (or per week, month or year) and calculate the average sentiment of each bin:
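```python
from collections import defaultdict
from pattern.db import Datasheet, date, avg

# bins[politician][(year, month, day)] => list of polarity scores
bins = defaultdict(lambda: defaultdict(list))

# Each row holds the five columns written above: politician, party, date, text, polarity.
for politician, party, d, s, score in Datasheet.load("data.csv"):
    d = date(d)  # parse the date string
    d = (d.year, d.month, d.day)
    bins[politician][d].append(float(score))

# Replace each list of scores with its average.
for politician in bins:
    for day in bins[politician]:
        bins[politician][day] = avg(bins[politician][day])
```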
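Binning per week or per month only changes the key that groups the scores. The following sketch uses just the Python standard library and assumes rows of (politician, datetime, score); Pattern's date() returns a datetime-compatible object, so its results should work here too. The function name `bin_scores` and its `key` parameter are hypothetical. Weeks are keyed by ISO year and week number, so a week that spans two months stays in one bin.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def bin_scores(rows, key="day"):
    """Average polarity per politician per time bin.

    rows: iterable of (politician, datetime, score).
    key: "day", "week" (ISO year + ISO week number) or "month".
    """
    keys = {
        "day": lambda d: (d.year, d.month, d.day),
        "week": lambda d: tuple(d.isocalendar()[:2]),
        "month": lambda d: (d.year, d.month),
    }
    bins = defaultdict(lambda: defaultdict(list))
    for politician, d, score in rows:
        bins[politician][keys[key](d)].append(score)
    # Sort the bins chronologically and replace each list by its mean.
    return {p: {k: mean(v) for k, v in sorted(b.items())} for p, b in bins.items()}

# Hypothetical scores, not actual data from the experiment.
rows = [
    ("bart de wever", datetime(2010, 5, 31), 0.5),
    ("bart de wever", datetime(2010, 6, 7), 0.2),
    ("bart de wever", datetime(2010, 6, 8), -0.4),
]
print(bin_scores(rows, key="week"))
```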
On June 10 we plugged in SentiWordNet for sentiment analysis, replacing a number of heuristic, proof-of-concept scripts. The result was instant and striking: the mentions were roughly divided 50-50 between positive and negative tweets, as shown in Fig. 5. Dutch-speaking (Flemish) users were inclined to write more negatively about French-speaking Walloon politicians, and vice versa.
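The "and vice versa" observation amounts to comparing the average polarity of tweets about politicians from the writer's own language community with tweets about the other community. A minimal sketch, assuming the tweet's language is a proxy for the writer's community and using a hypothetical `COMMUNITY` mapping from politician to community; the sample scores are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from politician to language community.
COMMUNITY = {"bart de wever": "nl", "elio di rupo": "fr"}

def cross_community(rows, community=COMMUNITY):
    """Average polarity of tweets about politicians from the writer's own
    language community ("own") versus the other community ("other").

    rows: iterable of (tweet_language, politician, polarity).
    Raises KeyError for a politician missing from the mapping.
    """
    groups = defaultdict(list)
    for language, politician, score in rows:
        same = community[politician] == language
        groups["own" if same else "other"].append(score)
    return {k: mean(v) for k, v in groups.items()}

# Hypothetical scores, not actual data from the experiment.
rows = [
    ("nl", "bart de wever", 0.4),
    ("nl", "elio di rupo", -0.2),
    ("fr", "elio di rupo", 0.2),
    ("fr", "bart de wever", -0.6),
]
print(cross_community(rows))
```

A lower "other" average than "own" average would reflect the pattern described above; with only tweet language as a proxy, bilingual users and Brussels residents are not separated out.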
A second striking result is also visible: Bart De Wever, leader of the center-right nationalist NV-A, received far more mentions than the other politicians. If we attach any value to the Twitter buzz, we could at this point have assumed that his party would win a large electoral victory (which it did) and that opinions about this victory would vary greatly (which they did).
Fig. 5 - Twitter mentions on June 10.
Fig. 6 - Timeline of tweets, May 26 to June 10.
Fig. 7 - Timeline of tweets, split positive and negative, May 26 to June 11.
As of the 2010 federal elections the NV-A gained a plurality in the Flemish region of Belgium, with 28% of the votes in Flanders and 17% of the national vote, becoming the largest party in both Flanders and Belgium altogether. This was the first time in which a non-traditional political party dominated the outcome of a Belgian election. – excerpt from Wikipedia
The second largest party was the PS, led by Elio Di Rupo. It became the largest party in Wallonia and held 14% of the national vote. NV-A and PS subsequently became the key players in the coalition formation.
While the NV-A ultimately seeks secession of Flanders from Belgium, the PS is inclined towards state interventionism. After various media clashes and attempts to form a coalition, Belgium still did not have an official federal government 500 days after the elections.