Tag Archives: text mining

Twitter Penguins Analysis

Twitter Analysis – Penguins Game 7

I’ve been listening to 93.7 The Fan while running the analysis for this, and I never realized that people can say the same thing over and over again but in slightly different ways. Also all tweets were captured AFTER THE CONCLUSION OF THE 1st PERIOD.

Everyone knows Twitter is the best venue to vent your anger about sports teams. I was able to the statistical programming language R to scrape tweets which had certain keywords or hashtags in them, put them in a database and then flag the tweets that contain certain keywords or collection of words. I had about 20 keywords including: “penguins”, “pens”, “rangers”, “game 7”, and “firebylsma”. I also search for any of the handles of the local hockey writers, because a lot of people will reply to the sports writers during the game with their own opinions.

The first graph has the total number of tweets that I scraped and tweets that I flagged as ‘swearing’. For the most part, I feel like if someone swore in the tweet, it indicates anger or at the very least aggressiveness. As mentioned before these graphs begin after the 1st period ends. And tweets containing the keyword ‘rangers’ has been filtered from this first graph to include a greater amount of Penguins fan.

The quickest and most basic analysis is the number of tweets as a time-series. As soon as you look at the time-series line graph, you can tell when the game ends [9:41 PM]. It’s like Mt. Everest in the graph. Looking closer you can see the spikes where each team scores. An interesting occurrence happened right before the end of the game. Twitter got quiet. The tweets per minute dropped below 500 right before it exploded to a few thousand per minute. I am attributing this silence to people actually watching the game during the tense last minute.

The tweets peaked about 2 minutes after the game ended indicating a minimal lag which includes the time of picking up the smart phone, unlocking it, and composing the tweet. However, the angry, swearing tweets peaked right as the game ended indicating more visceral emotions instead of more thought-out, 140-character commentary. There are two severe dips that I can’t account for at 9:49 PM and 10:04 PM. If anyone knows something that occurred at this time, please let me know. Since there is a clear downward trend and the game was no longer being played, I am going to write those dips off as some technical difficulties that didn’t allow a lot of tweets to be sent at those times.

Since Twitter isn’t an invention specific to just Penguins fans, I separated and compared two sets of tweets: one containing the word “penguins” and one comparing “rangers”. There are a lot more Rangers fans than Penguins fans, because the “rangers” tweets outnumber “penguins” tweets at almost every time. The “penguins” tweets did spike when they scored their only goal. Interestingly enough, there was a lot of swearing right when the goal was scored. Penguins fans are just so angry!

Bottom-line calm down; it’s just sports.