Building on the basic text analytics I wrote about last week, I ran a bag-of-words sentiment analysis on CNN’s midterm election coverage, using the transcripts posted on their site. Fortunately, every transcript carries a time stamp denoting which hour of programming it covers, so I was able to attach a time of day to each one and produce a visualization of CNN’s election coverage.
To capture what CNN was talking about, I wrote a Python script that picked out specific keywords based on a frequency distribution built from the entire transcript corpus. The idea was that if CNN didn’t talk about a topic, it wasn’t worth investigating. To consolidate the terms into concepts or topics, I created categories that group similar words together. Sometimes the grouped words were analogous terms, and sometimes simply the plural form of the same word.
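As a rough illustration of this step (a minimal sketch, not the actual script), the frequency distribution and the category grouping might look something like the following. The `CATEGORIES` map, the word lists in it, and the function names are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical category map: each topic lists the word forms folded into it.
CATEGORIES = {
    "Republicans": {"republican", "republicans", "gop"},
    "Democrats": {"democrat", "democrats"},
    "President": {"president", "obama"},
    "Senate": {"senate"},
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(transcripts):
    """Count every token across the whole transcript corpus."""
    counts = Counter()
    for text in transcripts:
        counts.update(tokenize(text))
    return counts

def category_counts(word_counts, categories=CATEGORIES):
    """Sum the counts of all word forms that belong to each topic."""
    return {
        topic: sum(word_counts[word] for word in words)
        for topic, words in categories.items()
    }
```

Calling `word_frequencies(transcripts).most_common(50)` would list the most frequent tokens, which is one way to decide which keywords deserve a category in the first place.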
Republicans and the President were the biggest talking points of the night, mentioned more often than the Senate or Democrats. The number of times a topic is mentioned gives no clue to the context or tone in which CNN presented it, so I used a bag-of-words approach to score the sentiment of the words surrounding these terms within the transcript. This process won’t give an exact interpretation of every instance, but it can get close. With enough term occurrences, the overall sentiment should rise above the noise.
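To make the scoring idea concrete, here is a minimal sketch of bag-of-words sentiment over a window of words around each mention. The tiny `POSITIVE` and `NEGATIVE` word lists, the window size, and the function names are placeholders assumed for the example; a real run would use a full sentiment lexicon.

```python
import re

# Placeholder lexicons, assumed for illustration only.
POSITIVE = {"win", "wins", "won", "victory", "strong", "gain", "gains"}
NEGATIVE = {"lose", "loses", "lost", "defeat", "weak", "crisis", "blame"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score_mentions(text, terms, window=5):
    """Return one sentiment score per mention of any word in `terms`.

    Each score is the number of positive words minus the number of
    negative words found within `window` tokens on either side.
    """
    tokens = tokenize(text)
    scores = []
    for i, token in enumerate(tokens):
        if token in terms:
            start = max(0, i - window)
            context = tokens[start:i] + tokens[i + 1:i + 1 + window]
            score = sum((w in POSITIVE) - (w in NEGATIVE) for w in context)
            scores.append(score)
    return scores
```

The window size is the main knob here: a wide window picks up sentiment words that belong to a neighboring topic, while a narrow one misses context, which is part of the noise mentioned above.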
The first thing to notice is that there was no bad news about the Republicans. The sentiment analysis never found an hour of CNN’s broadcast that had more negative mentions of Republicans than positive ones. In contrast, Democrats did not fare anywhere near as well on the newscast from the morning until about early evening. Mentions of President Obama were rather volatile, with strong negative and positive swings throughout the day. Mitch McConnell got a rather big bump right when CNN projected his Senate race in his favor [at about 7PM EST]. The topic of Washington, predominantly referring to the federal government, was the only topic that had a negative overall score for the entire day.
The graph above offers a direct comparison of the sentiment scores for the political categories for every hour of broadcast during the actual election returns after 6PM EST. [It also aggregates mentions across the different programs CNN runs on different channels, so there might be a little disagreement with the numbers if you are comparing charts.]
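The hourly aggregation behind a chart like this can be sketched as below, assuming each scored mention has already been reduced to an `(hour, topic, score)` tuple. That tuple layout and the function name are assumptions for the example, and the numbers in any sample input would be made up.

```python
from collections import defaultdict

def hourly_net_sentiment(mentions):
    """Sum mention scores per (hour, topic) pair.

    `mentions` is an iterable of (hour, topic, score) tuples; mentions
    from different programs in the same hour land in the same bucket.
    """
    totals = defaultdict(int)
    for hour, topic, score in mentions:
        totals[(hour, topic)] += score
    return dict(totals)
```

Because programs are pooled by hour, totals from this kind of aggregation will not necessarily match a chart built per program.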
Overall, the sentiment analysis produces an interesting visual picture of how CNN handled the election. If other news networks made transcripts of entire shows readily available, I’d be able to compare the outlets and look for evidence of bias or slant. Applied over a longer time frame, this approach could offer an interesting look at how a news story evolves, shedding an objective light on how the news cycle works.