“Mutation” of Science Misinformation

For this project, I am doing a textual analysis of constantly changing online popular media content.

I plan to use a tool called “diffengine”, self-described as a “utility for watching RSS feeds to see when story content changes” (GitHub). As far as I understand, it works by tracking RSS feeds and comparing them with the corresponding data in the Internet Archive. It is set up to find content that was changed on specific websites, take a picture of the changes and post them to Twitter accounts. Though many of these accounts have recently been suspended by Twitter, the ones still active as of today (9/10/2018) are: The Guardian, The Washington Post, The Wall Street Journal, The White House Blog, and Breitbart.
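To make the comparison step concrete, here is a minimal sketch of a word-level diff between two saved versions of a text. It is not diffengine's actual code: it uses Python's standard `difflib` module, the function name `word_diff` is hypothetical, and the two headlines are shortened, illustrative strings rather than exact archived text.

```python
import difflib


def word_diff(old: str, new: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs, where tag is 'same', 'removed' or 'added'."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            changes.append(("same", " ".join(old_words[i1:i2])))
            continue
        # 'replace', 'delete' and 'insert' reduce to removed and/or added words.
        if i1 < i2:
            changes.append(("removed", " ".join(old_words[i1:i2])))
        if j1 < j2:
            changes.append(("added", " ".join(new_words[j1:j2])))
    return changes


# Illustrative (made-up) pair of headline versions.
old_version = "Gunman kills 20 children at school"
new_version = "Gunman massacres 20 children at school"

markers = {"same": " ", "removed": "-", "added": "+"}
for tag, text in word_diff(old_version, new_version):
    print(markers[tag], text)
```

Removed words correspond to the red, crossed-out text in the diff images and added words to the green text. Splitting on whitespace is a simplification, so a change in punctuation attached to a word shows up as a change to the whole word.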

The diffengine creators state on GitHub that “the hope is that it can help draw attention to the way news is being shaped on the web”. Though some of these changes are innocent spelling or grammar fixes, others are worthy of a deeper look, as they sometimes change the entire tone or content of the news story, such as the ones below:

Also a part of my analysis is NewsDiffs, diffengine’s precursor, which was created at the Knight Mozilla MIT hackathon in June 2012.

The current plan has two components:

  1. Analyze the 13 in-depth examples on http://newsdiffs.org/examples/

  2. Take 100 of the DiffEngine images from each of the NYT, WSJ, WAPO, the White House Blog and Breitbart (because those are the only DiffEngine Twitter accounts still up) and manually extract themes based on the types of changes being made, excluding duplicates.

I am planning to tie these findings together with some previous research I have done on the changing media environment and the spread of misinformation. There is evidence that news stories are often published online with unverified information as a “placeholder” until the real data can be found and (hopefully) fact-checked (Usher, 2014). This has led to the spread of misinformation, since blogs and social media posts pick up information from news sources and never take the time to check back later to make sure the story is still the same. In this project I want to see what kinds of changes are being made, by whom (referring to news organizations) and when, and then consider the impact these changes could have on the greater news media network. This study can also assist me in developing a coding scheme that I can later apply to machine learning in a large-scale content analysis.
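As a rough illustration of what such a coding scheme could look like in practice, here is a minimal sketch for recording manually coded changes and tallying them per outlet, skipping duplicates. The class `CodedChange`, its fields, the function `tally_themes` and the sample records are all hypothetical; the theme labels are the three discussed later in this post.

```python
from collections import Counter
from dataclasses import dataclass

# Theme labels taken from the themes discussed in this post.
THEMES = ("developing story", "change in tone", "correcting inaccuracy")


@dataclass(frozen=True)
class CodedChange:
    outlet: str   # e.g. "NYT"
    diff_id: str  # hypothetical identifier for one diff image
    theme: str    # one of THEMES, assigned by hand


def tally_themes(changes):
    """Count coded changes per (outlet, theme), ignoring repeated diff images."""
    seen = set()
    counts = Counter()
    for change in changes:
        if change.theme not in THEMES:
            raise ValueError(f"unknown theme: {change.theme!r}")
        key = (change.outlet, change.diff_id)
        if key in seen:
            continue  # duplicate of an image that was already coded
        seen.add(key)
        counts[(change.outlet, change.theme)] += 1
    return counts


# Made-up sample records, including one duplicate.
sample = [
    CodedChange("NYT", "diff-001", "developing story"),
    CodedChange("NYT", "diff-001", "developing story"),
    CodedChange("NYT", "diff-002", "change in tone"),
    CodedChange("WSJ", "diff-001", "correcting inaccuracy"),
]
for (outlet, theme), n in sorted(tally_themes(sample).items()):
    print(outlet, theme, n)
```

Restricting each change to a single theme is a simplifying assumption; a change that fits more than one theme would need a list of themes per record.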


Here is a brief analysis of some of the 13 NewsDiffs stories, called “highlights” on their website:

1. Comparing: Yvonne Brill, a Pioneering Rocket Scientist, Dies at 88, newsdiff

This story went a bit viral when it first came out, because a rocket scientist who happened to be female was described in her obituary like so:


The NYTimes pretty quickly (5 hours later) removed the offending statement and put the “was a brilliant rocket scientist” statement first. This is an interesting case because the change of content was obvious – the original sparked outrage and that (most likely) forced the change to be made. It was also not a factual error, unverified information or exaggerated findings – it was a sexist comment that the author (who happens to be male) made and later had to remove.

2. Gunman Massacres 20 Children at School in Connecticut; 28 Dead, Including Killer (NYT), Change Log, newsdiff

Very different story here: this is very clearly a story that was updated as it developed. From the titles of the changed posts below, we see how the story went from the vague “shooting reported” to the more detailed “multiple fatalities reported” to the more data-driven “18” and then “20 children killed”. On December 17th, the tone changed to be even harsher: “kills” became “massacres”.

This story embodies something I discussed in an earlier post about the nonlinearity of news stories. Before, you could trace the development of any one story through different newspaper clippings. Now, with the digitization of the news, any one article is no longer chronological. It does not represent “a snapshot in time”, but is instead a living, mutating organism (Fass & Main, 2014). Because the chronology of the news is disappearing, it is increasingly difficult to determine how a story unfolds. For example, here is a vital change made to the story; what was removed is in red and crossed out, and what was added is in green:


The first version implied that the principal had let the shooter into the school before she herself was shot. This was later shown to be inaccurate. This is problematic, as the false information was already out there and people do not continuously check back to online articles for updates. This, though intuitive, is actually interesting: news reports are not considered to be an ever-evolving space (in contrast to Twitter or 24-hour news shows). We expect what is published to stay static, but that is not the case. Why do they choose to edit rather than publish new content?

Speaking of updates, on December 14 the article was completely rewritten by two new authors. Why? It’s a mystery to me.

3. Comparing: Mubarak Rushed To Military Care After a Stroke, newsdiff


This story had a similar issue as it evolved: over a period of 2 days, there were 8 changes to the headline and content, on what was literally a matter of life or death.


Overall, these case studies are adding up to some interesting themes of the different kinds of changes being made, and illustrating various problems with the current media environment.


Next, I will discuss some key themes that are emerging from both the NewsDiffs and the diffengine analyses. All examples in this post are from the NYT diffengine account.

1. THEME: Developing Story

This is the theme of nonlinearity: online news articles no longer adhere to chronological order. As new details come in, the article can be updated to reflect the most accurate, up-to-date information, keeping the content both new and relevant. Otherwise, the newsroom would have to produce a new “update” article every time new information came in, which would be a waste of time and resources. Some examples:

This is consistent with previous studies of newsrooms. For example, in “Making News at The New York Times”, Nikki Usher describes how, in an effort to keep up with the speed of the news cycle, a journalist at the NYT continued to update an already published story online as new information and corrections came in from various sources, “rushing to file the next addition to his work with every new source that called” (Usher, 2014).

This is an interesting parallel, since the diffengine account I’m looking at IS the NYT, so in a way, I’m verifying her study with actual evidence of the changes she is describing.

2. THEME: Change in tone

As with the sexist obituary of Yvonne Brill, sometimes edits are made to change the tone of the content:

These seem to be changes in framing based on external pressures. The first example could be a response to NYC not wanting to scare away tourists, so they changed “popular tourist attraction” to “dangerous intersection”. The second could have been a result of changing political pressures. I would need to do more background research on specific examples to actually be able to explain what happened in these cases, but just like the removal of the sexist comments from the Yvonne Brill obituary, these are re-framings of the story based on some external context or pressure.

3. THEME: Correcting Inaccuracy

The final big theme I am seeing is one we might expect – error correction. This theme is aligned with the traditional roles of journalists as fact-checkers (as well as producers and curators of information).

This can be explained by the shift towards digitization of media. The heightened speed of production that came along with the Internet age forces journalists to produce multiple stories a day and leaves minimal time for close reading of the research or fact-checking their sources. This opens up the opportunity for errors to be made. Thus, as facts are checked, changes are made live to content to make it accurate, but this often changes the entire meaning of the text, which is very problematic, as people do not continuously check to make sure the content they were exposed to is STILL accurate hours later.


Fass, J., & Main, A. (2014). Revealing the news: How online news changes without you noticing. Digital Journalism, 2(3), 366-382.

Usher, N. (2014). Making News at The New York Times. University of Michigan Press.

Ania Korsunska