The Spread and Mutation of Science Misinformation
This is a short summary of a research project now published here.
We live in an age of unprecedented access to information through the Internet. Though this allows the world to become more connected, educated and informed on global news, the downside is that information online constantly changes, and the general population is largely unaware of the pace and extent of this change. As information changes, facts are often distorted through oversimplification and exaggeration to attract readership, leading to the creation and spread of misinformation. In this project, I aimed to explore how this process takes place by analyzing the life cycle of a specific academic article.
In April 2014, the article was published in the journal Medicinal Chemistry Communications (MedChemComm). It is titled "The synthesis and functional evaluation of a mitochondria-targeted hydrogen sulfide donor, (10-oxo-10-(4-(3-thioxo-3H-1,2-dithiol-5-yl)phenoxy)decyl)triphenylphosphonium bromide (AP39)". Popular media coverage would later know it as the "farts cure cancer" study, though the study had nothing to do with curing cancer or with flatulence.
Farts Cure Cancer: Story Goes Viral
The press release about this study was published in July 2014 on the website of the University of Exeter, with a slightly catchier title: "Rotten egg gas holds key to healthcare therapies". On July 11, 2014, Time Magazine became the first popular media publication to pick up the story. I was able to access the original Time Magazine article via Archive.org's Wayback Machine, a digital archive of the Internet. This article was the first to introduce a unique and interesting interpretation of the scientific findings; its title was, surprisingly, "Scientists say smelling farts might prevent cancer".
On July 18th, just one week after the original publication, the article was completely rewritten. The title was changed to something more like what we’d expect from Time Magazine, “A stinky compound may protect against cell damage, study finds”, and a “correction” notice was added at the bottom of the page.
Once Time Magazine introduced these two elements to the story (cancer and farts), neither of which is mentioned in the original paper or the press release, the rest of the popular media outlets picked up the story and it went viral that week. This can be seen in the sheer number of blog-style stories, and the speed with which they were published, with a variety of funny pictures and puns all over the Internet.
Gathering the Data
To illustrate the spread of the story, I collected news articles published between July 1 and July 31, 2014 by searching for the keywords "farts", "cancer" and "study". What I wanted to capture was the immediate spread of the story online, without filtering by the reliability or popularity of the source websites. Though the study was published in April, there was no media coverage until the press release in July. The first six pages of Google results were scraped for the date range July 1–31, 2014, excluding duplicates and stories not directly related to the topic (n = 48). Including the peer-reviewed publication and the press release, the total sample size was 50 articles. From each popular media article, I extracted all working external hyperlinks, the article title, the URL and the date of publication.
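The hyperlink extraction step described above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the tooling actually used in the project; the names `LinkExtractor` and `external_links` and the example URLs are hypothetical. It treats a link as "external" when its host differs from the host of the page it appears on, and it does not check whether a link still resolves, since that would require network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html, page_url):
    """Return de-duplicated http(s) links whose host differs from page_url's host."""
    parser = LinkExtractor()
    parser.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    seen, result = set(), []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc.lower() == own_host:
            continue  # internal link, not a citation of another source
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Each returned URL, paired with the URL of the article it was found in, gives one candidate citation edge for the network.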
Below is the visualization I built with Gephi, an open-source network analysis software. In this visualization, the nodes represent individual articles, with their sources noted, and the edges (arrows) represent the spread of information via explicit hyperlink citation.
This network contains 71 nodes (total sources cited) and 123 edges (total hyperlink citations). The most-cited nodes were the peer-reviewed publication in MedChemComm (cited by 13 sources), the University of Exeter press release (cited by 24 sources) and Time (cited by 22 sources). Of the 49 articles (excluding the peer-reviewed publication), 37 (76%) repeated the misinformation in their titles, which shows how widely the story spread across blog and news platforms during July 2014.
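The counts above (nodes, edges, and how many sources cite each node) are simple properties of a directed edge list. The sketch below computes them on a small, made-up example; `toy_edges` is hypothetical data, not the study's 123 edges, and the function names are illustrative. The CSV helper writes `Source` and `Target` columns, the layout that Gephi's spreadsheet import is generally understood to accept for edge tables, though that detail should be checked against the Gephi version in use.

```python
import csv
import io
from collections import Counter


def network_summary(edges):
    """Return (node count, edge count, in-degree Counter) for directed edges."""
    unique_edges = set(edges)  # one edge per (citing, cited) pair
    nodes = {name for edge in unique_edges for name in edge}
    in_degree = Counter(cited for _, cited in unique_edges)
    return len(nodes), len(unique_edges), in_degree


def to_edge_csv(edges):
    """Serialize edges as CSV text with Source and Target columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(sorted(set(edges)))
    return buffer.getvalue()


# Hypothetical toy data: (citing article, cited source).
toy_edges = [
    ("blog_a", "time"),
    ("blog_a", "press_release"),
    ("blog_b", "time"),
    ("time", "press_release"),
    ("press_release", "journal_article"),
]

node_count, edge_count, in_degree = network_summary(toy_edges)

# Share of articles repeating the misinformation in their titles,
# using the figures quoted in the text (37 of 49).
title_share = round(100 * 37 / 49)
```

In this framing, "cited by 24 sources" is the in-degree of the press release node.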
Visualization and Analysis
After extracting the date and time of publication, the publication name and the list of all hyperlinked citations in each article, I used Gephi to create the network visualization of how the story spread. I then isolated how often the original publication was used as a source, as opposed to secondary sources, especially the press release and the rewritten Time Magazine article.
From these visualizations alone, we can see that popular media publications over-rely on secondary sources, as opposed to the original scientific publication. When we add the publication dates, we also find that the references to the Time Magazine article were made before it was completely rewritten on July 18th. This is by far the most interesting and disturbing finding.
This means that the citations, as they stand now in all those articles, point to information that no longer exists: the URL of the Time Magazine article remained the same, while its title and entire contents were completely changed.
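The temporal check behind this finding amounts to comparing each citing article's publication date with the date of the rewrite. A minimal sketch follows, assuming each citation is stored as a small record with a citing source, a cited target and a publication date. The record layout and the sample entries are hypothetical; only the July 18, 2014 rewrite date comes from the analysis above.

```python
from datetime import date

# Date on which the Time Magazine article was rewritten (from the text).
REWRITE_DATE = date(2014, 7, 18)


def stale_citations(citations, target, rewritten_on):
    """Return citations of `target` published before it was rewritten.

    Such links now lead to content that differs from what the citing
    author originally read.
    """
    return [
        c for c in citations
        if c["target"] == target and c["published"] < rewritten_on
    ]


# Hypothetical sample records, not the study's data.
sample = [
    {"source": "blog_a", "target": "time", "published": date(2014, 7, 12)},
    {"source": "blog_b", "target": "time", "published": date(2014, 7, 20)},
    {"source": "blog_c", "target": "press_release", "published": date(2014, 7, 13)},
]

stale = stale_citations(sample, "time", REWRITE_DATE)
```

Here only `blog_a` is flagged: it cites Time and was published before the rewrite.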
Conclusions and Outstanding Questions
By tracing the hyperlink citations of this specific scientific study as it spread from journal to blogs, I was able to gain valuable insight into the inner workings of the media network:
First, the popular media publications refer to each other as sources of information, creating a type of echo chamber. This may reflect transparency as a professional norm in journalism, but it can also signal a desire to shift responsibility and accountability for fact-checking one's sources prior to publication, e.g. "We are not saying this, the DailyMail said this and we are telling you what they said."
Another obvious theme is the lack of reliance on the original sources, possibly due to the de-professionalization of science journalism and the overuse of amateur, freelance writers, coupled with the need to publish in high quantity and at high speed, with little emphasis on quality or depth. This needs further exploration in future research.
A final fascinating theme is the constantly changing nature of online content. The initial spark of confusion in this pilot study, the Time Magazine article, was able to avoid any responsibility by simply appending a correction notice after the rewrite. This constantly changing online environment is not something the general public is aware of, but it is common practice in the current media age. It complicates the network structure significantly since, as mentioned earlier, later citations of other popular media publications end up referencing information that no longer exists.
Fake news and misinformation are among the most salient problems of our time. In the future, I hope to add to the growing literature on this subject by replicating this type of network analysis on a greater scale. This would enable me to draw conclusions about general trends in how information spreads throughout the media, and to continue tracing the life cycle of science information and, even more importantly, misinformation, back to its sources.