Is Fandango still inflating movie ratings?

Photo by Alex Litvin via Unsplash

Executive Summary

In 2015, Walt Hickey published a FiveThirtyEight article demonstrating that Fandango's movie ratings were biased towards higher-than-normal scores. Based on an analysis of Fandango's 2016 ratings, there has been no significant shift in how Fandango rates films since then.

A conversion chart between Fandango's score system and a less-biased 5-star scale is provided near the end of this document.

Introduction

In 2015, an article titled "Be Suspicious Of Online Movie Ratings, Especially Fandango's" was published by Walt Hickey on the data journalism website FiveThirtyEight. The article was an exposé of online movie ratings manipulation, focusing on the worst identified culprit, Fandango. Since the article was published, Fandango has maintained that the errors in its rating system have been corrected. This project seeks to evaluate whether those changes have indeed been made.

Data

Data Description

I will work with two datasets in this project:

- Walt Hickey's original dataset from his 2015 article, which includes both the raw and displayed Fandango ratings for popular movies released in 2015
- A second, more recent dataset of popular movies, which includes release years along with ratings from Fandango and several other aggregator sites

These datasets focus on "popular" movies. "Popular" is not a precisely defined metric, and it is possible that the two datasets use different filters to identify popular movies. For the purposes of this analysis, I chose to use all entries in both datasets.

Data Cleaning

The primary goal of this project is to investigate whether Fandango's rating system was significantly altered following the publication of Hickey's article. For this reason, the timing of a movie's release is important.

Hickey's article was posted in October 2015, and he used data collected on movies released during that calendar year. Our second dataset contains release years, but not release months. To restrict the analysis of the second dataset to post-article releases, I chose to use only movies released in 2016 and later.
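
As a sketch of this filtering step (the file name and `year` column below are assumptions, not a documented schema):

```python
import pandas as pd

# Load the post-article dataset (file and column names are assumptions).
movies = pd.read_csv("movie_ratings_16_17.csv")

# Release years are available, but not months, so keep only movies
# released in 2016 or later.
post_article = movies[movies["year"] >= 2016].copy()

print(f"{len(post_article)} of {len(movies)} movies released in 2016 or later")
```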

Analysis

Reviewing Hickey's Results

In his article, Hickey argued that Fandango, unlike other movie rating aggregators, has an incentive to keep movie ratings high. In addition to providing movie ratings, Fandango sells tickets. Poorly rated movies are likely to sell fewer tickets, so Fandango benefits from inflating ratings.

One of Hickey's key findings was that Fandango exposed a movie's actual aggregate rating in its pages' HTML code. He observed that the ratings present in the HTML code (Raw Stars) were lower than the ratings presented on the user-facing webpage (Displayed Stars). I reproduced these findings and present them visually in Figure 1.

Figure 1

Reanalysis of Walt Hickey's data demonstrating the impact of star rounding on Fandango's reported movie ratings.
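
A minimal sketch of how the Raw vs. Displayed comparison can be reproduced, assuming the column names from Hickey's published CSV (`Fandango_Stars` for the displayed rating, `Fandango_Ratingvalue` for the raw HTML rating):

```python
import pandas as pd

# Hickey's published dataset (column names assumed from his CSV).
hickey = pd.read_csv("fandango_score_comparison.csv")

# Raw Stars live in the page HTML; Displayed Stars are what users saw.
rounding_bump = hickey["Fandango_Stars"] - hickey["Fandango_Ratingvalue"]

print(rounding_bump.describe())    # non-negative if ratings are only rounded up
print((rounding_bump > 0).mean())  # share of movies whose rating was bumped up
```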

Comparing Fandango's Pre and Post Article Ratings

I set out to determine whether Fandango has significantly altered its rating system since Hickey's article was published. I defined a significant change as:

- A meaningful decrease in the mean displayed star rating
- An increase in the number of movies displayed with 2 stars or fewer

Meeting these criteria would demonstrate that Fandango has chosen to display accurate ratings for some films that are collectively viewed as awful, without forcing it to rate many films poorly.

The data presented in Figure 2 suggest that although the mean score decreased modestly, the number of movies with 2 or fewer stars did not increase enough to meet the definition of a "significant" change above.

Figure 2

Comparison of pre- and post-article star distributions indicates a modest reduction in mean score, but no notable increase in movies with two stars or fewer.
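
As a sketch of how these two metrics can be computed for each period (the Series names in the usage comments are hypothetical, carried over from the sketches above):

```python
import pandas as pd

def summarize(stars: pd.Series, label: str) -> None:
    """Report the two metrics used here to define a 'significant' change."""
    low_share = (stars <= 2.0).mean() * 100
    print(f"{label}: mean = {stars.mean():.2f}, at <= 2 stars = {low_share:.1f}%")

# Hypothetical usage with the displayed-star Series for each period:
# summarize(hickey["Fandango_Stars"], "2015 (pre-article)")
# summarize(post_article["fandango"], "2016 (post-article)")
```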

Comparing Fandango's 2016 Ratings to Other Aggregator Sites

The data show that Fandango did not significantly change its rating system after Hickey's article was published. Yet this finding alone does not show that Fandango's rating system is biased towards high scores. To demonstrate this bias, I compared Fandango's 2016 ratings to ratings scraped from other aggregators, including:

- IMDB
- Rotten Tomatoes (both the critic-driven Tomatometer and the Audience score)
- Metacritic (the Metascore)

The distributions of ratings presented on each of these aggregators are shown in Figure 3. The other sites' metrics use different scales (0-10 for IMDB; 0-100 for Rotten Tomatoes and Metacritic), so they were normalized to the 0-5 star scale used by Fandango for the sake of comparison.
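
A minimal sketch of that normalization, assuming hypothetical column names for each site's native scale:

```python
import pandas as pd

def to_five_star(score: pd.Series, scale_max: float) -> pd.Series:
    """Linearly rescale a rating from [0, scale_max] to Fandango's 0-5 range."""
    return score * 5 / scale_max

# Hypothetical column names; native scales per site:
# movies["imdb_5"] = to_five_star(movies["imdb"], 10)            # IMDB: 0-10
# movies["metascore_5"] = to_five_star(movies["metascore"], 100) # Metacritic: 0-100
# movies["tomatometer_5"] = to_five_star(movies["tomatometer"], 100)
# movies["audience_5"] = to_five_star(movies["audience"], 100)
```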

This analysis reveals a striking difference between the rating distributions of Fandango and IMDB on one hand, and the Rotten Tomatoes Audience and Critic (Tomatometer) scores and Metacritic's Metascore on the other.

Figure 3

Comparison of different rating aggregators normalized to Fandango's scale.

Beyond these broad differences in score distribution, we can also identify some interesting trends in the other rating systems:

The Audience and Metascore distributions are center-heavy and taper toward the extremes, which would make them most effective at identifying truly exceptional movies (good or bad). The Tomatometer's approach instead pushes ratings towards 'exceptional' (fresh) or 'awful' (rotten): its bimodal distribution indicates that it is more likely to label a movie "good" or "bad" than "average".


Normalizing All Metrics to a 5-Point Scale

In the violin plot above, both Fandango and IMDB have distributions skewed towards higher scores. To force these two rating systems to occupy the entire 0-5 star range, we can renormalize their scores so that their observed values stretch across the full scale (Figure 4).
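
One way to read this renormalization is min-max rescaling; the sketch below makes that assumption, and the column names are hypothetical:

```python
import pandas as pd

def stretch_to_range(score: pd.Series, low: float = 0.0, high: float = 5.0) -> pd.Series:
    """Min-max rescale a Series so its observed values span [low, high]."""
    return low + (score - score.min()) * (high - low) / (score.max() - score.min())

# Hypothetical usage on the two skewed distributions:
# movies["fandango_stretched"] = stretch_to_range(movies["fandango"])
# movies["imdb_stretched"] = stretch_to_range(movies["imdb_5"])
```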

Figure 4

When renormalized, the Fandango and IMDB distributions begin to resemble the 5-star-normalized distributions of the Audience score, Tomatometer, and Metascore. Based on this graph, I believe we can create a reasonable conversion table between Fandango's displayed stars and a 5-star scale that is in line with the other rating systems.
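
One simple way to build such a table, sketched here under the assumption that we average the renormalized score at each displayed-star value (column names carried over from the hypothetical sketches above):

```python
import pandas as pd

def conversion_table(displayed: pd.Series, adjusted: pd.Series) -> pd.Series:
    """Average the renormalized score at each displayed-star value."""
    return adjusted.groupby(displayed).mean().round(1).rename("adjusted_stars")

# Hypothetical usage with the columns built in the earlier sketches:
# print(conversion_table(movies["fandango"], movies["fandango_stretched"]))
```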


Conclusions

Fandango has a perverse incentive to provide inflated ratings to films: higher-rated films are more likely to sell tickets, which is how Fandango makes money. Although Walt Hickey published an article exposing Fandango's inflated ratings in 2015, this analysis of 2016 ratings data does not indicate that Fandango has significantly altered its rating system.

Rather than ignore Fandango's ratings altogether, I believe that a simple conversion from Fandango's rosy rating system to a true 5-point scale allows a more objective interpretation of a movie's quality.