Blue Christmas: A data-driven search for the most depressing Christmas song

Christmas music can be a lot of things: joyous, ironic, melancholy, cheerful, funny, and, in some cases, downright depressing. I personally realized this while watching an immensely sad scene in The Family Stone centered around Judy Garland singing ‘Have Yourself a Merry Little Christmas’, and I haven’t yet fully recovered (or stopped noticing sad Christmas music). With this scene in mind, and unable to think of any sadder song, I set out to use data to find the most depressing Christmas song. (Spoiler alert: I was wrong about ‘Have Yourself a Merry Little Christmas’.)

Data Collection

The data collection process broke down into three steps: choosing which songs to analyze, using Spotify to extract “musical” information about each song chosen, and using Genius (and Google) to collect the lyrics for each song.

Which Songs?

As it turns out, there are looooots of Christmas songs out there. (Just think of how many covers are released each year!) I was hoping for a Buzzfeed “Top 100 Christmas Songs of All Time” list, but after checking FiveThirtyEight, Billboard, Spotify, and lots of Google searches, I wasn’t able to come up with anything satisfactory that included both title and artist (which I’d need to gather ‘musical’ track attributes). I settled for Spotify’s ‘Christmas Classics’ playlist. This 60-song playlist contains many classics (“Silver Bells”, “Sleigh Ride”) as well as some modern classics (I’m looking at you, Mariah Carey). While it doesn’t include every Christmas song, I think it does a good job of picking the most popular version of each song chosen and handily satisfies my “title and artist” requirement.

Gathering ‘Musical’ Data (from Spotify)

We can get an idea of musical sadness by using data from the Spotify API, which allows you to extract various musical attributes for a given track (like danceability, speechiness, and liveness). For this analysis, I focused mainly on two attributes: energy and valence.

Valence is defined by Spotify as: “A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).” We can use this to determine how sad a track sounds (independent of lyrics).

Energy (as defined by Spotify) also rates from 0 to 1 and represents a “perceptual measure of intensity and activity”. Heavy metal would rate high on the energy scale and slow acoustic tracks would rate low.

To gather this data, I used Charlie Thompson’s fantastic spotifyr package to interface with the Spotify API. This package can be installed and loaded via github like so:
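
A minimal sketch of that install step, assuming the devtools package is already installed and that spotifyr is still hosted in the charlie86/spotifyr GitHub repository:

```r
# Assumes devtools is available; if not, run install.packages("devtools") first.
# Assumes the package still lives at charlie86/spotifyr on GitHub.
devtools::install_github("charlie86/spotifyr")

# Load the package so its functions are available in this session
library(spotifyr)
```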


Regardless of how you access it, the Spotify API requires that you set up a developer account to create a client_id and client_secret. Save these as environment variables (see below) and you’re ready to start gathering data!

Sys.setenv(SPOTIFY_CLIENT_ID = "your client id here")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "your client secret here")

The spotifyr package allows you to pull track info based on a given artist, playlist, or album. Since I couldn’t pull the data from Spotify’s playlist directly, I copied all of the songs into my own playlist (“christmas_classics_spotify”) for easy access.

The steps I took were:
1) Get all of my playlist names (since I have more than one) using get_user_playlists
2) Get tracks from each playlist using get_playlist_tracks
3) Filter all tracks to just the tracks from the christmas_classics_spotify playlist
4) Use the get_track_audio_features function to get the features for the songs I care about.

Here’s what that code looks like:


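library(spotifyr)  # calls below match the spotifyr version used at the time
library(dplyr)     # provides %>% and filter()

# 1) All playlists for my Spotify user ID
playlists <- get_user_playlists("1216605385")
# 2) Every track on each of those playlists
tracks <- get_playlist_tracks(playlists)
# 3) Keep only the tracks from the Christmas playlist
xmas_tracks <- tracks %>%
    filter(playlist_name == "christmas_classics_spotify")
# 4) Audio features (valence, energy, etc.) for those tracks
track_features <- get_track_audio_features(xmas_tracks)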

Gathering Lyrics

Originally, I planned to pull all of the song lyrics via the Genius API and Josiah Parry’s in-progress geniusR functions. This was a pretty good plan, but I quickly realized that not all of the songs I wanted lyrics for were actually available on Genius (some are fairly old), so I used a combination of geniusR and good ol’ fashioned copying and pasting to get the lyrics for all of the songs in my playlist. (Note: copying and pasting is not as boring as it sounds if you’re also watching old episodes of Parks and Recreation.)

If you do want to use the Genius API to gather data, you’ll need to create an account with Genius to get an API access token. Similar to what we did with the Spotify API, you can save this as an environment variable:

Sys.setenv(genius_token = "your access token here")

Since geniusR isn’t a fully instrumented package, the best way to use its functions is to clone the geniusR repo and run each script, or copy and paste each one into your own script. Most of these functions are helper functions for the genius_lyrics function, which is the only one you’ll need to call directly. This function takes an artist and a song as arguments, like so:

jingle_bell_rock <- genius_lyrics(artist = "Bobby Helms", song = "Jingle Bell Rock")

You can loop through this function as needed to get any lyrics you’d like to analyze. Once you’ve collected all of your lyrics, you’re ready to move on to analysis!
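
As a rough sketch of such a loop: the helper below (collect_lyrics, a hypothetical name that is not part of geniusR) calls a lookup function once per song and records NULL for any song whose lookup fails, so those can be filled in by hand. It assumes a failed lookup raises an error, which may not match how genius_lyrics actually behaves.

```r
# Hypothetical helper, not part of geniusR.
# `songs`: data frame with character columns `artist` and `song`.
# `fetch`: a function taking `artist` and `song`, e.g. genius_lyrics.
collect_lyrics <- function(songs, fetch) {
  results <- lapply(seq_len(nrow(songs)), function(i) {
    tryCatch(
      fetch(artist = songs$artist[i], song = songs$song[i]),
      error = function(e) NULL  # assumed failure mode: lookup errors out
    )
  })
  names(results) <- songs$song
  results
}

songs <- data.frame(
  artist = c("Bobby Helms", "Elvis Presley"),
  song   = c("Jingle Bell Rock", "Blue Christmas"),
  stringsAsFactors = FALSE
)

# With the geniusR functions loaded:
# all_lyrics <- collect_lyrics(songs, fetch = genius_lyrics)
# Songs still needing a manual copy and paste:
# names(all_lyrics)[vapply(all_lyrics, is.null, logical(1))]
```

Passing the lookup function in as an argument is just a convenience here: it lets you try the loop with a stand-in function before spending API calls.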

Data Analysis: Quantifying Sadness

A song is made up of music and lyrics, and we’ll use both to create a Downer Score (a measure of a song’s sadness).

Musical Sadness

Earlier I mentioned a couple of useful features from the Spotify data we can use to quantify sadness: energy and valence. While these measures are useful individually, combining them gives us a better picture of how depressing a song might be. For example, a song that is high-valence but low-energy would definitely be happy, but might be considered more ‘peaceful’ or ‘calm’ than ‘joyous’. Likewise, a song that is low-valence but high-energy might be considered more ‘angry’ or ‘turbulent’ than ‘sad’. The most depressing songs will be both low-valence and low-energy (think Eeyore!). If we plot valence against energy, the sad songs will be the ones closest to the lowest-valence, lowest-energy point, (0, 0):

To quantify the musical sadness of each song, we calculate that song’s distance (in terms of valence and energy) from the point (0, 0) — the lower this distance is, the sadder the song.
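
Here is a small sketch of that distance calculation in base R. The valence and energy values are made up for illustration and are not the real track data; the column names mirror the Spotify attribute names.

```r
# Made-up valence/energy values, for illustration only (not the real data)
toy_features <- data.frame(
  track_name = c("Song A", "Song B", "Song C"),
  valence    = c(0.30, 0.90, 0.10),
  energy     = c(0.40, 0.80, 0.20),
  stringsAsFactors = FALSE
)

# Euclidean distance from the saddest corner (valence = 0, energy = 0)
toy_features$distance <- sqrt(toy_features$valence^2 + toy_features$energy^2)

# Smallest distance first, i.e. musically saddest first
ranked <- toy_features[order(toy_features$distance), ]
ranked
```

Since both attributes run from 0 to 1, this distance can range from 0 up to the square root of 2.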

Based on musical sadness, the most depressing Christmas songs are:

[Image: table of the most depressing Christmas songs by musical sadness]

O Christmas Tree was definitely not one of my guesses for “most depressing Christmas song” (although a listen-through of this version might convince me otherwise), but fear not, we still have to take a look at the emotions conveyed in the lyrics…

Lyrical Sadness

To analyze lyrical sadness, I used Julia Silge and Dave Robinson’s tidytext package to perform sentiment analysis on each song. tidytext comes complete with a tokenizer (to break down long blocks of text into their individual words for analysis), a list of stop words (common words like “a”, “an”, and “the” which don’t carry much meaning and are therefore removed), and several sentiment catalogs we can use to analyze the feeling or emotion attached to a given word.

Let’s get to it! After loading up the tidytext package, I created a list of sad words and a list of joy words from the NRC emotion lexicon.


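library(tidytext)
library(dplyr)

# `sentiments` is tidytext's combined lexicon table (in the version used here)
sad_words <- sentiments %>%
    filter(lexicon == "nrc", sentiment == "sadness") %>%
    select(word) %>%
    mutate(sad = TRUE)

joy_words <- sentiments %>%
    filter(lexicon == "nrc", sentiment == "joy") %>%
    select(word) %>%
    mutate(joy = TRUE)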


Next I removed stop words and left-joined the sad and joy word lists into my set of lyrics to calculate the percent of sad words and the percent of joy words that appeared in each song.

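# `lyric_words` is a placeholder name: tokenized lyrics (one word per row),
# assumed to be already grouped by song
with_sentiment <- lyric_words %>%
    anti_join(stop_words, by = "word") %>%
    left_join(sad_words, by = "word") %>%
    left_join(joy_words, by = "word") %>%
    summarise(pct_sad = round(sum(sad, na.rm = TRUE) / n(), 4),
              pct_joy = round(sum(joy, na.rm = TRUE) / n(), 4),
              sad_minus_joy = pct_sad - pct_joy)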

You might have noticed that in the last line of code above, I subtracted the percent of joy words from the percent of sad words. Originally, I only looked at the percent of sad words, but I noticed that even happy songs (like Joy to the World) do have some sad words, while other songs had zero sad words. To account for this, I used the difference instead: the percent of sad words minus the percent of joy words, so a higher value means sadder lyrics. (In fact, I thought it was interesting that only two songs have a higher percent of sad words than joy words: Blue Christmas and You’re a Mean One, Mr. Grinch.)
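
To make the calculation concrete, here is a toy version in base R. The word lists and the example sentence are made up for illustration (the analysis above uses the NRC lexicon and tidytext's stop word list), and lyrical_sadness is a hypothetical helper name.

```r
# Toy word lists, made up for illustration only
sad_lexicon <- c("blue", "lonely", "cry")
joy_lexicon <- c("merry", "joy", "bright")
stop_list   <- c("a", "an", "the", "and")

# Hypothetical helper: share of sad and joy words after removing stop words
lyrical_sadness <- function(lyrics) {
  # Tokenize: lowercase, then split on anything that is not a letter or apostrophe
  words <- unlist(strsplit(tolower(lyrics), "[^a-z']+"))
  # Drop empty tokens and stop words
  words <- words[words != "" & !(words %in% stop_list)]
  pct_sad <- mean(words %in% sad_lexicon)
  pct_joy <- mean(words %in% joy_lexicon)
  c(pct_sad = pct_sad, pct_joy = pct_joy, sad_minus_joy = pct_sad - pct_joy)
}

# Five words survive stop-word removal: two sad, one joyful
result <- lyrical_sadness("A blue, lonely Christmas and a merry one")
result
```

A positive sad_minus_joy means the sad words outnumber the joy words, which is the case for this made-up line.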

Based on lyrical sadness, the most depressing Christmas songs are:

[Image: table of the most depressing Christmas songs by lyrical sadness]

The Downer Index

In order to crown the most depressing Christmas song, we’ll have to combine the metrics for lyrical sadness (pct sadwords) and musical sadness (distance). I’ve created a metric, the Downer Index, which does just that:


A downer index near 1 is a happier song and a downer index near 0 is a more depressing song. This index weights the musical and lyrical elements of the song equally, and both are on a (0, 1) scale such that a higher score represents a happier quantity. This metric (and blog post) is inspired by Charlie Thompson’s gloom index (and the accompanying blog article, which I highly recommend reading for a look into the sad songs of Radiohead).
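
The exact formula isn't reproduced in the text here, so the sketch below is only one possible construction with the properties just described, not necessarily the one used for the rankings. It assumes the musical distance is rescaled by its maximum (the square root of 2) and that sad_minus_joy, which can range from -1 to 1, is flipped and rescaled so that higher means happier.

```r
# One possible construction (an assumption, not necessarily the formula used here)
downer_index <- function(distance, sad_minus_joy) {
  # Distance from (0, 0) ranges over [0, sqrt(2)]; rescale to [0, 1]
  musical <- distance / sqrt(2)
  # sad_minus_joy ranges over [-1, 1]; flip and rescale so higher = happier
  lyrical <- (1 - sad_minus_joy) / 2
  # Equal weights for the musical and lyrical components
  (musical + lyrical) / 2
}

saddest  <- downer_index(distance = 0,       sad_minus_joy = 1)   # 0
happiest <- downer_index(distance = sqrt(2), sad_minus_joy = -1)  # 1
```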

Based on the data, the most depressing Christmas song is… Blue Christmas!

This doleful tune about unrequited love certainly delivers lyrically (Blue Christmas was the most lyrically sad song), which contributed heavily to its ranking; musically it was less extreme, coming in 28th overall for musical sadness with a score of .64. I also learned that Blue Christmas was not an Elvis original, though his is by far the most popular cover (thanks Wikipedia!).

And without further ado, here are the top ten most depressing Christmas songs:

[Image: table of the top ten most depressing Christmas songs by Downer Index]

While this approach isn’t perfect, I’m pretty happy with the results (except that my horse wasn’t even in the top ten!) and think the data does a fairly good job of capturing both the musical and lyrical sadness in the songs I analyzed.

Bonus: Christmas Song Superlatives

If you’re a “glass half full” kind of person, you might also be interested in some of the happier Christmas songs, which I also dug up while performing this analysis:

This data was a lot of fun to play with and I only scratched the surface on types of analyses you could do with it. If anyone is interested, I’m happy to share it.

Merry Christmas!



Data Meta-Metrics

Sometimes I work with great data: I know how and when it’s collected, it lives in a familiar database, and it represents exactly what I expect it to represent. Other times, I’ve had to work with less-than-stellar data: the kind of data that comes with an “oral history” and lots of caveats and exceptions when it comes to using it in practice.

When stakeholders ask data questions, they don’t know which type of data — great, or less-than-stellar — is available to answer them. When the data available falls into the latter camp, there is an additional responsibility on the analyst to use the data appropriately, and to communicate honestly. I can be very confident about the methodologies I’m using to analyze data, but if there are issues with the underlying dataset, I might not be so confident in the results of an analysis, or my ability to repeat the analysis. Ideally, we should be passing this information — our confidences and our doubts — on to stakeholders alongside any results or reports we share.

So, how do we communicate confidences and doubts about data to a non-technical audience (in a way that is efficient and easily interpretable)? Lately I’ve been experimenting with embedding a “state of the data” in presentations through red, yellow, and green data meta-metrics.


Recently my team wanted to know whether a new product feature was increasing sales. We thought of multiple ways to explore whether the new feature was having an impact, including checking whether emails mentioning the new feature had higher engagement, and using trade show data to see whether there was more interest in the product after the feature was released. Before starting the analysis, we decided that we’d like it to be repeatable; that is, we’d like to be able to refresh the results as needed to see the long-term impact of the feature on product sales.

Sounds easy, right? Collect data, write some code, and build a reproducible analysis. I thought so too, until I started talking to various stakeholders in 5+ different teams about the data they had available.

I found the data we wanted in a variety of states — anywhere from “lives in a familiar database and easy to explore” to “Anna* needs to download a report with very specific filters from a proprietary system and give you the data” to “Call Matt* and see if he remembers”. Eventually I was able to get some good (and not-so-good) data together and build out the necessary analyses.

While compiling all of the data and accompanying analyses together for a presentation, I realized that I needed some way to communicate what I had found along the way: not all of the data was equally relevant to the questions we were asking of it, not all of the data was trustworthy, and not all of the analysis was neatly reproducible.

The data meta-metrics rating system below is what I’ve used to convey the quality of the data and its collection process to technical and non-technical members of my team. It’s based on three components: relevance, trustworthiness, and repeatability. The slide below outlines the criteria I used for each score (green, yellow, red) in each category.

[Image: slide outlining the green, yellow, and red criteria for relevance, trustworthiness, and repeatability]

Within the presentation, I added these scores to the bottom of every slide. In the example below, the data definitely answered the question we were asking of it (it was relevant), and I trusted the source and data collection mechanism, but the analysis wasn’t fully reproducible: I needed to manually run a report and export a text file before I could use it as an input in an automated analysis. Overall, this data is pretty good and I think the rating system reflects that. The improvement that would take this data to green-green-green would be pretty simple: writing the email data to a more easily accessible database, which becomes a roadmap item if we feel this report is valuable enough that we’ll want to repeat it.

[Image: example slide with data meta-metric scores for the email data]

Below is an example of a not-so-great data process. Trade shows are inherently pretty chaotic, and our data reflects that. It’s hard to tell what specifically makes a trade show attendee interested in a product, and tracking that journey in real time is much harder without records of interactions like demos, phone calls, etc. This becomes another roadmap item: if we want to dig deeper into trade show data and use it to guide product decisions, we need to implement better ways of collecting and storing that data.

[Image: example slide with data meta-metric scores for the trade show data]

Overall, this exercise was helpful for diagnosing the strengths and weaknesses of our data storage and collection across multiple teams. Providing this data in an easy-to-understand format allowed us to have informative conversations about the state of our data and what we could do to improve it. Getting the rest of the team involved in the data improvement process also helps my understanding of what data we do and don’t have, what we can and can’t collect, and makes my analyses more relevant to their needs.

The meta-metrics I used here are the ones we specifically cared about for this type of analysis; I could certainly see use cases where we might swap out or add another data meta-metric. If you’ve worked on conveying “the state of the data” or data meta-metrics to your team, I’d love to hear more about your process and the meta-metrics you’ve used in the comments.

* Names have been changed to protect the innocent.