Blue Christmas: A data-driven search for the most depressing Christmas song

Christmas music can be a lot of things: joyous, ironic, melancholy, cheerful, funny, and, in some cases, downright depressing. I realized this while watching an immensely sad scene in The Family Stone centered around Judy Garland singing ‘Have Yourself a Merry Little Christmas’, and I haven’t yet fully recovered (or stopped noticing sad Christmas music). With this scene in mind, and unable to think of a sadder song, I set out to use data to find the most depressing Christmas song. (Spoiler alert: I was wrong about Have Yourself a Merry Little Christmas.)

Data Collection

The data collection process broke down into three steps: choosing which songs to analyze, using Spotify to extract “musical” information about each song chosen, and using Genius (and Google) to collect the lyrics for each song.

Which Songs?

As it turns out, there are looooots of Christmas songs out there (just think of how many covers are released each year!). I was hoping for a Buzzfeed “Top 100 Christmas Songs of All Time” list, but after checking FiveThirtyEight, Billboard, Spotify, and lots of Google searches, I wasn’t able to find anything satisfactory that included both title and artist (which I’d need to gather ‘musical’ track attributes). I settled for Spotify’s ‘Christmas Classics’ playlist. This 60-song playlist contains many classics (“Silver Bells”, “Sleigh Ride”) as well as some modern classics (I’m looking at you, Mariah Carey). While it doesn’t include every song, I think it does a good job of picking the most popular version of each song it includes, and it handily satisfies my “title and artist” requirement.

Gathering ‘Musical’ Data (from Spotify)

We can get an idea of musical sadness by using data from the Spotify API, which allows you to extract various musical attributes for a given track (like danceability, speechiness, and liveness). For this analysis, I focused mainly on two attributes: energy and valence.

Valence is defined by Spotify as: “A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).” We can use this to determine how sad a track sounds (independent of lyrics).

Energy (as defined by Spotify) also ranges from 0.0 to 1.0 and represents “a perceptual measure of intensity and activity.” Heavy metal would rate high on the energy scale, and slow acoustic tracks would rate low.

To gather this data, I used Charlie Thompson’s fantastic spotifyr package to interface with the Spotify API. This package can be installed from GitHub and loaded like so:

[code lang="r"]
# Install spotifyr from GitHub, then load it
devtools::install_github('charlie86/spotifyr')
library(spotifyr)
[/code]

Regardless of how you access it, the Spotify API requires that you set up a developer account to create a client_id and client_secret. Save these as environment variables (see below) and you’re ready to start gathering data!

[code lang="r"]
# spotifyr reads these environment variables when it requests an access token
Sys.setenv(SPOTIFY_CLIENT_ID = "your client id here")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "your client secret here")
[/code]

The spotifyr package allows you to pull track info based on a given artist, playlist, or album. Since I couldn’t pull the data from Spotify’s playlist directly, I copied all of the songs into my own playlist (“christmas_classics_spotify”) for easy access.

The steps I took were:

1) Get all of my playlist names (since I have more than one) using get_user_playlists

2) Get tracks from each playlist using get_playlist_tracks

3) Filter all tracks to just the tracks from the christmas_classics_spotify playlist

4) Use the get_track_audio_features function to get the features for the songs I care about.

Here’s what that code looks like:

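[code lang="r"]
library(dplyr)

# 1) All playlists for my Spotify user ID
playlists <- get_user_playlists("1216605385")

# 2) Tracks from every one of those playlists
tracks <- get_playlist_tracks(playlists)

# 3) Keep only the tracks from the Christmas playlist
xmas_tracks <- tracks %>%
  filter(playlist_name == "christmas_classics_spotify")

# 4) Audio features (valence, energy, etc.) for those tracks
track_features <- get_track_audio_features(xmas_tracks)
[/code]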

Gathering Lyrics

Originally, I planned to pull all of the song lyrics via the Genius API and Josiah Parry’s in-progress geniusR functions. This was a pretty good plan, but I quickly realized that not all of the songs I wanted lyrics for were actually available on Genius (some are fairly old), so I used a combination of geniusR and good ol’ fashioned copying and pasting to get the lyrics for all of the songs in my playlist. (Note: copying and pasting is not as boring as it sounds if you’re also watching old episodes of Parks and Recreation.)

If you do want to use the Genius API to gather data, you’ll need to create an account with Genius to get an API access token. Similar to what we did with the Spotify API, you can save this as an environment variable:

[code lang="r"]Sys.setenv(genius_token = "your access token here")[/code]

Since geniusR isn’t yet a finished, installable package, the best way to use its functions is to clone the geniusR repo and run each script, or copy and paste each one into your own script. Most of these functions are helpers for the genius_lyrics function, which is the only one you’ll need. It takes artist and song as arguments:

[code lang="r"]jingle_bell_rock <- genius_lyrics(artist = "Bobby Helms", song = "Jingle Bell Rock")[/code]

You can loop through this function as needed to get any lyrics you’d like to analyze. Once you’ve collected all of your lyrics, you’re ready to move on to analysis!
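As a sketch of that loop, assume the songs live in a data frame with artist and song columns. The function fetch_lyrics below is a hypothetical stand-in for genius_lyrics() (any function with artist and song arguments that returns lyric lines or throws an error when a song isn’t found), and tryCatch() records which songs couldn’t be fetched so they can be filled in by hand. The names collect_lyrics, fake_fetch, and the example songs are illustrative only.

[code lang="r"]
# Collect lyrics for each (artist, song) pair and note any failures.
# `fetch_lyrics` is a hypothetical stand-in for genius_lyrics(): it takes
# `artist` and `song` and returns a character vector of lyric lines,
# or throws an error if the song isn't available.
collect_lyrics <- function(songs, fetch_lyrics) {
  results   <- vector("list", nrow(songs))
  not_found <- character(0)
  for (i in seq_len(nrow(songs))) {
    artist <- songs$artist[i]
    title  <- songs$song[i]
    lyric  <- tryCatch(
      fetch_lyrics(artist = artist, song = title),
      error = function(e) NULL  # unavailable song: treat as missing
    )
    if (is.null(lyric)) {
      not_found <- c(not_found, title)  # fill these in by copy and paste
    } else {
      results[[i]] <- data.frame(artist = artist, song = title,
                                 line = seq_along(lyric), lyric = lyric,
                                 stringsAsFactors = FALSE)
    }
  }
  # rbind() skips the NULL entries left by songs that weren't found
  list(lyrics = do.call(rbind, results), not_found = not_found)
}

# Usage with a fake fetcher, so no API token or network is needed
songs <- data.frame(artist = c("Bobby Helms", "Unknown Artist"),
                    song   = c("Jingle Bell Rock", "Obscure Carol"),
                    stringsAsFactors = FALSE)
fake_fetch <- function(artist, song) {
  if (song == "Obscure Carol") stop("not found")
  c("first lyric line", "second lyric line")
}
out <- collect_lyrics(songs, fake_fetch)
out$not_found  # "Obscure Carol"
[/code]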

Data Analysis: Quantifying Sadness

A song is made up of music and lyrics, and we’ll use both to create a Downer Index (a measure of a song’s sadness).

Musical Sadness

Earlier I mentioned two features from the Spotify data that we can use to quantify sadness: energy and valence. While these measures are useful individually, combining them gives us a better picture of how depressing a song might be. For example, a song that is high-valence but low-energy would still sound happy, but might be considered more ‘peaceful’ or ‘calm’ than ‘joyous’. Likewise, a song that is low-valence but high-energy might be considered more ‘angry’ or ‘turbulent’ than ‘sad’. The most depressing songs will be both low-valence and low-energy (think Eeyore!). If we plot valence against energy, the sad songs will be the ones closest to the point of lowest valence and lowest energy, (0, 0):

[Plot: valence vs. energy for each song in the playlist]

To quantify the musical sadness of each song, we calculate that song’s distance (in terms of valence and energy) from the point (0, 0): the lower this distance, the sadder the song.
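A minimal sketch of that calculation, assuming the distance is the ordinary Euclidean distance, sqrt(valence² + energy²). Because both features run from 0 to 1, the raw distance runs from 0 to sqrt(2); dividing by sqrt(2) is an assumed rescaling added here to put it on a 0 to 1 scale. The track names and feature values are made up for illustration, not Spotify’s numbers for real songs.

[code lang="r"]
# Musical sadness: Euclidean distance from (valence = 0, energy = 0).
# Smaller distance = sadder song. Feature values below are made up.
example_tracks <- data.frame(
  track   = c("Song A", "Song B", "Song C"),
  valence = c(0.10, 0.80, 0.30),
  energy  = c(0.20, 0.90, 0.40),
  stringsAsFactors = FALSE
)

example_tracks$distance <- sqrt(example_tracks$valence^2 +
                                example_tracks$energy^2)

# Raw distance lies in [0, sqrt(2)]; dividing by sqrt(2) (an assumed
# rescaling) maps it onto [0, 1] so it can be combined with other scores.
example_tracks$musical_score <- example_tracks$distance / sqrt(2)

# Rank from saddest (1) to happiest
example_tracks$sad_rank <- rank(example_tracks$distance)
example_tracks[order(example_tracks$distance),
               c("track", "distance", "sad_rank")]
[/code]

Ranking by distance is what produces placements like “28th overall for musical sadness”; the rescaled score only matters once the musical and lyrical measures are combined.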

Based on musical sadness, the most depressing Christmas songs are:

O Christmas Tree was definitely not one of my guesses for “most depressing Christmas song” (although a listen-through of this version might convince me otherwise), but fear not, we still have to take a look at the emotions conveyed in the lyrics...

Lyrical Sadness

To analyze lyrical sadness, I used Julia Silge and Dave Robinson’s tidytext package to perform sentiment analysis on each song. tidytext comes complete with a tokenizer (to break long blocks of text into individual words for analysis), a list of stop words (common words like “a”, “an”, and “the”, which don’t carry much meaning and are therefore removed), and several sentiment lexicons we can use to analyze the feeling or emotion attached to a given word.
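To make the tokenizing and stop-word steps concrete, here is a rough base-R approximation of what tidytext’s unnest_tokens() followed by anti_join(stop_words) does. The helper name tokenize and the three-word stop list are hypothetical, and splitting on anything that isn’t a letter or apostrophe is a simplification of the real tokenizer.

[code lang="r"]
# Simplified tokenizer: lowercase the text, then split on anything that is
# not a letter or an apostrophe (so "I'll" stays one word).
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[words != ""]  # drop empty strings left by leading punctuation
}

# Hypothetical miniature stop-word list (tidytext ships a much larger one)
stop_list <- c("a", "an", "the")

lyric <- "I'll have a blue Christmas without you"
tokens <- tokenize(lyric)
content_words <- tokens[!tokens %in% stop_list]
content_words  # "i'll" "have" "blue" "christmas" "without" "you"
[/code]

With tidytext’s full stop-word list, words like “have” and “you” would be dropped as well, leaving mostly the words that carry emotional weight.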

Let’s get to it! After loading up the tidytext package, I created a list of sad words and a list of joy words from the NRC emotion lexicon.

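[code lang="r"]
library(dplyr)
library(tidytext)

# NRC emotion lexicon: one row per word-emotion pair
# (recent tidytext versions download it via the textdata package)
nrc <- get_sentiments("nrc")

# Words tagged with "sadness", flagged with sad = TRUE
sad_words <- nrc %>%
  filter(sentiment == "sadness") %>%
  select(word) %>%
  mutate(sad = TRUE)

# Words tagged with "joy", flagged with joy = TRUE
joy_words <- nrc %>%
  filter(sentiment == "joy") %>%
  select(word) %>%
  mutate(joy = TRUE)
[/code]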

Next I removed stop words and left-joined the sad and joy word lists into my set of lyrics to calculate the percent of sad words and the percent of joy words that appeared in each song.

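[code lang="r"]
# Assumes `lyrics` has one row per lyric line, with (hypothetical)
# columns `track_name` and `lyric`
with_sentiment <- lyrics %>%
  unnest_tokens(word, lyric) %>%          # one row per word
  anti_join(stop_words, by = "word") %>%  # drop common, low-meaning words
  left_join(sad_words, by = "word") %>%   # sad = TRUE for NRC sadness words
  left_join(joy_words, by = "word") %>%   # joy = TRUE for NRC joy words
  group_by(track_name) %>%
  summarise(pct_sad = round(sum(sad, na.rm = TRUE) / n(), 4),
            pct_joy = round(sum(joy, na.rm = TRUE) / n(), 4),
            sad_minus_joy = pct_sad - pct_joy)
[/code]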

You might have noticed that in the last line of code above, I subtracted the percent of joy words from the percent of sad words. Originally, I only looked at the percent of sad words, but I noticed that even happy songs (like Joy to the World) have some sad words, while other songs have none at all. To account for this, I used the difference between the two percentages rather than the percent of sad words alone, so a song full of joy words isn’t ranked as sad because of a stray sad word. (Interestingly, only two songs have a higher percent of sad words than joy words: Blue Christmas and You’re a Mean One, Mr. Grinch.)
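A toy illustration of why the difference helps: the two made-up songs below each contain one ‘sad’ word out of four, so they tie on percent of sad words, but they separate once joy words are counted. The word lists are tiny hypothetical stand-ins for the NRC sadness and joy lists, the helper name lyric_scores is illustrative, and the songs are assumed to be already tokenized with stop words removed.

[code lang="r"]
# Hypothetical mini-lexicons standing in for the NRC "sadness" and "joy" lists
sad_lexicon <- c("blue", "lonely", "cry")
joy_lexicon <- c("joy", "merry", "happy")

# Per-song scores mirroring pct_sad, pct_joy and sad_minus_joy:
# mean(words %in% lexicon) is the share of words found in that lexicon
lyric_scores <- function(words) {
  pct_sad <- round(mean(words %in% sad_lexicon), 4)
  pct_joy <- round(mean(words %in% joy_lexicon), 4)
  c(pct_sad = pct_sad, pct_joy = pct_joy, sad_minus_joy = pct_sad - pct_joy)
}

# Made-up songs, already tokenized with stop words removed
song_1 <- c("merry", "happy", "joy", "lonely")      # mostly joyful
song_2 <- c("blue", "christmas", "without", "you")  # melancholy

lyric_scores(song_1)  # pct_sad 0.25, pct_joy 0.75, sad_minus_joy -0.50
lyric_scores(song_2)  # pct_sad 0.25, pct_joy 0.00, sad_minus_joy  0.25
[/code]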

Based on lyrical sadness, the most depressing Christmas songs are:

The Downer Index

To crown the most depressing Christmas song, we’ll have to combine the metrics for lyrical sadness (pct sad words) and musical sadness (distance). I’ve created a metric, the Downer Index, which does just that:

A Downer Index near 1 indicates a happier song, and one near 0 indicates a more depressing song. The index weights the musical and lyrical elements of the song equally, and both are on a (0, 1) scale where a higher score represents a happier song. This metric (and blog post) was inspired by Charlie Thompson's gloom index (and the accompanying blog article, which I highly recommend for a look into the sad songs of Radiohead).
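One way to build an index with the properties described above (two components, each on a 0 to 1 scale where higher means happier, weighted equally) is a simple average of two rescaled scores. Both rescalings here are assumptions made for this sketch: the musical distance is divided by sqrt(2), and the lyrical sad_minus_joy score, which runs from -1 to 1 with higher meaning sadder, is flipped and mapped onto 0 to 1. The function name downer_index and the inputs are hypothetical; this illustrates the idea rather than reproducing the exact formula behind the rankings.

[code lang="r"]
# Hypothetical Downer Index: equal-weight average of a musical and a lyrical
# score, each on [0, 1] with higher = happier. The rescalings are assumptions.
downer_index <- function(valence, energy, sad_minus_joy) {
  # distance from (0, 0), scaled so the (1, 1) corner scores 1
  musical <- sqrt(valence^2 + energy^2) / sqrt(2)
  # flip and shift: sad_minus_joy of 1 (all sad) -> 0, -1 (all joy) -> 1
  lyrical <- (1 - sad_minus_joy) / 2
  (musical + lyrical) / 2
}

# Made-up inputs: a slow, sad-sounding song with sad lyrics vs. an upbeat one
downer_index(valence = 0.2, energy = 0.3, sad_minus_joy = 0.10)
downer_index(valence = 0.9, energy = 0.8, sad_minus_joy = -0.15)
[/code]

Averaging keeps the two halves on equal footing: a song can only approach 0 if it both sounds sad and uses more sad words than joy words.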

Based on the data, the most depressing Christmas song is... Blue Christmas!

This doleful tune about unrequited love certainly delivers lyrically (Blue Christmas was the most lyrically sad song), which contributed heavily to its ranking; it came in 28th overall for musical sadness with a score of 0.64. I also learned that Blue Christmas was not an Elvis original, though his is by far the most popular cover (thanks, Wikipedia!).

And without further ado, here are the top ten most depressing Christmas songs:

While this approach isn't perfect, I'm pretty happy with the results (except that my horse in this race, Have Yourself a Merry Little Christmas, didn't even make the top ten!), and I think the data does a fairly good job of capturing both the musical and lyrical sadness in the songs I analyzed.

Bonus: Christmas Song Superlatives

If you're a "glass half full" kind of person, you might also be interested in some of the happier Christmas songs, which I also dug up while performing this analysis:

This data was a lot of fun to play with, and I only scratched the surface of the analyses you could do with it. If anyone is interested, I'm happy to share it.

Merry Christmas!  
