Sunday, July 24, 2011

Visualizing A Week Of 99.7

When we're driving, Melissa and I often listen to Bay Area radio station 99.7 FM. It specializes in dance-focused pop, club, and hip-hop songs. In other words: boppy, brainless music.

But at any given point in time, it feels like they just replay the hits du jour. How much diversity do they really have? I wanted to know.

The answer? Not much. In the week I measured, a mere nine songs accounted for about half of everything the station played.

I also made an interactive version that requires a Java-enabled browser. It shows the song titles as you hover over them.

Here are the top 9 songs:

| Song Title | Artist | Times Played |
|---|---|---|
| The Edge of Glory | Lady Gaga | 100 |
| I Wanna Go | Britney Spears | 75 |
| Till The World Ends | Britney Spears | 69 |
| How To Love | Lil Wayne | 59 |
| Stereo Love | Gym Class Heroes | 54 |
| Rolling In The Deep | Adele | 53 |
| Written In The Stars | Tinie Tempah | 52 |
| The Lazy Song | Bruno Mars | 46 |
| Cheers (Drink to That) | Rihanna | 39 |

To gather the data for this chart, I used the station's published playlist, which lists 25 songs at a time. I set up a micro instance on Amazon's EC2 to run a Ruby script every 45 minutes that fetched that playlist page, extracted the information I wanted (via regexes and the excellent Nokogiri library), and appended it to a file.
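The extract-and-append step might look roughly like the sketch below. It is written in Python rather than the Ruby and Nokogiri I actually used, and it is not my script: the `<li data-time=...>` markup, the `ENTRY_RE` pattern, the helper names, and the sample timestamps are all made up for illustration, since the real page's HTML isn't reproduced here.

```python
import csv
import io
import re

# Hypothetical markup: the real playlist page's HTML is not shown in this post.
ENTRY_RE = re.compile(
    r'<li data-time="(?P<time>[^"]+)">(?P<title>[^<]+) - (?P<artist>[^<]+)</li>'
)

def extract_entries(html):
    """Return (time, title, artist) tuples found in one fetched page."""
    return [
        (m.group("time"), m.group("title").strip(), m.group("artist").strip())
        for m in ENTRY_RE.finditer(html)
    ]

def append_entries(out, entries):
    """Write rows to an open file-like object (a file opened with mode "a")."""
    csv.writer(out).writerows(entries)

# Made-up sample page standing in for one fetch of the playlist.
sample_page = (
    '<ul>'
    '<li data-time="2011-07-18 08:03">The Edge of Glory - Lady Gaga</li>'
    '<li data-time="2011-07-18 08:07">I Wanna Go - Britney Spears</li>'
    '</ul>'
)
log = io.StringIO()  # stands in for the growing data file
append_entries(log, extract_entries(sample_page))
```

Run on a schedule, each fetch just adds its rows to the end of the file, overlap and all; cleaning is left for later.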

Then I set up a node.js server* that returned a cleaned-up version of the raw playlist data I amassed. It removed duplicates caused not only by my data-fetching script, whose 45-minute interval meant that there was always some degree of overlap, but also by a peculiarity in the data. Remixes on the site get two or more entries with the same timestamp, one for the original song and one for the remix title, and I collapsed those down into the original song. A remix of Britney Spears' "Till the World Ends" might differ in some ways from the original, but to me it counts as playing the same song. I've published the final dataset I used for this chart.
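Here is a minimal Python sketch of that cleanup, not the node.js code itself. It assumes each entry is a `(timestamp, title, artist)` tuple with a sortable timestamp string, and it guesses that the original song is the shortest title at a given timestamp; that heuristic, the `clean` name, and the sample rows are my assumptions for illustration.

```python
def clean(entries):
    """Drop rows fetched more than once, then keep one song per timestamp."""
    by_time = {}
    for time, title, artist in entries:
        # Rows captured by two overlapping fetches are identical,
        # so a set absorbs them.
        by_time.setdefault(time, set()).add((title, artist))

    cleaned = []
    for time in sorted(by_time):
        # Assumption: a remix title is longer than the original's title,
        # so the shortest title at a timestamp is treated as the original.
        title, artist = min(by_time[time], key=lambda s: (len(s[0]), s[0]))
        cleaned.append((time, title, artist))
    return cleaned

# Made-up rows: one remix pair and one row duplicated by overlapping fetches.
raw = [
    ("2011-07-18 08:03", "Till The World Ends", "Britney Spears"),
    ("2011-07-18 08:03", "Till The World Ends (Remix)", "Britney Spears"),
    ("2011-07-18 08:07", "I Wanna Go", "Britney Spears"),
    ("2011-07-18 08:07", "I Wanna Go", "Britney Spears"),
]
cleaned = clean(raw)
```

Keying on the timestamp handles both kinds of duplicate in one pass, at the cost of trusting that the station never logs two genuinely different songs at the same moment.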

Along the way, I discovered two lacunae in the website's playlist — there are probably more — which affect the numbers a bit. As Nathan Yau says in his new book Visualize This, "Just because it’s data doesn’t make it fact."

Katy Perry's "Last Friday Night" and Pitbull's "Give Me Everything" got plenty of rotation on the station during this week but never showed up in the published playlist. I checked both my raw data and the cleaned-up form, and I did spot checks on the site whenever I heard one of those songs. They're just skipped. I assume this is some discrepancy in the database, since other songs from the same Katy Perry album show up in the list.

I don't know that it would change the numbers very much. There might be a more gradual drop-off from "Edge of Glory" to "I Wanna Go," but, if anything, the halfway mark would be closer in. The gap doesn't change the premise: 99.7 replays a lot of the same music.

For visualizing the data, I put it into a couple of tools -- a custom tool I'm writing as well as R -- but ultimately decided on Processing, the big gun in any data visualizer's arsenal. Processing is a full programming language aimed specifically at making digital images, with an emphasis on visualization. I could both fully churn through and munge the semi-raw data and quickly visualize it, all with the same tool. And since Processing is basically Java with some handy utility methods, I'm already very comfortable with the language.

Inspired by Yau's book, which encourages a storytelling mindset, I decided to add visual cues and callouts for "points of interest" to my graphic: the most popular song, the cutoff line for the songs that made up fifty percent of the total, and the cutoff line for the songs that made up eighty percent of the total.
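Finding where those cutoff lines fall is a running-total calculation. The sketch below is in Python rather than Processing; the `songs_needed` name and the tiny sample counts are hypothetical and are not taken from my dataset.

```python
from collections import Counter

def songs_needed(play_counts, share):
    """Number of top songs whose plays add up to at least `share` of all plays."""
    total = sum(play_counts.values())
    running = 0
    # most_common() yields songs from most played to least played.
    for rank, (_, count) in enumerate(play_counts.most_common(), start=1):
        running += count
        if running >= share * total:
            return rank
    return len(play_counts)

# Made-up counts: 10 plays spread over 4 songs.
sample = Counter({"A": 5, "B": 3, "C": 1, "D": 1})
half_cutoff = songs_needed(sample, 0.5)    # song A alone covers half
eighty_cutoff = songs_needed(sample, 0.8)  # A and B together cover 8 of 10
```

The returned rank is the number of bars to the left of the cutoff line when the bars are sorted from most played to least played.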

Because Processing is a programming language, I drove everything off the data itself. While I obviously had to program the callouts I wanted on the chart, I don't have a line that says, "Draw 'Edge of Glory, Lady Gaga' at these coordinates." Instead I have a line that says, "Draw the name of the song that got played the most next to the leftmost bar." I used the same mindset for all the callouts. Change the dataset, and the callouts change with it.
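The same idea, sketched in Python instead of Processing: the callout text is looked up from the counts rather than typed in. The `leftmost_callout` helper and its dictionary shape are hypothetical; the three counts come from the top of the table earlier in this post.

```python
from collections import Counter

def leftmost_callout(play_counts):
    """Describe the callout for the most-played song, which sorts to bar 0."""
    (title, artist), plays = play_counts.most_common(1)[0]
    return {"bar_index": 0, "text": f"{title}, {artist}", "plays": plays}

# Counts keyed by (title, artist), taken from the top of the table above.
counts = Counter({
    ("The Edge of Glory", "Lady Gaga"): 100,
    ("I Wanna Go", "Britney Spears"): 75,
    ("Till The World Ends", "Britney Spears"): 69,
})
callout = leftmost_callout(counts)
```

Nothing in the function mentions a particular song, so feeding it a different week's counts relabels the bar without any code changes.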

Once you have all this shiny, pretty data, you start looking at other ways to explore it. For instance, what's the average number of songs played in each hour? A little bit of modification to my Processing program, and I had a new chart ready to go from the same data.
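One way to compute that hourly average, as a Python sketch rather than my Processing code. It assumes timestamps formatted like `2011-07-18 08:03` (my guess at a convenient format, not necessarily what the station publishes) and averages over every day present in the data, so an hour with no plays on some day counts as zero for that day.

```python
from collections import defaultdict
from datetime import datetime

def average_plays_by_hour(timestamps):
    """Map each hour of the day to its average number of plays per day."""
    totals = defaultdict(int)
    days = set()
    for stamp in timestamps:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        totals[t.hour] += 1
        days.add(t.date())
    # Divide by the number of distinct days seen anywhere in the data.
    return {hour: totals[hour] / len(days) for hour in sorted(totals)}

# Made-up timestamps covering two days.
stamps = [
    "2011-07-18 08:03",
    "2011-07-18 08:07",
    "2011-07-18 09:10",
    "2011-07-19 08:15",
]
hourly = average_plays_by_hour(stamps)
```

With the sample above, the 8 o'clock hour has three plays over two days and the 9 o'clock hour has one.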

Then Melissa and I wondered if Bruno Mars' "The Lazy Song" gets played a lot more on weekends, since it's, well, about being lazy and deciding to do nothing with your day. Not really. As a rule, expect to hear it six to eight times a day at the moment, and not more on weekends.
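The weekend comparison comes down to counting one song's plays per day and averaging the weekdays and weekend days separately. A hedged Python sketch, again assuming `2011-07-18 08:03`-style timestamps and using made-up sample data; `plays_per_day` is a hypothetical name, and days on which the song was never played are simply absent rather than counted as zero.

```python
from collections import Counter
from datetime import datetime

def plays_per_day(timestamps):
    """Average daily plays of one song, split into weekdays and weekends."""
    per_day = Counter(
        datetime.strptime(s, "%Y-%m-%d %H:%M").date() for s in timestamps
    )
    groups = {"weekday": [], "weekend": []}
    for day, n in per_day.items():
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        groups["weekend" if day.weekday() >= 5 else "weekday"].append(n)
    return {k: (sum(v) / len(v) if v else 0.0) for k, v in groups.items()}

# Made-up play times for a single song across four days.
lazy_song = [
    "2011-07-18 08:00", "2011-07-18 12:00",  # Monday
    "2011-07-19 09:00",                      # Tuesday
    "2011-07-23 10:00", "2011-07-23 15:00",  # Saturday
    "2011-07-24 11:00", "2011-07-24 18:00",  # Sunday
]
averages = plays_per_day(lazy_song)
```

Comparing the two averages side by side is enough to see whether a song leans toward weekends.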

I have more ideas for this, but they're going to take a bit more data collection, so stay tuned for more in the coming weeks.

*There was no need to use node.js here. It just gave me a chance to play with it for something deeper than the "Hello, World!" example.