Saturday, September 3, 2011

Radio Station Playlist Data Visualization, Part 2

As soon as I finished my visualization of a week of 99.7's music selection, I asked the obvious next question: How does 99.7 compare to other "adult contemporary" radio stations?

There's an interactive version that lets you drill down into the graph, but here's a screenshot.

People listen to radio stations for all sorts of reasons, of course, so I don't know that anyone actually cares about this. But it did give me a chance to look at Protovis and compare it to Processing as I learn about data visualization toolkits.

Gathering Data
When I gathered data for my first visualization, I wrote a simple script that grabbed songs from the 99.7 website. I set that up as a cron job on an EC2 instance and let it go.
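The collection step could look roughly like the Python sketch below, which shows only the "record a new song" part. Several things here are assumptions: the station page is taken to report each song's start time, artist, and title; fetching and parsing that page happen elsewhere; and `append_if_new` and the three-column CSV layout are hypothetical names for illustration, not the original script. Because cron polls more often than songs change, the script has to skip a play it has already recorded.

```python
import csv
import os


def last_row(csv_path):
    """Return the last row of the CSV file, or None if it is missing or empty."""
    if not os.path.exists(csv_path):
        return None
    last = None
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            last = row
    return last


def append_if_new(csv_path, played_at, artist, title):
    """Append one play unless it is the same play recorded by the previous poll.

    Assumes the station page reports each song's start time, so two polls
    that see the same (played_at, artist, title) are the same play.
    Returns True if a row was written.
    """
    row = [played_at, artist, title]
    if last_row(csv_path) == row:
        return False  # cron polled again before the song changed
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow(row)
    return True
```

Keying on the reported start time, rather than on artist and title alone, means a song that really does play twice in a row is still counted twice.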

I did the same thing for the other four radio stations I decided to look at. 97.3 uses the same website technology as 99.7, and KFOG and KBAY share a different system, so each of those pairs got me two stations for the price of one. 101.3 uses yet another system. Once my scripts were running, I just had to wait until I had the same week's worth of data from every station. A bit of data cleanup and a quick conversion from comma-separated values to JSON, and I was ready to go.
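The cleanup and CSV-to-JSON step might look something like this sketch. It assumes each station's CSV holds `played_at,artist,title` rows with ISO 8601 timestamps, and that the shared week is passed in as a hypothetical `week_start`; the column layout and the output shape (one JSON object keyed by station name) are illustrative choices, not the format the original visualization used.

```python
import csv
import io
import json
from datetime import datetime, timedelta


def plays_in_week(csv_text, week_start):
    """Parse 'played_at,artist,title' rows, keeping plays in [week_start, week_start + 7 days)."""
    week_end = week_start + timedelta(days=7)
    plays = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # cleanup: skip blank or malformed lines
        played_at, artist, title = (field.strip() for field in row)
        if week_start <= datetime.fromisoformat(played_at) < week_end:
            plays.append({"time": played_at, "artist": artist, "title": title})
    return plays


def stations_to_json(csv_by_station, week_start):
    """Build one JSON document keyed by station name, covering the same week for all."""
    return json.dumps(
        {station: plays_in_week(text, week_start) for station, text in csv_by_station.items()},
        indent=2,
    )
```

Trimming every station to the same half-open week is what makes the later side-by-side comparison fair: no station gets extra hours at either end.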

I decided to use small multiples to give a quick side-by-side comparison of the stations, with an enlarged version of each graph available for deeper exploration. Each small graph in the chart represents one station across the same span of time.
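Small multiples only work as a comparison when every panel shares the same axes. As a hedged sketch, assuming the panels plot plays per day (a hypothetical choice; the real chart may encode something else), each station's data could be binned over one common span, with a single y maximum computed across all stations so no panel gets its own scale:

```python
from datetime import datetime


def daily_counts(plays_by_station, week_start, days=7):
    """Bin each station's play times into per-day counts over one shared span.

    Returns (series, y_max): series maps station -> list of `days` counts, and
    y_max is the largest count across *all* stations, so every small panel
    can be drawn against the same y scale.
    """
    series = {}
    for station, times in plays_by_station.items():
        counts = [0] * days
        for t in times:
            day = (t - week_start).days  # negative for plays before the span
            if 0 <= day < days:
                counts[day] += 1
        series[station] = counts
    y_max = max((max(c, default=0) for c in series.values()), default=0)
    return series, y_max


if __name__ == "__main__":
    start = datetime(2011, 8, 1)
    series, y_max = daily_counts(
        {"KFOG": [datetime(2011, 8, 1, 9), datetime(2011, 8, 3, 12)],
         "KBAY": [datetime(2011, 8, 1, 10)]},
        start,
    )
    print(series, y_max)
```

The enlarged view can reuse the same series for one station; only the panel size changes.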

Protovis Vs. Processing
It took me some time to learn Protovis. Only now, after finishing one visualization, do I feel I really grasp how it works. It aims to be a declarative language, which means you describe the result you want and let the under-the-hood machinery figure out how to get you there, but I found myself struggling against the lack of control.

Processing gives you that control, and in vast amounts, but only because it starts you with a blank slate. You can probably do anything you want, but the flip side is that you have to do everything you want.
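To make the declarative-versus-imperative distinction concrete, here is a toy Python illustration. It is neither Protovis's API (Protovis is JavaScript) nor Processing's, and the `Bar` class is hypothetical. The imperative version spells out every drawing step, Processing-style, while the declarative version only states what each bar property is, as a constant or a function of the datum and its index, and leaves the iteration to a renderer, which is roughly the idea behind Protovis marks.

```python
def bars_imperative(values, bar_width=20, gap=5, scale=80):
    """Processing-style: we write the loop and compute every position ourselves."""
    rects = []
    x = 0
    for v in values:
        rects.append((x, 0, bar_width, v * scale))  # (left, bottom, width, height)
        x += bar_width + gap
    return rects


class Bar:
    """Declarative toy mark: each property is a constant or a function of (datum, index)."""

    PROPS = ("left", "bottom", "width", "height")

    def __init__(self, data, **props):
        self.data = data
        self.props = props

    def _resolve(self, prop, datum, index):
        value = self.props[prop]
        return value(datum, index) if callable(value) else value

    def render(self):
        """The 'toolkit' owns the loop: evaluate every property for every datum."""
        return [
            tuple(self._resolve(p, d, i) for p in self.PROPS)
            for i, d in enumerate(self.data)
        ]


if __name__ == "__main__":
    values = [1, 1.2, 1.7]
    chart = Bar(values, left=lambda d, i: i * 25, bottom=0, width=20,
                height=lambda d, i: d * 80)
    # Same rectangles either way; what differs is who owns the loop.
    assert chart.render() == bars_imperative(values)
```

The declarative form is shorter, but if you need something the renderer doesn't support, you have less room to step in, which is the trade-off between the two toolkits.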

But Processing comes with a strong disadvantage: It creates Java applets. Remember those? I barely do, and I was actually writing Java when that's all people did with it. An applet takes a long time to load in a world where website visitors are accustomed to instant gratification from your page. An applet also won't work on your iOS device. So my first visualization was completely unusable by iPad owners.

(Yes, there is Processing.js, but my attempts to use it only frustrated me. It didn't support Java generics, and even when I removed them from my code, it failed with cryptic errors that were impossible to debug.)

As with so many things, deciding on a visualization toolkit means figuring out what's best for your job. If you're doing something complex and custom, you'll probably want Processing. But for a lot of web-based visualizations, I think Protovis will give you what you need once you figure out how to use it. It can certainly do a lot in that space.

I have still more visualizations in mind for this same set of data, and I'm planning on starting with Protovis (or its successor, d3). The Java applet problems are too big.
