Saturday, September 24, 2011

Protovis And Wine Visualization: California Crush Statistics

Radio station visualizations are fun and all, but I realized that I should research data visualization by looking at data I actually care about. That way, I can provide context and ask deeper questions about the subject matter at hand.

As an occasional wine writer, data about the wine industry seemed like a good start.

Harvest — "crush" in wine industry jargon — is afoot here in California, and that spurred me to search for data on previous harvests. The National Agricultural Statistics Service publishes a range of interesting data for wine geeks, some of which I've been using for experiments and explorations with Protovis.

The first public one shows harvest statistics over 20 years for the 15 grapes with highest crush numbers in California in 2010. The interactive version gives you a deeper view, with detailed per-year statistics as you mouse over, but here's a static version to give you an idea.


Groovy, eh?

Wine geeks will know many of this visualization's stories well. The California wine industry has grown tremendously over the last 20 years, thanks to increased consumption in the United States. Grape gluts are periodic, but 2005 was a particularly grape-heavy year. Industrial grapes such as French Colombard, Rubired, and Ruby Cabernet are mainstays of the bulk-wine industry led by Gallo. Pinot Noir tonnage surpassed Syrah tonnage in 2008, about 4 years — when vines start producing worthwhile fruit — after Sideways, the movie that told everyone about Pinot Noir. (Though I should note that I prefer the Pinots of Oregon and the Sonoma Coast to those of Santa Barbara, the setting for the movie. But, really, I prefer the Pinots of Burgundy to those from anywhere else.)

But some items in the data surprised me. Merlot, a common Bordeaux variety, went from almost nothing in 1991 to a dominant grape in 2010. Grenache, the popular, fruity darling of the Rhone Rangers, has actually seen lower crush values in the last 20 years. Pinot Gris has gone from a nonexistent grape in California to one of the top 15 in the state in just over a decade. Tonnage of French Colombard has gone down, which makes me wonder how the industrial market is doing overall.

But if you're reading this blog, you're probably more interested in the technical aspects of this data. I used Protovis, and I have repeatedly found that getting a basic visualization up and running with the library is very fast. Getting the fine details right, however, is much slower. It takes a lot of trial and error to get the language to do what you want. I might switch to D3, its successor, for my next projects. It supposedly gives finer control over your visualization.

What I also keep realizing is that visualizing some set of data isn't really an issue. Organizing the data is. I know this isn't news to anyone who works with data, but these projects are good reminders of how much work that can be.

I started with 20 separate spreadsheets from the NASS and wrote a Ruby script to extract out the bits of data I wanted and compile them into a JSON object I could serve to this chart's HTML page. But even then, the page's JavaScript has to do some processing as well to get the data in a format that Protovis can easily work with. The Underscore JavaScript library is a handy tool for doing data transformations.

But I also used that preprocessing to cache certain items such as the pretty-printed numbers, the colors to use for the different areas (which I calculated with the excellent 0to255.com) and other useful items.

No comments:

Post a Comment