Sunday, November 18, 2012

Scripting Campfire, Again

A little over two years ago, I wrote a post about scripting Campfire, the group chat tool from 37signals. At the time, my script posted a routine "today's date is" message with a variety of statistics. Over time, the statistics have disappeared — though I now post charts from Graphite — but the bot has been tirelessly plopping the date into each room each weekday (and now, in crunch time, each day).

Then I watched a video that mentioned the Campfire interface to HUBOT, GitHub's little slave server that handles all sorts of tasks. What if we could type a command into Campfire and have it actually do something?

But what? A first use case quickly suggested itself.

One of our Campfire rooms is devoted to server issues, and my boss and I often, while chatting in there, make a comment such as "todo: update deployment instructions."

How often do you think we actually remember those todos? Did you guess "at least rarely"? You may have overshot.

Some after-hours refactoring of the Ruby scripts I originally wrote plus a bit of tinkering with a new script, and I had a very simple command parser. Now, when you type "@SimCityBot todo blah blah blah," you'll get an email saying something like "You wanted to be reminded to blah blah blah." That doesn't guarantee the task will get done, of course, but it does make it less ephemeral.

The script is pretty straightforward: it polls each room looking for messages that start with "@SimCityBot," and then invokes a method with the same name as the first word in the text. That means adding a new command is now a simple matter of adding a single method to a file. Yay for Ruby's metaprogramming support! The script also maintains a YAML file that keeps track of the most recent messages in each room. This ensures that when the script is restarted, it doesn't respond a second time to every command it sees.
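A minimal sketch of that dispatch-by-method-name idea follows. It is not the actual SimCityBot script: the class names `Commands` and `CommandParser`, the shape of the message hashes, and the state-file layout are all assumptions made for illustration, and the Campfire polling itself is left out.

```ruby
require 'yaml'

# Hypothetical command handlers: each public method is one bot command.
class Commands
  def todo(args)
    "You wanted to be reminded to #{args}"
  end
end

class CommandParser
  BOT_NAME = '@SimCityBot'

  def initialize(handlers, state_file)
    @handlers = handlers
    @state_file = state_file
    # room name => id of the newest message already handled
    @last_seen = (File.exist?(state_file) && YAML.load_file(state_file)) || {}
  end

  # messages: array of { id: Integer, body: String } in ascending id order.
  # Returns the replies produced by any commands found in new messages.
  def process(room, messages)
    replies = []
    messages.each do |msg|
      next if msg[:id] <= @last_seen.fetch(room, 0)
      @last_seen[room] = msg[:id]
      reply = dispatch(msg[:body])
      replies << reply if reply
    end
    # Persist progress so a restart does not replay old commands.
    File.write(@state_file, @last_seen.to_yaml)
    replies
  end

  private

  def dispatch(body)
    name, command, args = body.split(' ', 3)
    return nil unless name == BOT_NAME && command
    # Only call methods defined on the handler class itself, never inherited ones.
    own = @handlers.class.public_instance_methods(false)
    return nil unless own.include?(command.to_sym)
    @handlers.public_send(command, args.to_s)
  end
end
```

Adding a command in this sketch means adding one public method to `Commands`. Restricting dispatch to the class's own methods keeps a chat message from invoking something like `instance_eval`.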

I had to add support to our Campfire library for uploading images in order to post our "slowest calls" graphs each day. Once that work was done, adding an "image" command was a single call. Give it a URL, and the script downloads that image and re-uploads it to whatever room the command appeared in.

Next up is a command to kick off Hudson builds. For that one, of course, I'll want to spin off a process that can monitor the build and report back when it's done. A co-worker suggested listing Emeryville food trucks. (Which I maintain as a Twitter list.)

There are lots of things that the script isn't good at. Handling a command blocks until it's done. Error reporting is minimal. It doesn't support multi-word commands. But it's a little trinket I can poke at and have fun with.
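One way around the blocking problem, sketched here with a thread rather than a separate process, is to hand each slow command to a background worker that reports back when it finishes. The helper name `run_in_background` and the `report` callback (standing in for "post a message to the room") are hypothetical.

```ruby
# Run a slow command on its own thread so the polling loop keeps going.
# `report` is any callable that accepts a message string.
def run_in_background(report, &work)
  Thread.new do
    begin
      report.call(work.call)
    rescue StandardError => e
      # Surface failures in the room instead of losing them silently.
      report.call("Command failed: #{e.class}: #{e.message}")
    end
  end
end
```

A build-monitoring command could pass a block that polls the build server and returns a summary string once the build is done.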

Is this the most important thing I could be doing for SimCity? Let's hope not. But at the end of a long day, sometimes I need a break, and my breaks from programming are … other programming projects! SimCityBot provides a refreshing distraction that often buoys my mood and gives me a nice close to the day. That's also part of why I've not just installed HUBOT. There's less fun in that.

But the catalog of all HUBOT scripts is an inspiring read. AWS status checks? Graphite graphs? The latest XKCD? When is break time again?

Sunday, November 11, 2012

Scala and Java: A Simple Application

My boss recently asked me how I'd build out SimCity's online systems today if I knew everything three years ago that I know now. I didn't hesitate. "I'd take a good, long look at Scala and, by extension, Lift."

Scala has a lot that appeals to the me of today. I like its hand-in-hand support for functional programming and emphasis on immutable objects. I've gotten used to both concepts with Erlang, and I've come to appreciate that programming paradigm for building robust, scalable systems. But Scala also offers imperative syntax and mutable objects if you need them.

Scala has native support for the actor model abstraction of concurrency, which I first encountered with Erlang (Scala's syntax is openly lifted from that language's). The actor model makes it much easier to manage and reason about concurrent code, and Scala supports two major implementations: actors tied to particular threads or event-based actors thrown onto whatever thread is available, maximizing resource utilization.

And, unlike something like node.js or even Erlang, Scala has a huge universe of libraries at its disposal thanks to its bytecode compatibility with Java.

All good stuff. I thought it was time to do something real with it.

Before I dove into a large system, I thought I'd write a simple application. We have a group of binary files on SimCity that are very important to the game but, being binary, aren't easy to debug when something goes wrong. So I thought I'd do a quick project on my own to write a Java-based parser for the files and compare it to a Scala-based version. Little admin tools like this or other low-risk sections of the code base are often good ways to try out new tech and see if it will fit into the larger project. This code didn't leverage any of the concurrency systems in Scala; I just wanted a simple program.

One reason people like Scala — and, indeed, the many other JVM-compatible languages — is conciseness. Spend any time at all with Scala, Ruby, or — it sometimes seems — any other language, and Java's verboseness begins to feel like cement around your hands, sucking time and productivity away from your programming. Plus, more code, even Java's boilerplate, means more potential bugs.

As a simple measure, I compared the non-whitespace characters in my two versions. I structured the programs the same way, mirroring class structures and refactored methods. But I used the support that each language gave me for keeping things concise.

The Java version was 5258 characters. The Scala version? 3099. The Java version was almost 70 percent larger.

[Chart: size in non-whitespace characters, Java code vs. Scala code]
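For anyone who wants to repeat the measurement, here is a rough sketch in Ruby of counting non-whitespace characters and turning two counts into a percentage. The helper names are made up for this example; the two numbers are the ones quoted above.

```ruby
# Count the characters in a source string that are not whitespace.
def non_whitespace_chars(source)
  source.scan(/\S/).length
end

# How much larger `a` is than `b`, as a percentage of `b`.
def percent_larger(a, b)
  (a - b) * 100.0 / b
end

puts percent_larger(5258, 3099).round(1)  # prints 69.7
```

In practice you would feed `non_whitespace_chars` the concatenated contents of each version's source files, for example via `File.read`.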

Scala's biggest single win was with a file full of small classes that defined types of data within the file I was parsing. The Java version was 160 percent bigger than the Scala one.

This makes sense. Let's say you wanted an immutable class in Java to represent a point in 3D space. This is about as concise as you can get it.

public class Point {
   public final float x, y, z;
   public Point(float _x, float _y, float _z) {
      x = _x;
      y = _y;
      z = _z;
   }
}

Here's the equivalent Scala code.

class Point(val x: Float, val y: Float, val z: Float)

But Scala offers lots of little aids as well. You rarely need Java's omnipresent semicolons; you don't need to declare types as often, since Scala can usually infer them; you don't need to explicitly type "return" at the end of a function, because the last result in the function is the return value; you don't have to declare that you throw exceptions. The list of little things goes on and obviously adds up.

Functional programming, too, offers some conciseness. I needed a routine to read an unsigned int out of a variable-length byte array. In the Java version, I wrote this:

private static int byteArrayToInt(byte[] bytes) throws IOException {
    long retVal = 0;
    for (int i = 0; i < bytes.length; i++) {
        retVal = (retVal << 8) | ((long)bytes[i] & 0xff);
    }
    return (int)retVal;
}

In the Scala version, I wrote this:

private def byteArrayToInt(bytes: Array[Byte]) = {
    ((0L /: bytes) {(current,newByte) => (current << 8).toLong | (newByte & 0xff).toLong}).toInt
}

(The references to longs in this int-parsing code are to cope with the fact that I needed to read very large unsigned ints from the files, which Java defaults to interpreting as signed integers. The way to get around that is to write into a larger memory space, namely a long.)

You could argue that the Scala version is concise to the point of obtuseness, even if you're familiar with the functional-programming mainstay foldLeft operation it represents. I agree that there's a balance to be struck. In particular, I'm not sold on the /: operator for foldLeft; I might opt for spelling it out to be more clear.
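For readers who find the fold easier to follow with the operation spelled out, here is the same idea sketched in Ruby, the language of the scripts elsewhere on this blog. Ruby calls the operation `inject`; the method name `byte_array_to_int` is just a transliteration. One difference worth noting: Ruby integers grow as needed, so there is no long-versus-int juggling and no final narrowing cast.

```ruby
# Fold big-endian bytes into one unsigned integer, most significant byte first.
# Masking with 0xff turns a signed byte (-128..127) into its unsigned value.
def byte_array_to_int(bytes)
  bytes.inject(0) { |current, new_byte| (current << 8) | (new_byte & 0xff) }
end
```

Each step shifts the running total left by one byte and ORs in the next byte, which is exactly what the Java loop and the Scala fold do.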

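For functional programming geeks, note that, to the extent it can, Scala offers tail-call optimization on recursive calls: the compiler rewrites self-recursive tail calls into loops, and the @tailrec annotation turns a call it cannot optimize into a compile error.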

But things weren't all sunshine and roses on the Scala side. Here is the average time to run my program for each version, timed over 1000 iterations.

[Chart: average run time, Java vs. Scala]

To some extent, I expected this. Scala has to compile down to Java bytecode, which means that all that syntactic sugar and functional programming and closure support must turn into Java concepts somewhere. Even my little program generates a slew of extra classes and, presumably, lots of extra code that has to be navigated. Also, I think it's reasonable to imagine that immutable objects necessarily mean that new objects have to be created more often than they would in mutable space, where you can change an object directly. Finally, I've been working with Java in one form or another for 16 years or so; I've been working in Scala for about three days. So I'm likely missing out on performance tips.

Though I admit this seems like a huge difference for some extra classes and objects and missing an optimization step or two. Even if it's correct, I'm still of the mindset that greater productivity and easier, safer concurrency are big wins. (Note that you could always switch to imperative mode in key sections if performance demanded it, in much the same way that some sites offload work to C programs.)

If I were really honest about how I'd rebuild SimCity, I'd probably use Erlang, where you have to do things functionally, have a virtual machine that supports what you're doing, and have native systems for handling failures with aplomb. But Scala at least offers the potential of hiring from the pool of Java programmers, whereas Erlang really doesn't. (On the other hand, the vast majority of Java programmers I've seen seem to be couched safely and comfortably in Java, so wouldn't necessarily adapt. But Erlang would be a way bigger change, I think.)

I'm going to keep plunking away at Scala and try to build something a bit more real with it. Event-based actors might be a bit slower, but if they can scale vastly better, that may matter more to a site.

Thursday, November 8, 2012

Copying On S3

The question recently arose: Is it faster to copy within buckets on S3 than it is to copy between buckets?

A quick script provided an answer. I copied a 100K file 100 times for each test and averaged the results (which are in seconds).

Avg. time to make copy between buckets: 0.10705331
Avg. time to make copy within bucket: 0.10522299

A second test produced similar results (very slightly slower in both cases).

And here's the Ruby script I threw together. It uses the aws-sdk gem.

require 'aws-sdk'

# get buckets (credentials are assumed to be configured already)
s3 = AWS::S3.new
bucket1 = s3.buckets['dfsbucket1']
bucket2 = s3.buckets['dfsbucket2']

# get an object from bucket 1
random_file = bucket1.objects['191111308/state_file']

# time copies from bucket 1 into bucket 2
start = Time.now
copies = 100
(1..copies).each do |i|
  random_file.copy_to("test_file#{i}", {
     :bucket => bucket2
  })
end
puts "Avg. time to make copy between buckets: #{(Time.now - start)/copies}"

# time copies within bucket 1
start = Time.now
(1..copies).each {|i| random_file.copy_to("test_file#{i}")}
puts "Avg. time to make copy within bucket: #{(Time.now - start)/copies}"
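The timing pattern in that script can be pulled into a small helper. This is a sketch, not part of the original script: the name `average_seconds` is made up, and it uses the monotonic clock instead of `Time.now` so that a system clock adjustment in the middle of a run cannot skew the result.

```ruby
# Average wall-clock seconds per call of the block over `iterations` runs.
# The block receives the zero-based iteration number.
def average_seconds(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { |i| yield i }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) / iterations
end
```

With it, each test becomes a single call such as `average_seconds(100) { |i| random_file.copy_to("test_file#{i}") }`.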

Sunday, November 4, 2012

Grokking Graphite

We started using Graphite at work six or so months ago, largely because there was already support for it in the metrics library we're using. If you don't know Graphite, it's a system for accumulating and, obviously, graphing time series. Most people use it for systems monitoring.

When we first set it up, I played with a few graphs of key metrics over time. Pick the metric from a list; Graphite shows you the graph. Easy. I also set up some basic dashboards that showed a few graphs. Again, easy.

But that's not always all you need. I wanted larger pictures of the whole system: hot spots, accumulated data across servers (in our setup, each server is its own metrics hierarchy in Graphite), and more. I pondered various ways to get the data out of Graphite (which it supports) and into R.

Then I discovered its functions library. And I went crazy.

First was a graph that showed every call in our system over a certain time threshold.

Then came one that combined a number of metrics to estimate mean time to first byte, a common metric for website performance.

And then another. And another. These days, I set up my laptop to run Chrome in full-screen mode so that it can fit all the graphs on one of my dashboards. But that's just one tab: I have dashboards for different environments and dashboards that focus on subsystems within those environments. A graph showing our 10 slowest calls gets uploaded to Campfire each day.

Our ten slowest calls as of today, with proprietary information removed. The lines are flat because of lack of activity on the server.
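A graph like that one comes down to a single render URL with a function wrapped around a wildcard target. The sketch below builds such a URL in Ruby. The host name and the metric path `servers.*.calls.*.mean` are hypothetical, not our real hierarchy; `highestAverage` is one of Graphite's built-in functions and keeps the N series with the highest average over the requested period.

```ruby
require 'uri'

# Build a Graphite render URL for one target expression.
# The host and the metric paths passed in are placeholders.
def render_url(host, target, from: '-24h', format: 'png')
  query = URI.encode_www_form(target: target, from: from, format: format)
  "http://#{host}/render?#{query}"
end

# Ten slowest calls across all servers, by average over the last day.
slowest = 'highestAverage(servers.*.calls.*.mean, 10)'
puts render_url('graphite.example.com', slowest)
```

Fetching that URL returns the rendered image, which is the kind of file a script could then upload to a chat room.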

So far Graphite, especially version 0.9.10, has been able to keep up with almost all my needs, and I haven't even hit all the functions. It even has a command-line interface that I just started playing with. (It allows faster iteration and finer control over each graph in a dashboard, and it also lets you keep a dashboard-building script under source control.) There is also a wide range of tools that work with it (including, of course, my own metrics relay system).

When I first read Graphite's documentation, I was struck by the author's right-up-front advice to consider your metrics naming scheme carefully. It seemed very nitty-gritty so early in the manual.

But now I understand. A consistent naming scheme and hierarchy depth allows for much simpler construction of useful graphs. To some extent, our profiling code, our package hierarchy, and our metrics library give us this for free. But the other day I realized I had made a mistake in naming a metric that captures all invocations of methods with a particular annotation, and it made it much more difficult to assemble a meaningful graph. I got it to work, but it required some wrestling. If you're using Graphite, I recommend auditing your metrics periodically to make sure you can get the most out of them.
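One cheap audit, sketched below, is to check that metric names sit at a consistent depth, since a wildcard target only matches names with the same number of dot-separated nodes. The helper names and the sample metric names in the usage note are invented for illustration; how you obtain the full list of names from your own Graphite install is left open.

```ruby
# Group metric names by hierarchy depth (number of dot-separated nodes).
def depth_histogram(metric_names)
  metric_names.group_by { |name| name.split('.').length }
end

# Names whose depth differs from the most common depth.
def depth_outliers(metric_names)
  by_depth = depth_histogram(metric_names)
  return [] if by_depth.empty?
  common_depth, = by_depth.max_by { |_depth, names| names.length }
  by_depth.reject { |depth, _names| depth == common_depth }.values.flatten
end
```

Given `servers.web1.calls.login.mean`, `servers.web2.calls.login.mean`, and `servers.web1.annotated.mean`, `depth_outliers` would flag the third name, which a target such as `servers.*.calls.*.mean` can never match.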