Sunday, January 16, 2011

One-Button Loadtesting With EC2

I recently realized a minor ambition.

As I've said before, EA's loadtesting group doesn't meet all my needs: They come in near the end of a project and test the nearly-finished system, which means that it's a big headache all around. Fixing performance issues means rummaging through months — or even years — of code to try and suss out issues they're seeing. Also, the mere act of getting them up and running is usually a big disruption for one or more members of your team.

I had loadtesting scripts running already, but they suffered in a few ways. JMeter is good at what it does, but I want easier programmatic control of the scripts (especially for measuring things it's not so good at, such as the lag time on asynchronous processes under load). Also, I didn't have a loadtesting environment. I was running the scripts on a dev server against that same server. I could have put together an environment of multiple servers, but that would incur expense to the studio for machines that were largely idle.

Then I read an article somewhere about a company that uses EC2 as a dynamic loadtest environment. Yes, I thought, that's what I want.

Off and on over the last few weeks, I put together a Ruby script that can, with a single command, set up a full environment in Amazon's cloud-based servers. That one command instantiates an RDS database, an EC2 server that will act as our application server, and another EC2 server that acts as an instance for running The Grinder, a loadtesting tool that another team in the studio is using and that I liked for its extensibility. That one command also builds a custom version of our war file (to handle all the dynamic addresses you get from EC2/RDS), sets up access permissions between all the servers, and runs checks to make sure the whole environment is ready to go.

A second command fires off the loadtest itself, running the test script through The Grinder and collecting the results back to the local machine.

Finally, a third command shuts the whole system down. I do have one command that runs all three steps, but at the moment I tend to run each one manually.

Each run takes less than an hour for now, so the cost to the studio is on the order of $.28 per loadtesting run. Not something you'd want to run continually, but certainly something you could run once a day. As we move along in development, we'll probably need to spend more to set up bigger, more realistic environments, but that will also be when the studio has more budget for my project. The environments we need will scale up alongside the money we have.

One of the key pieces that I set up was the concept of a profile. A config file lets you specify which script to run and how many of each item to set up. So, for my first profile, I just set up one each of the different servers. But you could imagine some profile later down the road that sets up 2 of each machine with some sort of autoscaling system. And you could imagine one much further down the road that sets up something approximating our production environment and runs tests against it. All that is mostly supported.

The advantages to this system are huge. First, it makes real loadtesting feasible even early on in our project. But it also empowers developers to do rapid iteration on performance fixes without having to push them live and see if they make a difference. See a problem in the results, figure out a fix, run the script again to test. Repeat and then release when you can prove that your fix makes a difference. Because each script fires up its own environment, I'll be able to distribute performance fix work to my team.

This ad hoc loadtesting tool is already proving its worth, and I've only just started employing it. I can't wait to see how effective it is going forward.