An Obsession With Programming

Friday, September 3, 2010

Optimize For Bingos In Words With Friends

See the updated version here.

Somewhere in my library of books is one called Everything Scrabble, a guide to improving your Scrabble game. I read through it a few years back and have probably forgotten half its contents, but I do remember the author's comment that highly ranked Scrabble players often swap tiles in an effort to increase their chances of hitting a bingo (a word that uses all seven letters on your tray).

I've been playing Words With Friends (screen name: linechef) and have been thinking about that Scrabble strategy. Bingoing is less of an advantage in WWF. They're only worth 35 points instead of 50, and they often open up large swaths of bonus tiles for your opponent to gobble up. But they are usually worth it if you can hit a bonus tile yourself.

Everything Scrabble provides the scoop on which five tiles you should keep in your tray to maximize bingoing, but my mind let that knowledge go into the abyss. So I decided to write a program to rediscover it.

You can imagine how the program works: Go through every seven- and eight-letter word in the Words With Friends dictionary, the ENABLE wordlist, and find every unique combination of five letters in it. Keep track of how many copies of that combination you've seen throughout the dictionary, and then sort.

Let's start with the answers first. Here are the top 20 combinations, along with the number of times they occurred (the ENABLE list has 172,823 words):


aerst 2413
eirst 2199
einst 1780
eorst 1698
eerst 1646
einrs 1596
aeist 1587
aeirs 1484
aelrs 1468
aelst 1350
aenst 1346
eilst 1338
eiprs 1286
aeirt 1273
aeprs 1266
aenrs 1258
aeins 1245
einrt 1238
deirs 1233
ainst 1232

And here's the Ruby code. Ruby 1.8.7 added a combination method to the Array class, which makes this program tiny. You tell it how big you want each subset to be, and you pass it a block which takes each subarray as an argument.


wordlist = ARGV[0]

combos_to_count = Hash.new(1)

File.open(wordlist,"r") do |file|
   line = file.gets
   while(line)
      line = file.gets      
      if line then
          line.chomp!
          next if line.length != 7 && line.length != 8
          chars = line.split(//)
          chars.combination(5) do |combo|
             combo.sort! {|a,b| a <=> b}
             combo_str = combo.join
             combos_to_count[combo_str] = combos_to_count[combo_str] + 1
          end
      end
   end
end

sorted_combos = combos_to_count.sort {|a,b| b[1] <=> a[1]}
(0...20).each {|num| puts sorted_combos[num][0] + " " + sorted_combos[num][1].to_s}

Once you have the code, you can start asking other questions. Is the list different for words with exactly seven letters? A little bit:


aerst 609
eirst 493
eorst 450
aelrs 413
eerst 404
einst 385
aeprs 379
einrs 348
aelst 346
eilst 335
acers 331
aeirs 330
aenst 327
aeist 322
deirs 320
aders 318
eiprs 317
eilrs 297
aerss 296
deers 295

Note that there are only 676 permutations of the other two letters you could fit in your tray, so with STARE you can bingo with almost any combination. I leave learning those 609 words as an exercise for the reader.

What about a different dictionary? Here's the same run I first did but with the Scrabble Official Dictionary, Third Edition.


aerst 2395
eirst 2182
einst 1759
eorst 1670
eerst 1628
einrs 1581
aeist 1562
aeirs 1473
aelrs 1458
aelst 1335
aenst 1327
eilst 1317
eiprs 1278
aeprs 1258
aeirt 1257
aenrs 1239
aeins 1232
ainst 1224
einrt 1219
deirs 1218

Of course, some of this is common sense: EST gives you the superlative of many words. Likewise, ER gives you the comparative along with the prefix RE and the "person who does" suffix. Those four letters plus vowels other than U and Y form top contenders in many of the runs.

If you're playing WWF (or Scrabble), work to keep STARE in your tray, playing or swapping other tiles, until you get a bingo. Placing it on the board, of course, is a different matter.

Wednesday, August 25, 2010

Rails Scaffolding In Java

I recently needed to add a new business object to a system I'm developing at work, and it was a bit of a drag.

If you follow any of the normal Java patterns, you probably know what I'm talking about. You need the file for your business object. You need a service class responsible for returning instances or sets of instances from the DAO layer. You need a SQL file that defines the database table the business objects will be stored in. In our case, we also have queries in resource files, so we need a file for that as well. You need to edit your Spring config files to point to the new service. And so on. Your environment probably has some overlap with mine, and probably requires pieces mine doesn't.

You can simplify a lot of this, of course. You can make a service base class with Java Generics that will give you a lot of the type-safe methods you'd want. IDEs will let you set up templates, but you'll still have to click through a few menu options to get you the files you need. And the more files, the more clicks. And I like running an IDE-neutral team, so I wouldn't want to do something specific to one workflow.

I wanted a better way. Specifically, I wanted what Rails provides. You type a command at the command line, and you get all the files you need for working with the object. (XCode offers similar functionality.)

And I thought, "Well, why not?" My build system is written in Ruby, and Ruby's ERB templating system is built in to the language.

It took about an hour to get the main system up and running. I now type buildr :biz_object object=com.ea.foo.TestTest and I get a src/main/java/com/ea/foo directory, a TestTest.java and TestTestService.java file in that same directory, a sql file with the table definition (named test_test), and an empty query file. I also get a perforce changelist (via p4Ruby) with the description filled in and all the new files added.

Here's the heart of the code. template_to_final is a hash of template file name locations to the end file destination. The local variables exposed by the call to binding include the package name, the Java object name, and the SQL-friendly name:


       template_to_final.keys.each do |key|
           File.open(key) do |file|
               b = binding
               erb = ERB.new(file.read)
               outfile = File.new(template_to_final[key],"w")
               puts "Creating #{outfile.path}"
               outfile.write(erb.result(b))
               outfile.close
           end
       end

To give you an idea of what the templates look like, here's some code from the java file templates:


package <%=java_pkg%>;

@Entity
@Table(name="<%=java_sql_name%>")
@SequenceGenerator(name="objIdGenerator",sequenceName="object_id_seq",allocationSize=1)
public class <%=java_obj%> {

I don't yet go the full Rails route and specify all the properties on the command line, but this takes care of getting the boring parts of business object development out of the way, enforcing consistent naming schemes, and ensuring that the developer doesn't forget to check in some file. I also don't yet modify my config files, but that won't be too tough to add.

Thursday, August 5, 2010

Scripting Campfire

My team uses Campfire, a web-based chat tool from 37signals, to communicate throughout the day. We've divided it into various rooms, one of which is a Status room where we do a virtual version of the daily stand-up that many teams do: a quick meeting where everyone says what they're working on. (One of our team members is in England, and our hours vary a bit, so a classic stand-up isn't practical.)

About a month ago, one team member started posting the date in the status room before anyone gave their update. For some reason, Campfire itself doesn't do this reliably, and when you have a room with nothing but status updates, it's not always clear where one day's messages end and another's begin.

Usually one or two people on the team did this, but I followed suit on a couple of occasions, and of course thought about automating it. We already had a bot account for some very early automation, and Campfire has a good web service. Insert Tab A into Slot B.

Here's the relevant Ruby source code, which I have hooked up to a cron job (Note that if you want to script Campfire, I suggest creating a "Bot Test Room" that you can use for experiments without spamming your real rooms):


require 'date'
require 'json'
require 'net/http'
require 'net/https'
require 'uri'

def send_text_to_campfire(text)
    message = {
       :message => {
           :type => "TextMessage",
           :body => text
       }
    }
    send_to_campfire(message)
end

def send_to_campfire(message)
    url = URI.parse("https://<your base URL>/room/<your room number>/speak.json")

    request = Net::HTTP::Post.new(url.path)
    request.basic_auth(<your auth token>,<any password string>)
    request.content_type = 'application/json'
    request.body = message.to_json
    request.content_length = request.body.length

    http = Net::HTTP.new(url.host,url.port)
    http.use_ssl = true
    response = http.start do |http|
        http.request(request)
    end

    puts response.body.to_s
end

# construct message
today = Date::today
formattedDateString = sprintf("%02d/%02d/%4d",today.mon,today.mday,today.year)
dateStatusString = "=== Today is #{Date::DAYNAMES[today.wday]}, #{formattedDateString} ==="

#send status message to campfire
send_text_to_campfire(dateStatusString)

Filling in the date each day isn't a huge time savings: It will take us a lot of days to recoup the time I spent automating a five-second typing task. But once I figured out the gist of posting to Campfire, I started adding new functionality. Our "Today is ..." status message now includes a few choice statistics — gleaned from our telemetry system — about gameplay from the previous day. I also wrote a "canary" script that does a health check on our dev server and posts to our general chat room if it seems to be slow.

Naturally, some of my co-workers have suggested writing an adventure game on top of the API. That may be a bit silly, but it does emphasize a point I often make: You can't really imagine all the possibilities for a technology until you get your hands dirty a bit and play with it.

This whole experience underlines again why web applications should have APIs. The lack of a date is probably a Campfire bug, but we don't have to wait for them to fix it. And we've added functionality that is only relevant for our team, resulting in something that more closely ties in with our real needs.

Sunday, August 1, 2010

Watch Your Users Use

Time and time again, I re-learn this lesson: Nothing makes your system more usable than watching your users use it.

As I mentioned in my last post, I'm building a telemetry system on top of Google AppEngine so that my studio can understand how our games will be played. As I've demoed pieces of it, I've put together web pages and interactive charts and other nice little visualizations. I've made JavaScript libraries to make it easy to build those sorts of things, and I've demoed some of the prettier ones to the stable.

But I also put in a way to export data as tab-delimited fields. I figured this would be a good last resort if the JavaScript version wasn't quite there yet, or if we hadn't written a specialized view on the data, but I figured we'd favor those.

Naturally, then, my principal users have all just used the tab-delimited export and not pushed beyond that. In particular, they grab a bunch of tab-delimited data, import it into Excel, and then do manipulations on it there. Though there are macros and various other things, it still seems to take a senior engineer a couple of hours to compile all the data for his weekly reports.

I know this because I sit with him a fair amount as he explains his process.

The other day, I was thinking of his process and realized that Python can create Excel spreadsheets. So I could remove one minor barrier by just letting him get the data in Excel format instead of tab-delimited text. Easy.

But that doesn't buy him much. For his weekly reports, he trims some of the unneccesary columns (my exports grab all the fields and dump them out, including some that are system-level fields), runs some macros to insert formulas to give him percentages, sorts the data, and then copies and pastes that into an email. He does that for about 10-15 reports, in one form or another.

What if, I thought, I could let him define the format of the spreadsheet the system generates? Then he could say, "give me this data, but put it in this layout." That would shave a big chunk of time from his flow.

One thing about working with him is that he's not an online engineer. This turns out to be very good, because it makes me think in terms of letting him modify config files instead of modifying server code. (A medium-term goal is building a real interface on top of my system so that anyone in the studio can interact with it, but for now, config files are how we do things.) One of our other online engineers used the system by writing a custom request handler that interacted with the AppEngine datastore directly to generate a customized HTML page. Very neat, but if he were my only user, my system wouldn't be very evolved. I'd just say to someone new, "write a request handler."

I mentally sketched out what the config file would need to contain, what abilities it would need to give him, and within 45 minutes had a system in place that will let him generate something much closer to the final spreadsheets he needs for his reports.

A key component of the new feature is the Django templating language that comes with AppEngine. I actually dislike the language for HTML generation, but for letting users construct simple templates, it's pretty good. One important feature: It can render a template contained in a string, not just one contained in a file. And you can give the renderer a context, filling in some variables that are accessible within the template. In my case, I create a context that contains the current piece of aggregation data (which is a summary object of a large number of events) and the current row number.

From his perspective, to generate a report about how much damage a creature is doing in our testing, he creates a YAML config file that looks like this:


header:
    - value: Name
    - value: Avg Damage
    - value: Total Damage
row:
    - value: "{{aggregation.groupByValue}}"
    - value: "{{aggregation.average}}"
    - value: "{{aggregation.sum}}"

Then, when he requests data in Excel format, he can tell it to go through the layout he specifies. Rather than getting a dump of every piece of data in each object, he gets a spreadsheet with just the information he needs. The {{}} is part of the templating system and translates as "spit out the result of this expression."

Let's say he also wanted to capture the number of times that creature did damage. That's actually available to him in another field in the aggregation, but let's say it wasn't. My system also supports formulas. So he could do this:


header:
    - value: Name
    - value: Count of Attacks
    - value: Avg Damage
    - value: Total Damage
row:
    - value: "{{aggregation.groupByValue}}"
    - formula:"ROUND(D{{row_num}} / C{{row_num}},0)"
    - value: "{{aggregation.average}}"
    - value: "{{aggregation.sum}}"

That formula will become, in the output for the first row, ROUND(D1/C1,0).

This is still a long way from user-friendly, but I think it's going to give him back an hour and a half of time each week. And building the system this way means that a user-friendly version just needs to write the format into this config file, and away it will go.

Tuesday, July 20, 2010

Google AppEngine

Three months ago or so, I had an interesting idea for Google's AppEngine, an environment for building web applications. I wanted to use it for telemetry.

Telemetry, literally, means measuring something from afar. (Merriam-Webster's definition is somewhat circular if not outright tautological.) In the gaming industry and perhaps others, it means capturing information about how users interact with your system. Think of stats packages for web pages; they're the same thing. You also hear the terms business intelligence and plain old data analysis.

I want to be clear: This isn't about watching you, in particular, play a game. If you die in a particular spot on a particular level when our newly announced Darkspore ships, I don't really care. But if lots of people die in that same spot, it tells you something about the difficulty of that spot, doesn't it?

I needed something that could take in a large stream of data without hiccuping and allow me to process it and report on it. I went with AppEngine.*

A big selling point of AppEngine is scalability. Google obviously knows something about running big systems, and they take control of scaling your app. At any given point, I don't know how many servers are running my code: AppEngine starts up new instances as traffic grows and shuts them down when traffic slows. It's not like Amazon.com's EC2, where you manually say, "I want this many servers. No! Now I want this many!"

Closely tied to its scalability is the datastore behind AppEngine, which is a proprietary NoSQL technology called BigTable. When you run queries against BigTable, you're using a MapReduce algorithm — the same algorithm that allows Google to index and search the web so quickly. I had a potentially huge set of relatively unconnected data: AppEngine seemed like a good fit.**

And, for the most part, it has been.

But the datastore that makes AppEngine so powerful has some quirks that I've had to work around. Keep in mind that it's not a relational database, so it doesn't have the same behaviors. Most notably, it's not very good at aggregating data: sums, averages, whatever. And remember where I said that telemetry is all about aggregation? In the long run, I'm hoping BigQuery will make this easy, but in the mean time, I had to build my own aggregation system on top of AppEngine's cron jobs and task queues. The system is pretty neat, including support for a "group by" concept so that I can get views on subsets of the data in a given aggregation, but I'll be happy to let someone else handle that heavy lifting.

Another issue I've run into is that AppEngine sets hard caps on certain aspects of the system. For instance, any given request can't run too long or it will be terminated. On my own servers, if a webpage for an admin tool takes thirty seconds to load, I'm not too upset. It's an admin tool; I can wait. Google isn't so forgiving, presumably to prevent some process of yours from running amok on their machines. This has led to some early performance tweaking that's probably, in the long run, better to get out of the way now.

But these have been interesting problems to solve, so I haven't minded them so much as been annoyed by them. We're currently pulling in 100,000 events or so on any given day (as testers play the game and developers work on it), and I've only had to do optimizations on the reporting side and aggregation system. There's a whole other round of work I can do on optimizing writes, but I haven't bothered yet. Granted, 100,000 events isn't that significant, but it's nice to defer some of the deeper tuning until we're a bit closer to shipping. Meanwhile, we've got graphs and charts and spreadsheets with our telemetry data served up in a useful way to different people in the studio.

It's always hard for me to gauge the ease-of-use of a web system for someone who's not a long-time server engineer, but AppEngine feels like it is very good for either the people who know nothing about web stuff or people who know a lot about it. For middle ground types — maybe a bit of PHP or other dynamic site system — who won't be facing scalability issues, I wonder if Rails might be a better choice.

But I do know that one of our non-online engineers was able to install a new index on AppEngine for a query he wanted without even asking me. I didn't know about it until I happened to see his check-in note. That's pretty powerful. AppEngine supports both Python and Java, but I deliberately went the Python route first, despite not knowing the language at all, because there's more Python knowledge in general in my studio. And that language I didn't know? The code I wrote to take in telemetry events is virtually unchanged from the first day I wrote it. So it's easy to get started on AppEngine, and the cost is free well into the needs of most websites.

If you're interested in developing on AppEngine, I recommend the yet-to-be-released Code in the Cloud.The writing in the early drafts is a bit too exuberant, but there's solid information from the get-go, and Pragmatic Programmers delivers polished final products.

* Yes, yes. I don't trust Google, either, though, as always in blanket statements, the AppEngine team seems to have their hearts in the right place. What's to prevent them from pilfering through our data and gleaning their own information about our users? Well, there is a clause about that in the Terms of Service — they won't do it — but I'm not naive. We won't be storing information that anyone else could use to trace back to our users. An ID that makes sense inside EA? Yes, sometimes. Anything that makes any sense anywhere else? Nope. Our legal department was pretty firm on that.

** What if AppEngine goes down on our big launch day? Am I nervous about tying my company's game to a service that may not be there when I need it? In this case, not really. Let's say AppEngine goes down for a full 24 hours. The whole thing, not just pieces. Such an event would be unprecedented in my experience, but still. So we lose a day of telemetry data. Big deal. People will play the game one day the same way they played it the day before and the day after. Since I'm dealing with aggregates, I can cheerfully ignore missing data.

Wednesday, May 26, 2010

What's The Probability Of Two Boys?

There is a link going around about the most recent Gathering For Gardner. The writer hooks the reader with an intro featuring Gary Foshee (a top-notch designer of "secret box" puzzles, though they're rarely boxes), who poses this question to the crowd of mathematicians, puzzlers, and magicians: "I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"

The article then describes the Gathering. (I'm on the invite list, but I've not yet been.) It eventually explains the answer, but these probability questions never make sense. So I wrote a program to illustrate the first oddity, which is that announcing you have two children and one is a boy makes the probability of the other one being a boy only 1/3. Basically, create 10,000 sets, and remove any that are two girls. Now count up the total number of pairs left and the total number of those that are two boys, and you end up with something around one-third.



num_bbs = 0
total_pairs = 0

(0...10000).each do |count|
  children = [Kernel.rand(2),Kernel.rand(2)]
  next if children[0] == 1 && children[1] == 1

  total_pairs = total_pairs + 1
  num_bbs = num_bbs + 1 if children[0] == 0 && children[1] == 0
end

puts "#{num_bbs} pairs of boys out of #{total_pairs} valid pairs = #{(num_bbs.to_f/total_pairs.to_f) * 100}"

It's weird, but it's true.

Tuesday, May 18, 2010

We Rule: Maximize Your Time

I've been playing We Rule on my iPhone. I have to say, I don't really get the point of it. It seems like the only way to progress is to plant and harvest crops and exchange services with other players. But there are tons of things like trees and banners and what not that don't seem to do anything. Should I place a water tower? Who knows?

But this is not a game review. When you plant items and harvest them, you get paid some amount of gold depending on the crop. Planting each crop takes some defined amount of money (usually), and each crop requires some defined amount of real-world time to mature.

So, naturally, I wondered: Which plants are the best investment?

A simple spreadsheet offers the answer*. I wrote down the profit of the crop and divided that by the number of minutes it would take to grow. I did the same thing for experience points, which allow you to level up. (This list reflects the crops I currently have access to. I'll update as I go.)

Crop	Profit	Profit/Minute	XP/Minute
Magic Asparagus	1250	.87	.24
Potatoes	200	.56	.22
Pumpkins	160	.89	.36
Strawberries	120	1.33	.56
Carrots	260	.36	.15
Squash	180	.6	.25
Beans	340	.24	.10
Onions	100	1.67	.67
Wheat	20	4	1.60
Corn	5	6.67	1.33

Your definition of a good investment may be different than mine. Clearly the best investment is to plant and harvest corn like a madman every 45 seconds. You have fun with that. Wheat is somewhat more tolerable, with a 5-minute harvest time, but I like to set up a bunch of crops and then go do something else until they're done. Or set up a bunch overnight to have ready the next morning.

Strawberries and onions are good investments, with onions beating out strawberries for both cash and experience points. Plus, they only take an hour: That's about right for mid-day play. In the crops that take all day, Magic Asparagus is your best best, with beans not even close. Beans also give you one of the worst XP/Minute ratios. Don't plant beans.

Derive your own fun theories from the list.

*Yes, this is a programming blog. Spreadsheets are programs, too, though they're rarely treated as such.