An Obsession With Programming

Friday, May 14, 2010

Rails Rules

I get now why people love Rails.

I had a personal project in mind: a web-based application for organizing all the notes you collect while working on an article. Like the index cards you used to use for school reports. I decided to try Ruby on Rails, a web application framework built on top of the Ruby language, mostly to see what all the fuss was about.

Rails is the framework of choice for a wide range of startups because it lets you get up and running quickly. It works by driving home a brutal truth: Your website is not that unique.

Some sizable percentage of what you need to do for a real-world website is what everyone needs to do for a real-world website: working with databases, adding CRUD functionality, rendering HTML, mapping handlers to URLs, and so forth. "All of this has happened before and all of it will happen again."

Rails acknowledges this and takes care of most of that functionality out of the box. A lot of what it does seems like magic, but really it just imposes naming conventions that will let it do the right thing most of the time. Put your User objects into a users table, and you rarely have to write SQL. Organize your URLs as /controller/action/id, and you don't have to map anything manually. Call a controller method create, and it will automatically be called to handle POST operations. And you don't even have to remember any of this: The scripts that come with Rails generate tons of the code for you, so adding a new business object to the database, along with HTML-based CRUD pages to manage it, takes two scripts. Two lines at a command prompt, and poof.
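To make the naming-convention idea concrete, here is a toy sketch in plain Ruby. It is not Rails code: table_name_for and route are hypothetical helpers, and the pluralization is deliberately naive. It only illustrates how a fixed convention lets a framework derive a table name or a handler from a name it already has.

```ruby
# Illustration only: a toy version of "convention over configuration".
# None of this is Rails code; both helper names are hypothetical.

# Guess a table name from a class name: User -> "users".
# (Naive: lowercases and appends "s"; real pluralization has many more rules.)
def table_name_for(class_name)
  class_name.downcase + "s"
end

# Split a /controller/action/id path into its parts.
# Defaulting the action to "index" is an assumption made for this sketch.
def route(path)
  controller, action, id = path.split("/").reject(&:empty?)
  { controller: controller, action: action || "index", id: id }
end

puts table_name_for("User")
p route("/notes/show/5")
```

Because the names follow a pattern, nothing has to be registered by hand: the path alone says which controller and action to call.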

I gave it a whirl and came away impressed. In five hours of flying to the East Coast this weekend, I, who know very little about Rails and something about Ruby, had a functional site for creating, viewing, and editing each of the key object types my application needs. I even added a whole bunch of "it would be nice if it did this" types of features. All on the way to New Hampshire. I wrote minimal amounts of code to do it, too. I have an app that I can use — indeed, I've started using it for real-world stuff.

On the way back, I started worrying about the user interface, and that's where I began to flounder a bit. Rails is smooth sailing if you're using all of its tools, but if you want to use something such as jQuery for your front-end JavaScript/AJAX, things get tougher. Or perhaps it's that the book I used doesn't explain the depths well enough to let me figure it out. A friend of mine says it's easy, and I'm sure it is, but I haven't quite grokked everything I need to do yet.

But even with this hiccup, Rails is clearly a valuable tool that any web developer should consider.

Tuesday, April 13, 2010

Ruby + Outlook + Perforce

At my work, we have a practice that I've always found curious. Usually, when people check something in, they send out an email to the whole studio with the change list.

This is, by itself, a fine practice. The curious part to me has always been that Perforce (our version control software) has the ability to send out emails for every check-in. My old boss would subscribe to every check-in in our branch, for instance, so she could keep tabs on what the group was doing. But that's not what we use.

Over time, I've come to understand the rationale of our redundant workflow. People can attach screenshots (I do work for a game company, after all) and the subject line can provide project categorization and a short description that Perforce doesn't know about. Also, culturally speaking, if everyone in your company does this and you don't, people have a harder time figuring out what you do.

But I still find the workflow annoying. You submit your changes in Perforce. Then you go to the "submitted changelists" pane and double-click on the changelist you just submitted. You select all the text and then copy it (this view has more info in it — such as the list of files — than the text you wrote when submitting, so you can't just use that). You alt-tab over to Outlook, open a new message, address it to the studio email address, attach your pithy subject, and then copy and paste in the changelist info. (You also use this moment to attach screenshots, if relevant.) You send the email and then alt-tab back over to Perforce (or your IDE) to do more work.

I finally decided to automate the process a bit using Ruby to tie together Outlook and Perforce. I set it up to work with my needs, so it may be of limited use to anyone else. For instance, we have a custom of using "mini" to indicate minor, one-liner types of fixes. Otherwise, I tend to use "submit." Also, sometimes I want to aggregate a few recent changelists in one email. That's what the -count argument does. I also put all the config info (username, password, etc.) into a separate YAML file so that I can distribute the script without sending around my network password. Finally, I specified a sendMode of either Display or Send. Display opens the email for you and lets you customize it. Send just kicks it out the door. The former is useful for screenshots and the like. The sendMode and project options in the config file provide defaults, but they can be overridden on the command line. Sometimes I do work in another functional project and need to change the email accordingly.
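For reference, here is a sketch of what a p4_email.yaml might contain and how the script sees it once loaded. The key names are the ones the script reads; every value is an invented placeholder.

```ruby
require 'yaml'

# Hypothetical contents of p4_email.yaml; every value is a placeholder.
sample = <<~YAML
  sendMode: Display
  project: tools
  p4User: jdoe
  p4Password: not-a-real-password
  p4Client: jdoe-workspace
  p4Host: my-hostname
YAML

config = YAML.load(sample)

# The same required keys the script checks with assert_value.
required = %w[p4User p4Password p4Client p4Host]
missing = required.reject { |key| config[key] }
puts missing.empty? ? "config ok" : "missing: #{missing.join(', ')}"
```

Keeping the credentials in this file, rather than in the script, is what makes the script safe to pass around.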

You'll need P4Ruby to make this work. (I think the OLE stuff is built into the Ruby for Windows installation.) Perforce returns information in a slightly odd way: a changelist has the list of revisions as one field and the list of files as another. The indexes line up, but it takes a bit more work to get the info you want.

There are no doubt better ways to do this: I'm still fumbling around with Ruby.


require 'win32ole'
require 'P4'
require 'yaml'

is_mini = false
subject = ""
num_cls = 1

def assert_value(obj,message_on_fail)
   if !obj
      puts message_on_fail
      Kernel.exit
   end
end

# load the yaml settings first
assert_value(File.exist?('p4_email.yaml'),'You must have a p4_email.yaml file in the same directory')

config = YAML.load_file 'p4_email.yaml'
sendMode = config['sendMode']
project = config['project']

assert_value(config['p4User'],'No p4User specified!')
assert_value(config['p4Password'], 'No p4Password specified!')
assert_value(config['p4Client'], 'No p4Client specified!')
assert_value(config['p4Host'],'No host specified!')

# now parse ARGs. In particular, see if the user has overridden config settings
ARGV.each_index do |index|
   if ARGV[index] == '-sendMode'
      sendMode = ARGV[index+1]
      # check value
      assert_value(sendMode == 'Display' || sendMode == 'Send',"Invalid sendMode value: #{sendMode}")
   end

   if ARGV[index] == '-project'
      project = ARGV[index+1]
   end

   if ARGV[index] == '-mini'
      is_mini = true
   end

   if ARGV[index] == '-subject'
       subject = ARGV[index+1]
   end

   if ARGV[index] == '-count'
      num_cls = ARGV[index+1].to_i
   end
end

puts "count: #{num_cls}" 

assert_value(project,'No project specified! Add to p4_email.yaml or use the -project command-line argument')

# set up p4 connections
p4 = P4.new
p4.client = config['p4Client']
p4.password = config['p4Password']
p4.user = config['p4User']
p4.host= config['p4Host']

p4.connect
p4.run_login

# retrieve recent changelists
lists = p4.run_changes('-u', p4.user, '-m', num_cls.to_s, '-s', 'submitted')

# build the message body, one block per changelist
# (iterating over what came back avoids a crash if fewer than num_cls exist)
msg_body = ""
lists.each do |cl|
    cl_num = cl['change']

    # get the full details for that cl
    cl_full = p4.run_describe(cl_num)[0]
    cl_action_list = cl_full['action']
    cl_rev_list = cl_full['rev']
    msg_body = msg_body + "Change #{cl_num} by #{p4.user}@#{p4.client}\n\n"
    msg_body = msg_body + cl_full['desc'] + "\nAffected files ...\n\n"
    # depotFile, rev, and action are parallel arrays: the indexes line up
    cl_full['depotFile'].each_index do |i|
        msg_body = msg_body + "#{cl_full['depotFile'][i]}##{cl_rev_list[i]} #{cl_action_list[i]}\n"
    end
end
p4.disconnect

#compose email
outlook = WIN32OLE.new('Outlook.Application')

message = outlook.CreateItem(0)
submit_type = "submit"
if is_mini
   submit_type = "mini"
end
message.Subject = "p4 [#{project}] #{submit_type}: #{subject}"
message.Body = msg_body
message.To = '[studioemail]'
# todo: should invoke the method by using reflection
if sendMode == 'Display'
   message.Display
elsif sendMode == 'Send'
   message.Send
end
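
The parallel-array handling in the script above is one place where a more idiomatic approach might help. As a sketch, Array#zip can pair up the file, revision, and action lists without any index bookkeeping. The hash below is a hypothetical stand-in for one result of p4.run_describe; its paths and values are invented.

```ruby
# Hypothetical stand-in for one result of p4.run_describe; the paths,
# revisions, and actions are invented for illustration.
cl_full = {
  'depotFile' => ['//depot/game/main.cpp', '//depot/game/hud.cpp'],
  'rev'       => ['12', '3'],
  'action'    => ['edit', 'add']
}

# zip pairs up the parallel arrays so no index bookkeeping is needed.
lines = cl_full['depotFile'].zip(cl_full['rev'], cl_full['action']).map do |file, rev, action|
  "#{file}##{rev} #{action}"
end

puts lines
```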

Sunday, April 4, 2010

Thoughts On Core Data

I started a new iPhone app, and I decided to use the Core Data framework.

For my first app, I built an object wrapper around calls to sqlite, the embedded database built in to the iPhone frameworks. Core Data wasn't available on the iPhone yet, so everyone had to roll their own solution to this problem. I thought about just using my original solution again (it's well tested, it's a few tweaks from total reusability, and I know SQL well), but my iPhone programming is mostly about learning new technologies, so I gave Core Data a try.

Core Data is basically an ORM system. I've used a number of these over the years; I've even written some, including, in a minor way, the sqlite wrapper I mentioned above. All the ones I've seen abstract away the notion of a "database" so that the bulk of the system just sees objects without knowing their origin.

Here are some of my initial thoughts on Core Data.

Core Data abstracts the database away so much that you can't actually get to it. I recognize that Core Data can run on top of any number of storage solutions, but I feel like if I know it's running over a database, I should be able to manipulate the database myself. Bulk updates of database info — versus loading each object and modifying it — are just one scenario where that would be useful.

Objects managed by Core Data have to extend a single base class. This isn't a huge problem for my model, but it does mean you use up the one inheritance you have in Objective-C. Java has the same limitation, and most of its ORM solutions don't require you to extend a class, which gives you more flexibility in the long run.

Migrating a model should not be an "advanced" topic. One minor change to a model, and you have to nuke the data for your app, which is a bother when you're actually using it. Yes, there are a range of ways to accomplish your goal. But in my first iPhone app, I just wrote a few lines of SQL and had them run against the database at startup: Migration to new models was a snap.

The NSFetchedResultsController is a delight to use. With a few short lines of code, you have a model object you can use to drive table views of data.

Maybe I haven't read up on it enough, but when Core Data is running against a database, I'd like to see explain plans for its queries and be able to check its index usage.

Running arbitrary queries is extremely verbose, again because of the inability to run SQL directly. I wanted the ability to display a unique list of existing non-null values for an object's property in my app so that a user could either enter a new one or select an existing one. In SQL, that would be something like SELECT DISTINCT property_column FROM object_table WHERE property_column IS NOT NULL ORDER BY property_column. The Core Data version of this is:


    NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
    NSEntityDescription *entity = [NSEntityDescription entityForName:@"CallSlip" inManagedObjectContext:[self managedObjectContext]];
    [request setEntity:entity];
    [request setResultType:NSDictionaryResultType];
    NSExpression *keyPathExpression = [NSExpression expressionForKeyPath:researchAreaField];
    
    NSExpressionDescription *expressionDescription = [[[NSExpressionDescription alloc] init] autorelease];
    [expressionDescription setName:researchAreasKey];
    [expressionDescription setExpression:keyPathExpression];
    
    [request setPropertiesToFetch:[NSArray arrayWithObject:expressionDescription]];
    
    // %K substitutes a key path; %@ would insert the field name as a quoted string constant
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K != nil", researchAreaField];
    [request setPredicate:predicate];
    
    [request setReturnsDistinctResults:YES];
    
    NSSortDescriptor *descriptor = [[[NSSortDescriptor alloc] initWithKey:researchAreaField ascending:YES selector: @selector(caseInsensitiveCompare:)] autorelease];
    NSArray *descriptors = [NSArray arrayWithObject:descriptor];
    [request setSortDescriptors:descriptors];
    
    NSError *error = nil;
    NSArray *results = [[self managedObjectContext] executeFetchRequest:request error:&error];

That version isn't exactly shorter.

Compared to other, similar frameworks, I'd rank Core Data as decent. I imagine it's scalable enough for a client application, where you probably don't have to worry about anything larger than 50,000 records. And, if you don't know SQL, it's probably better than just dumping an object tree into a file. But if you know databases, you're likely to find it frustrating as often as you find it useful.

Wednesday, March 17, 2010

Dynamic XML Schema Elements With JAXB

I solved an interesting problem the other day at work, and I wanted to write about it here, mostly so I don't forget how I did it.

I can't talk about what I'm doing at work at the moment, but I came up with a scenario from a different industry that presents some of the same problems.

Let's say you're building a web service for a stock trading desk. You generate a variety of reports in XML and JSON so that you can build a robust, AJAX-y front end solution.

Imagine that the head of the desk wants a report that gives a summary of all the stocks traded that day, complete with volume bought and volume sold. The desk trades different stocks each day, of course, and there are a vast number of valid ticker symbols that could appear, with more coming online as companies go public and with some disappearing as companies get delisted.

You might end up with an XML structure that looks something like this:


<report>
    <tradedStocks>
        <stock>
            <symbol>AAPL</symbol>
            <bought>100</bought>
            <sold>100</sold>
        </stock>
        …
    </tradedStocks>
</report>

With a corresponding JSON structure like this:


{"report":{"tradedStocks":[
    {"symbol":"AAPL","bought":"100","sold":"100"},
    …]}}

So far, so good. Now imagine that the head of the desk wants to treat AAPL differently. Instead of being mixed in with the other stocks traded that day, it should be at the head of the list and printed in green.

When you code this special-case logic on the front end, it will probably look something like this (in JavaScript):


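    function findSymbol(symbolName) {
        // tradedStocks is an array, so walk it by index;
        // for...in would hand back the indexes, not the stock objects
        var stocks = report.tradedStocks;
        for (var i = 0; i < stocks.length; i++) {
            if (stocks[i].symbol === symbolName) {
               return stocks[i];
            }
        }
        return undefined;
    }

    var aapl = findSymbol("AAPL");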

Even wrapped in a function, that's a bit cumbersome. Plus, if the desk has traded hundreds of stocks, that iteration can be time-consuming, especially if the head of the desk wants to call out stocks that the desk shouldn't be trading: The code goes through the entire list of stocks only to return undefined.

It would be easier and more readable to be able to do something like this:


aapl = report.tradedStocks.AAPL

But that would necessitate an XML structure in which the element name stock was replaced by the element name AAPL. Which would in turn mean that the elements allowed under tradedStocks were drawn from a very large list of ever-changing element names. In essence, subelements under tradedStocks could literally be anything.

XML doesn't really allow for that. It assumes you have a well-defined structure. And tools like JAXB build on that.

You could put the special-case logic on the server-side, of course. Grab the AAPL data from the list of traded stocks, and make a sub-element of report be AAPL with further subelements showing the data. That would repeat data, which may not be the end of the world, but what about the next time the head of the desk wanted special-case logic? More server-side custom logic.

Here's how I solved the problem.

First, I changed the logic in our XML formatter. My view layer is isolated from the rest of the application, and it hands the model to a different formatter depending on the format that was requested: xml goes to the XML formatter, json goes to the JSON formatter, and so forth.

The first incarnation of our XML formatter did the obvious thing: It just used the JAXB engine to spit out the XML based on the JAXB annotations. But I made it act more like our JSON formatter, which receives events from a parser that analyzes the JAXB annotations and constructs the output based on those events.

Why switch? Because I wanted custom annotations. By making our XML formatter act as an event listener too, I could interject events based on annotations that JAXB doesn't know about. The new XML formatter (and the JSON formatter) wouldn't notice anything odd about the data, because it would just be another event.

Next I created a FlattenableMap annotation, which our JAXB parser spies and interprets as "take this map, and for each key-value pair, fire an appropriate event." To use the above example, there would be a Map in the report object that would key stock ticker symbols to stock objects. In that case, our parser would say "I'm starting a complex object whose name is 'AAPL'." All the infrastructure would then fire appropriately, and you'd end up with


<report>
    <tradedStocks>
        <AAPL>
           <bought>100</bought>
           <sold>100</sold>
        </AAPL>
    </tradedStocks>
</report>

On the front-end side, a JavaScript programmer merely writes:


   report.tradedStocks.AAPL

This implementation also means that no matter what stocks show up in the report, they'll be pushed out to the client (assuming the stocks get loaded into the Map). There's no need to maintain the code to add "allowed" stocks. So it's automatically maintainable based on the real data in front of it, even if that data changes (which, in my case, it certainly will).
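The flattening step described above can be sketched in a few lines. This is an illustration in Ruby, not the Java/JAXB code from the project: XmlListener and flatten_map are hypothetical names, the event set (start_object, value, end_object) is an assumption, and no XML escaping is done.

```ruby
# Illustration only: hypothetical names, not the Java/JAXB code from the post.

# Listener that turns start/value/end events into XML text (no escaping).
class XmlListener
  attr_reader :out

  def initialize
    @out = String.new
  end

  def start_object(name)
    @out << "<#{name}>"
  end

  def value(name, val)
    @out << "<#{name}>#{val}</#{name}>"
  end

  def end_object(name)
    @out << "</#{name}>"
  end
end

# Walk a map and fire one "complex object" per key-value pair,
# using the key itself as the element name.
def flatten_map(name, map, listener)
  listener.start_object(name)
  map.each do |key, fields|
    listener.start_object(key)
    fields.each { |field, val| listener.value(field, val) }
    listener.end_object(key)
  end
  listener.end_object(name)
end

traded = { "AAPL" => { "bought" => 100, "sold" => 100 } }
listener = XmlListener.new
flatten_map("tradedStocks", traded, listener)
puts listener.out
```

The listener never learns that the data came from a map. It just sees another object start and end, which is why a second listener for JSON could consume the same events unchanged.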

You could fairly point out that this means we're not using valid XML. You certainly could not construct a DTD or schema for this logic. But the reality is that eventually, we probably will have needs defined enough to allow us to construct the equivalent of report.AAPL, at which point we can have a more rigid schema. In effect, our schema is still so much in flux that every week or so creates a different schema. My code keeps up with those changes, even without me doing any new work.

Thursday, March 4, 2010

Highest-Scoring Word In WordCrasher

I've become a fan of WordCrasher on the iPhone. Think of it as <insert your favorite word game> meets Tetris: You tap on bubbles, which are falling from the top, to make words and clear them from the screen.

I've unlocked most of the achievements, but there are two secret achievements that have escaped me. I'm convinced that one of them is finding the highest-scoring word in the dictionary. The in-game view of the leaderboards, as of this writing, shows just four people who have managed to find a 2,300-point word, the highest recorded score.

What is the highest-scoring word? Well, I don't know. But I wrote a Ruby script to make an educated guess. I assumed Kevin Ng, the developer, used the Official Scrabble Dictionary as his dictionary (though his game omits at least ort and gams).

So I wrote this script, which takes a path to a dictionary file as a command-line argument:



$letterScores = {
  'a' => 10,
  'b' => 20,
  'c' => 20,
  'd' => 20,
  'e' => 10,
  'f' => 30,
  'g' => 30,
  'h' => 20,
  'i' => 10,
  'j' => 50,
  'k' => 20,
  'l' => 10,
  'm' => 30,
  'n' => 10,
  'o' => 10,
  'p' => 20,
  'q' => 80,
  'r' => 10,
  's' => 10,
  't' => 10,
  'u' => 10,
  'v' => 30,
  'w' => 30,
  'x' => 50,
  'y' => 30,
  'z' => 50 }
  
def calc_word_score(word) 
   sum = 0
   word.split(//).each do |char|
      sum = sum + $letterScores[char] if $letterScores[char]
   end
   sum * word.length
end


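# Note: each_line yields every word with its trailing newline still attached,
# and calc_word_score counts that newline in word.length.
File.open(ARGV[0]) do |file|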
    file.each_line do |word|
        score = calc_word_score(word.downcase)
        isBest = (score == 2300)
        puts "#{score} #{word}" if isBest
    end
end

I put in 2,300 as a score because that's what people have achieved. However, in the basic Scrabble dictionary, there's no word that scores exactly 2,300; there's one word that scores 2,350: zyzzyvas. So instead of the official Scrabble dictionary, I used the Enable wordlist (both wordlists are available from the National Puzzlers' League).

With the Enable list, I found two words that scored exactly 2,300: showbizzy and whizzbang. I have yet to get a board where I can spell any of these, but I'm going to be trying all of them as soon as I can.
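Instead of hard-coding a target score, a variant of the script could report whatever the best score in the wordlist turns out to be. A minimal sketch follows, using the same letter values as the table above. The names word_score, best_words, and extra_length are hypothetical. The extra_length parameter exists because the original script reads each word with its trailing newline, so for newline-terminated lines it effectively multiplies by the letter count plus one; whether the game itself scores that way is not something this sketch can confirm.

```ruby
# Scoring table from the script above, grouped by point value.
LETTER_SCORES = { 10 => "aeilnorstu", 20 => "bcdhkp", 30 => "fgmvwy",
                  50 => "jxz", 80 => "q" }
  .flat_map { |points, letters| letters.chars.map { |c| [c, points] } }
  .to_h

# extra_length is a hypothetical knob: pass 1 to mimic the original script,
# which counts the trailing newline as part of the word's length.
def word_score(word, extra_length: 0)
  sum = word.downcase.chars.sum { |c| LETTER_SCORES.fetch(c, 0) }
  sum * (word.length + extra_length)
end

# Return the best score and every word that reaches it.
def best_words(words, extra_length: 0)
  cleaned = words.map(&:strip).reject(&:empty?)
  scored = cleaned.group_by { |w| word_score(w, extra_length: extra_length) }
  best = scored.keys.max
  [best, scored[best]]
end

# Only runs when a wordlist path is given on the command line.
if ARGV[0]
  score, words = best_words(File.readlines(ARGV[0]), extra_length: 1)
  puts "#{score}: #{words.join(', ')}"
end
```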

Saturday, February 6, 2010

Tail Recursion

The annoying thing about being a writer who has focused a lot on learning his craft is that I now have a constant editorial chatter when I'm reading. Typos, awkward sentences, factual problems. They all crop up and prevent me from just taking in what I'm reading.

I was reading through a Scala book the other day, and I noticed this blurb in a section about tail recursion.

(If you don’t fully understand tail recursion yet, see Section 8.9).

8.10 Conclusion

The editor in my brain pounced on this final sentence, which cross-references the same section the reader has just finished.

It took a beat before the programmer side of my brain woke up and noticed the joke.

Tail recursion is when the last instruction in a method is a call to the same method. To take an example from the book,


def boom(x: Int): Int =
    if (x == 0) throw new Exception("boom!")
    else boom(x - 1) + 1

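In this specific case, boom calls itself on the last line of the method, but it still has to add 1 after that call returns, so strictly speaking this one is not a tail call. Drop the + 1, so that the recursive call is the very last thing the method does, and that's tail recursion.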

And the last sentence in that section is a perfect example.
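
To make the distinction concrete, here is a small sketch in Ruby rather than Scala. sum_to and sum_to_acc are hypothetical examples, not taken from the book. The first still has work to do after its recursive call returns; the second does not. Note that Ruby's default interpreter does not eliminate tail calls unless it is configured to, so this only shows the shape of the code, not the optimization.

```ruby
# Not a tail call: after the recursive call returns, the method still
# has to add n to the result.
def sum_to(n)
  return 0 if n == 0
  n + sum_to(n - 1)
end

# Tail call: the recursive call is the last thing the method does; the
# running total travels along in the acc argument.
def sum_to_acc(n, acc = 0)
  return acc if n == 0
  sum_to_acc(n - 1, acc + n)
end

puts sum_to(10)      # 55
puts sum_to_acc(10)  # 55
```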

Monday, February 1, 2010

Print To URL Via Smartphone

According to this Mashable post, Microsoft has unveiled a new "tagging" system that would let print publications include a smartphone-readable link so that readers could visit a webpage referenced in an article by pointing their smartphones at it.