Wednesday, March 17, 2010

Dynamic XML Schema Elements With JAXB

I solved an interesting problem the other day at work, and I wanted to write about it here, mostly so I don't forget how I did it.

I can't talk about what I'm doing at work at the moment, but I came up with a scenario from a different industry that presents some of the same problems.

Let's say you're building a web service for a stock trading desk. You generate a variety of reports in XML and JSON so that you can build a robust, AJAX-y front end solution.

Imagine that the head of the desk wants a report that gives a summary of all the stocks traded that day, complete with volume bought and volume sold. The desk trades different stocks each day, of course, and there are a vast number of valid ticker symbols that could appear, with more coming online as companies go public and with some disappearing as companies get delisted.

You might end up with an XML structure that looks something like this:

<report>
  <tradedStocks>
    <stock>
      <symbol>AAPL</symbol>
      <bought>100</bought>
      <sold>100</sold>
    </stock>
    …
  </tradedStocks>
</report>


With a corresponding JSON structure like this:

{"report":{"tradedStocks":[
{"symbol":"AAPL","bought":"100","sold":"100"},
…]}}


So far, so good. Now imagine that the head of the desk wants to treat AAPL differently. Instead of being mixed in with the other stocks traded that day, it should be at the head of the list and printed in green.

When you code this special-case logic on the front end, it will probably look something like this (in JavaScript):

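function findSymbol(symbolName) {
  var stocks = report.tradedStocks;
  // for...in would iterate the array's indices, not the stock objects,
  // so walk the array with an index loop instead
  for (var i = 0; i < stocks.length; i++) {
    if (stocks[i].symbol === symbolName) {
      return stocks[i];
    }
  }
  return undefined;
}

var aapl = findSymbol("AAPL");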


Even wrapped in a function, that's a bit cumbersome. Plus, every lookup is a linear scan. If the desk has traded hundreds of stocks and the front end looks up many symbols, that adds up, especially if the head of the desk wants to call out stocks that the desk shouldn't be trading: for those, the code walks the entire list only to return undefined.

It would be easier and more readable to be able to do something like this:

aapl = report.tradedStocks.AAPL


But that would require an XML structure in which the element name stock is replaced by the element name AAPL. That, in turn, means the elements allowed under tradedStocks would be drawn from a very large, ever-changing list of names. In essence, the subelements under tradedStocks could be almost anything.

XML schemas don't really allow for that. A DTD or XML Schema assumes a well-defined set of element names, and tools like JAXB build on that assumption.

You could put the special-case logic on the server side, of course: grab the AAPL data from the list of traded stocks and add an AAPL sub-element to report, with further sub-elements holding the data. That would repeat data, which may not be the end of the world, but what about the next time the head of the desk wants special-case logic? More custom server-side logic.

Here's how I solved the problem.

First, I changed the logic in our XML formatter. My view layer is isolated from the rest of the application, and it routes the model to a different formatter depending on the requested format: xml goes to the XML formatter, json goes to the JSON formatter, and so forth.

The first incarnation of our XML formatter did the obvious thing: It just used the JAXB engine to spit out the XML based on the JAXB annotations. But I made it act more like our JSON formatter, which receives events from a parser that analyzes the JAXB annotations and constructs the output based on those events.

Why switch? Because I wanted custom annotations. By making our XML formatter act as an event listener too, I could inject events based on annotations that JAXB doesn't know about. Neither the new XML formatter nor the JSON formatter would notice anything odd about the data, because each injected event looks like any other event.
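The event-driven arrangement can be sketched in a few lines. This is a minimal Ruby illustration under stated assumptions, not the production Java/JAXB code: a nested Ruby Hash stands in for the annotated model, and walk, XmlFormatter, and JsonFormatter are hypothetical names. The point it shows is that one parser fires start_object, value, and end_object events, and each formatter only reacts to those events.

```ruby
require 'json'

# Characters that must be escaped in XML text content.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Hypothetical stand-in for the annotation-reading parser: walks a nested
# Hash and fires one event per element. Hash values become complex
# elements; anything else becomes a simple value.
def walk(name, value, listener)
  if value.is_a?(Hash)
    listener.start_object(name)
    value.each { |child_name, child| walk(child_name, child, listener) }
    listener.end_object(name)
  else
    listener.value(name, value)
  end
end

# Listener that writes XML as the events arrive.
class XmlFormatter
  attr_reader :output

  def initialize
    @output = String.new
  end

  def start_object(name)
    @output << "<#{name}>"
  end

  def end_object(name)
    @output << "</#{name}>"
  end

  def value(name, value)
    @output << "<#{name}>#{value.to_s.gsub(/[&<>]/, XML_ESCAPES)}</#{name}>"
  end
end

# Listener that builds a nested Hash from the same events, then emits JSON.
class JsonFormatter
  def initialize
    @stack = [{}]
  end

  def start_object(name)
    child = {}
    @stack.last[name] = child
    @stack.push(child)
  end

  def end_object(_name)
    @stack.pop
  end

  def value(name, value)
    @stack.last[name] = value
  end

  def output
    JSON.generate(@stack.first)
  end
end

report = { 'stock' => { 'symbol' => 'AAPL', 'bought' => 100, 'sold' => 100 } }
[XmlFormatter.new, JsonFormatter.new].each do |formatter|
  walk('report', report, formatter)
  puts formatter.output
end
# <report><stock><symbol>AAPL</symbol><bought>100</bought><sold>100</sold></stock></report>
# {"report":{"stock":{"symbol":"AAPL","bought":100,"sold":100}}}
```

Because both formatters see only events, a new kind of event source needs no changes in either one. The JSON listener here builds a Hash, so it can't represent repeated elements; a real formatter would also need list events.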

Next, I created a FlattenableMap annotation, which our JAXB parser spots and interprets as "take this map and, for each key-value pair, fire an appropriate event." In the example above, the report object would hold a Map from stock ticker symbols to stock objects. For the AAPL entry, our parser would say "I'm starting a complex object whose name is 'AAPL'." The rest of the infrastructure would then fire as usual, and you'd end up with:


<report>
  <tradedStocks>
    <AAPL>
      <bought>100</bought>
      <sold>100</sold>
    </AAPL>
  </tradedStocks>
</report>
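Here is a minimal Ruby sketch of how a parser might treat a flattenable map, assuming an event-listener design like the one described above. Ruby has no annotations, so a hypothetical FlattenableMap wrapper class plays the annotation's role; walk, Report, Stock, and XmlListener are likewise illustrative names, and the generic entry/key/value form used for unmarked maps is only this sketch's default. With the wrapper, the output matches the XML above.

```ruby
# Hypothetical stand-in for the @FlattenableMap annotation: wrapping a Hash
# in this class marks it for flattening.
class FlattenableMap
  attr_reader :entries

  def initialize(entries)
    @entries = entries
  end
end

Report = Struct.new(:tradedStocks)
Stock = Struct.new(:bought, :sold)

# Fires start_object / value / end_object events for a model object.
def walk(name, value, listener)
  case value
  when FlattenableMap
    # Flattened: each key becomes an element name of its own.
    listener.start_object(name)
    value.entries.each { |key, child| walk(key.to_s, child, listener) }
    listener.end_object(name)
  when Hash
    # Unmarked map: this sketch falls back to generic entry/key/value elements.
    listener.start_object(name)
    value.each do |key, child|
      listener.start_object('entry')
      listener.value('key', key)
      walk('value', child, listener)
      listener.end_object('entry')
    end
    listener.end_object(name)
  when Struct
    # Plain object: one child element per field.
    listener.start_object(name)
    value.each_pair { |field, child| walk(field.to_s, child, listener) }
    listener.end_object(name)
  else
    listener.value(name, value)
  end
end

# Minimal XML listener (no escaping or indentation).
class XmlListener
  attr_reader :output

  def initialize
    @output = String.new
  end

  def start_object(name)
    @output << "<#{name}>"
  end

  def end_object(name)
    @output << "</#{name}>"
  end

  def value(name, value)
    @output << "<#{name}>#{value}</#{name}>"
  end
end

flat = XmlListener.new
walk('report', Report.new(FlattenableMap.new({ 'AAPL' => Stock.new(100, 100) })), flat)
puts flat.output
# <report><tradedStocks><AAPL><bought>100</bought><sold>100</sold></AAPL></tradedStocks></report>

generic = XmlListener.new
walk('report', Report.new({ 'AAPL' => Stock.new(100, 100) }), generic)
puts generic.output
# <report><tradedStocks><entry><key>AAPL</key><value><bought>100</bought><sold>100</sold></value></entry></tradedStocks></report>
```

Because each map key becomes an element name, this sketch assumes every key is a legal XML name; it doesn't check.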


On the front-end side, a JavaScript programmer merely writes:

report.tradedStocks.AAPL


This implementation also means that whatever stocks show up in the report will be pushed out to the client (assuming they get loaded into the Map). There's no list of "allowed" stocks to maintain in code; the output follows the real data in front of it, even as that data changes (which, in my case, it certainly will).

You could fairly point out that this XML can't be validated against a fixed schema: you certainly couldn't write a DTD or XML Schema that describes it precisely. But the reality is that eventually our needs will probably be defined well enough for us to construct the equivalent of report.AAPL, at which point we can adopt a more rigid schema. For now, our schema is in so much flux that it changes every week or so, and my code keeps up with those changes without any new work on my part.

Thursday, March 4, 2010

Highest-Scoring Word In WordCrasher

I've become a fan of WordCrasher on the iPhone. Think of it as <insert your favorite word game> meets Tetris: You tap on bubbles, which are falling from the top, to make words and clear them from the screen.

I've unlocked most of the achievements, but there are two secret achievements that have escaped me. I'm convinced that one of them is finding the highest-scoring word in the dictionary. The in-game view of the leaderboards, as of this writing, shows just four people who have managed to find a 2,300-point word, the highest recorded score.

What is the highest-scoring word? Well, I don't know. But I wrote a Ruby script to make an educated guess. I assumed Kevin Ng, the developer, used the Official Scrabble Dictionary as his dictionary (though his game omits at least ort and gams).

So I wrote this script, which takes a path to a dictionary file as a command-line argument:


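# Points per letter.
LETTER_SCORES = {
  'a' => 10, 'b' => 20, 'c' => 20, 'd' => 20, 'e' => 10,
  'f' => 30, 'g' => 30, 'h' => 20, 'i' => 10, 'j' => 50,
  'k' => 20, 'l' => 10, 'm' => 30, 'n' => 10, 'o' => 10,
  'p' => 20, 'q' => 80, 'r' => 10, 's' => 10, 't' => 10,
  'u' => 10, 'v' => 30, 'w' => 30, 'x' => 50, 'y' => 30,
  'z' => 50
}.freeze

# A word's score is the sum of its letter values times word.length.
# Characters with no value (such as the newline) add nothing to the sum,
# but each_line leaves the trailing newline on every word, so word.length
# is one more than the number of letters.
def calc_word_score(word)
  word.each_char.sum { |char| LETTER_SCORES.fetch(char, 0) } * word.length
end

abort "usage: #{$PROGRAM_NAME} DICTIONARY_FILE" if ARGV.empty?

File.open(ARGV[0]) do |file|
  file.each_line do |word|
    score = calc_word_score(word.downcase)
    # 2,300 is the best score on the leaderboard as of this writing.
    puts "#{score} #{word}" if score == 2300
  end
end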



I hard-coded 2,300 as the target score because that's the best anyone has achieved. However, no word in the basic Scrabble dictionary scores exactly 2,300; there's one word that scores 2,340: zyzzyvas. So instead of the official Scrabble dictionary, I used the Enable word list (both word lists are available from the National Puzzlers' League).

With the Enable list, I found two words that score exactly 2,300: showbizzy and whizzbang. I have yet to get a board where I can spell either of them, but I'm going to try both as soon as I can.
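To double-check the arithmetic, here's a small sketch that recomputes those two scores with the same letter values and the same formula as the script: the letter sum times the length of the line as read from the file, which includes the trailing newline. line_score is a hypothetical helper for illustration.

```ruby
# Same letter values as the script above.
LETTER_SCORES = {
  'a' => 10, 'b' => 20, 'c' => 20, 'd' => 20, 'e' => 10,
  'f' => 30, 'g' => 30, 'h' => 20, 'i' => 10, 'j' => 50,
  'k' => 20, 'l' => 10, 'm' => 30, 'n' => 10, 'o' => 10,
  'p' => 20, 'q' => 80, 'r' => 10, 's' => 10, 't' => 10,
  'u' => 10, 'v' => 30, 'w' => 30, 'x' => 50, 'y' => 30,
  'z' => 50
}.freeze

# Hypothetical helper mirroring calc_word_score: the letter sum times the
# length of the line as read from the file (letters plus the "\n").
def line_score(line)
  line.each_char.sum { |char| LETTER_SCORES.fetch(char, 0) } * line.length
end

%w[showbizzy whizzbang].each do |word|
  letters = word.each_char.sum { |char| LETTER_SCORES.fetch(char, 0) }
  puts "#{word}: #{letters} x #{word.length + 1} = #{line_score("#{word}\n")}"
end
# showbizzy: 230 x 10 = 2300
# whizzbang: 230 x 10 = 2300
```

Counting only the nine letters would give 230 x 9 = 2,070 for each word, so the 2,300 figure depends on the trailing newline being included in the length.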