Wednesday, March 17, 2010

Dynamic XML Schema Elements With JAXB

I solved an interesting problem the other day at work, and I wanted to write about it here, mostly so I don't forget how I did it.

I can't talk about what I'm doing at work at the moment, but I came up with a scenario from a different industry that presents some of the same problems.

Let's say you're building a web service for a stock trading desk. You generate a variety of reports in XML and JSON so that you can build a robust, AJAX-y front end solution.

Imagine that the head of the desk wants a report that gives a summary of all the stocks traded that day, complete with volume bought and volume sold. The desk trades different stocks each day, of course, and there are a vast number of valid ticker symbols that could appear, with more coming online as companies go public and with some disappearing as companies get delisted.

You might end up with an XML structure that looks something like this:

<report>
  <tradedStocks>
    <stock>
      <symbol>AAPL</symbol>
      <bought>100</bought>
      <sold>100</sold>
    </stock>

  </tradedStocks>
</report>


With a corresponding JSON structure like this:

{"report":{"tradedStocks":[
{"symbol":"AAPL","bought":"100","sold":"100"},
…]}}


So far, so good. Now imagine that the head of the desk wants to treat AAPL differently. Instead of being mixed in with the other stocks traded that day, it should be at the head of the list and printed in green.

When you code this special-case logic on the front end, it will probably look something like this (in JavaScript):

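function findSymbol(symbolName) {
    var stocks = report.tradedStocks;
    // for...in would yield array indices, not stock objects, so use an index loop
    for (var i = 0; i < stocks.length; i++) {
        if (stocks[i].symbol === symbolName) {
            return stocks[i];
        }
    }
    return undefined;
}

var aapl = findSymbol("AAPL");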


Even wrapped in a function, that's a bit cumbersome. Plus, if the desk has traded hundreds of stocks, that iteration can be time-consuming, especially if the head of the desk wants to call out stocks that the desk shouldn't be trading: for a symbol that isn't in the report, the code walks the entire list only to return undefined.
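The cost described above is the usual gap between scanning a list and looking up a key. Here is a minimal Java sketch of the same contrast on the server side, assuming hypothetical `Stock` and `TickerLookup` classes (not from the original code): the list version has to scan every entry, while the map version, keyed by ticker symbol, answers with a single lookup, including for a symbol that isn't there.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical classes for illustration only; not the original code.
class TickerLookup {

    static class Stock {
        final String symbol;
        final long bought;
        final long sold;

        Stock(String symbol, long bought, long sold) {
            this.symbol = symbol;
            this.bought = bought;
            this.sold = sold;
        }
    }

    // List-shaped report: finding one symbol means scanning the entries,
    // and a symbol that isn't present costs a full pass.
    static Stock findInList(List<Stock> stocks, String symbol) {
        for (Stock s : stocks) {
            if (s.symbol.equals(symbol)) {
                return s;
            }
        }
        return null;
    }

    // Map-shaped report: the ticker is the key, so a lookup is a single hash probe.
    static Map<String, Stock> indexBySymbol(List<Stock> stocks) {
        Map<String, Stock> bySymbol = new LinkedHashMap<>(); // keeps the original order
        for (Stock s : stocks) {
            bySymbol.put(s.symbol, s);
        }
        return bySymbol;
    }

    public static void main(String[] args) {
        List<Stock> traded = new ArrayList<>();
        traded.add(new Stock("AAPL", 100, 100));
        traded.add(new Stock("MSFT", 250, 50));

        Map<String, Stock> bySymbol = indexBySymbol(traded);
        System.out.println(findInList(traded, "AAPL").bought); // 100
        System.out.println(bySymbol.get("AAPL").bought);       // 100
        System.out.println(bySymbol.get("XYZ"));               // null, without a scan
    }
}
```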

It would be easier and more readable to be able to do something like this:

var aapl = report.tradedStocks.AAPL;


But that would require an XML structure in which the element name stock was replaced by the element name AAPL. That, in turn, would mean that the elements allowed under tradedStocks were drawn from a very large, ever-changing list of element names. In essence, subelements under tradedStocks could be literally anything.

XML schemas don't really allow for that. They assume you have a well-defined structure, and tools like JAXB build on that assumption.

You could put the special-case logic on the server side, of course: grab the AAPL data from the list of traded stocks and add an AAPL subelement to report that holds the data. That would duplicate data, which may not be the end of the world, but what about the next time the head of the desk wants special-case logic? More custom server-side logic.

Here's how I solved the problem.

First, I changed the logic in our XML formatter. My view layer is isolated from the rest of the application, and it hands the model to a different formatter depending on the requested format: xml goes to the XML formatter, json goes to the JSON formatter, and so forth.

The first incarnation of our XML formatter did the obvious thing: it simply let the JAXB engine generate the XML from the JAXB annotations. I reworked it to behave more like our JSON formatter, which receives events from a parser that analyzes the JAXB annotations and builds the output from those events.

Why switch? Because I wanted custom annotations. Once the XML formatter was an event listener too, I could inject events based on annotations that JAXB doesn't know about. Neither the new XML formatter nor the JSON formatter would notice anything odd about the data, because to them it would just be another event.
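Here is a minimal sketch of the event-listener idea, assuming a hypothetical `OutputListener` interface with `startObject`, `value`, and `endObject` events (the real interface isn't shown in this post). A single hand-written event stream drives both an XML formatter and a JSON formatter; in the real system, the annotation parser would be the thing producing those events.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical event interface; names are illustrative, not the original code.
interface OutputListener {
    void startObject(String name);
    void value(String name, String text);
    void endObject();
}

// Builds XML from events. No escaping: this is a sketch only.
class XmlFormatter implements OutputListener {
    private final StringBuilder out = new StringBuilder();
    private final Deque<String> open = new ArrayDeque<>(); // stack of open element names

    public void startObject(String name) {
        out.append('<').append(name).append('>');
        open.push(name);
    }

    public void value(String name, String text) {
        out.append('<').append(name).append('>').append(text)
           .append("</").append(name).append('>');
    }

    public void endObject() {
        out.append("</").append(open.pop()).append('>');
    }

    public String result() { return out.toString(); }
}

// Builds JSON from the same events; values are emitted as strings.
class JsonFormatter implements OutputListener {
    private final StringBuilder out = new StringBuilder("{");
    private boolean needComma = false;

    public void startObject(String name) {
        if (needComma) out.append(',');
        out.append('"').append(name).append("\":{");
        needComma = false;
    }

    public void value(String name, String text) {
        if (needComma) out.append(',');
        out.append('"').append(name).append("\":\"").append(text).append('"');
        needComma = true;
    }

    public void endObject() {
        out.append('}');
        needComma = true;
    }

    public String result() { return out.toString() + "}"; }
}

class FormatterDemo {
    // Stand-in for the annotation-driven parser: replays a fixed event stream.
    static void emitReport(OutputListener l) {
        l.startObject("report");
        l.startObject("tradedStocks");
        l.startObject("AAPL");
        l.value("bought", "100");
        l.value("sold", "100");
        l.endObject();
        l.endObject();
        l.endObject();
    }

    public static void main(String[] args) {
        XmlFormatter xml = new XmlFormatter();
        JsonFormatter json = new JsonFormatter();
        emitReport(xml);
        emitReport(json);
        System.out.println(xml.result());
        System.out.println(json.result());
    }
}
```

Because neither formatter knows where its events come from, supporting a new annotation only means changing the event source, not the formatters.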

Next I created a FlattenableMap annotation, which our JAXB parser spots and interprets as "take this map, and for each key-value pair, fire an appropriate event." To use the example above, the report object would contain a Map from stock ticker symbols to stock objects. For the AAPL entry, our parser would say "I'm starting a complex object whose name is 'AAPL'." All the infrastructure would then fire appropriately, and you'd end up with:


<report>
  <tradedStocks>
    <AAPL>
      <bought>100</bought>
      <sold>100</sold>
    </AAPL>
  </tradedStocks>
</report>
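A rough sketch of how a `@FlattenableMap` annotation could work, assuming runtime retention and a simple reflective walker. The `FlatteningWriter`, `Report`, and `Stock` classes are hypothetical, and to keep things short the walker writes XML directly instead of firing events to a separate formatter. It also skips escaping and handles only numbers, strings, maps, and nested plain objects.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker: "emit one element per map entry, named after its key".
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface FlattenableMap {}

// Hypothetical model classes standing in for the real report objects.
class Stock {
    long bought;
    long sold;
    Stock(long bought, long sold) { this.bought = bought; this.sold = sold; }
}

class Report {
    @FlattenableMap
    Map<String, Stock> tradedStocks = new LinkedHashMap<>(); // insertion order preserved
}

// Hypothetical walker: writes XML directly instead of firing events. No escaping.
class FlatteningWriter {
    private final StringBuilder out = new StringBuilder();

    String write(String rootName, Object root) {
        out.setLength(0);
        try {
            writeObject(rootName, root);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    private void writeObject(String name, Object obj) throws IllegalAccessException {
        out.append('<').append(name).append('>');
        Field[] fields = obj.getClass().getDeclaredFields();
        // getDeclaredFields() has no guaranteed order, so sort by name for stable output.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        for (Field f : fields) {
            if (f.isSynthetic() || Modifier.isStatic(f.getModifiers())) continue;
            f.setAccessible(true);
            Object v = f.get(obj);
            if (f.isAnnotationPresent(FlattenableMap.class) && v instanceof Map) {
                // Wrapper element for the field; each map key becomes a child element name.
                out.append('<').append(f.getName()).append('>');
                for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                    writeValue(String.valueOf(e.getKey()), e.getValue());
                }
                out.append("</").append(f.getName()).append('>');
            } else {
                writeValue(f.getName(), v);
            }
        }
        out.append("</").append(name).append('>');
    }

    // Leaf values become text elements; anything else is walked as a nested object.
    private void writeValue(String name, Object v) throws IllegalAccessException {
        if (v instanceof Number || v instanceof String) {
            out.append('<').append(name).append('>').append(v)
               .append("</").append(name).append('>');
        } else if (v != null) {
            writeObject(name, v);
        }
    }
}

class FlattenDemo {
    public static void main(String[] args) {
        Report report = new Report();
        report.tradedStocks.put("AAPL", new Stock(100, 100));
        System.out.println(new FlatteningWriter().write("report", report));
    }
}
```

Adding a new ticker to the map adds a new element to the output with no code change, which is the maintainability property described below.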


On the front-end side, a JavaScript programmer merely writes:

report.tradedStocks.AAPL


This implementation also means that no matter what stocks show up in the report, they'll be pushed out to the client (assuming the stocks get loaded into the Map). There's no code to maintain for adding "allowed" stocks. The output keeps up with the real data in front of it, even as that data changes (which, in my case, it certainly will).

You could fairly point out that this means we're not producing XML that can be validated against a fixed schema: you certainly couldn't write a DTD or schema that enumerates these element names. But the reality is that eventually, we will probably have needs defined enough to let us construct the equivalent of report.AAPL, at which point we can adopt a more rigid schema. For now, our schema is so much in flux that it changes every week or so, and my code keeps up with those changes without any new work on my part.
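One practical constraint follows from this design: because map keys become element names, every key has to be a legal XML name for the output to be well-formed at all. A symbol containing a space, or one starting with a digit, would produce a broken document. Here is a simplified check, assuming an ASCII-only subset of the XML 1.0 `Name` rule with the colon excluded (real XML names allow many more Unicode characters). The `ElementNames` class is hypothetical, and what to do with a failing key (skip it, rename it, or fall back to a `<stock>` element) is left open.

```java
import java.util.regex.Pattern;

// Hypothetical helper. Simplified check: an ASCII-only subset of the XML 1.0
// Name production, excluding the colon (used for namespace prefixes).
// It does not reject names starting with "xml", which the spec reserves.
class ElementNames {
    private static final Pattern SAFE_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9._-]*");

    static boolean isSafeElementName(String key) {
        return key != null && SAFE_NAME.matcher(key).matches();
    }

    public static void main(String[] args) {
        String[] keys = {"AAPL", "BRK.B", "BRK B", "1ABC", ""};
        for (String k : keys) {
            System.out.println("'" + k + "' -> " + isSafeElementName(k));
        }
    }
}
```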

2 comments:

  1. Hi, Derrick,

    This looks very interesting to me. I'm in a scenario that requires something very similar (if not identical):

    I have a long list of simple key/value pairs to marshal to a list of XML elements. For each element, I'd like the key to be the element name and the value to be the element value.
    E.g.,
    LastName -> Smith
    marshalled to
    <LastName>Smith</LastName>

    It would be a real pain to create a Java class property with an @XmlElement annotation for each key/value pair, though.

    Do you think your approach could also be used for unmarshalling? Could you share some sample code? I'm really interested in knowing how to achieve this.

    Thanks a billion

  2. Hi, Ginger.

    Because of the particulars of my system, I don't have to worry about unmarshalling, so I don't know how well it would work. You'd have to somehow distinguish between elements that are subobjects and elements that are keys in a map. I guess in your @XmlElement for a given element, you could somehow flag the items to be treated as keys, but I don't have a good idea what that would look like.

    Sharing the source code is a bit tricky, since it's owned by my employer, but I'll see if there's some snippet I can extract.
