Saturday, January 23, 2010

Polyglot Programming

I recently read through The ThoughtWorks Anthology, a collection of essays by Big Thinkers in the realm of systems design. The essays were largely interesting, but one in particular resonated with me: Polyglot Programming. The author made a compelling case for using the Java Virtual Machine — a robust, mature, well-tested infrastructure — as a platform in which any number of languages can co-exist.

Java's a good language, of course, but it's not good at everything. Why not mix in other languages that run on the same virtual machine and are strong where Java is weak, the author asked. I've toyed with this idea before, especially with Scala and its potential for highly concurrent code, but the essay lit a new fire in me.


I came up with a way to try out the concept. We have a bunch of queries in our service layers, and Java blows when it comes to formatting long strings. Without the ability to have one string span multiple lines, you end up with something like this:

String query = "SELECT table_a.*,table_b.* FROM table_a,table_b,table_c " +
"WHERE table_a.some_column = table_b.some_column " +
"AND table_b.some_column = table_c.some_column " +
"AND table_c.id = :idValue";


Easy reading, right? Not only is it annoying to read, it's error-prone. I almost always forget a space at the end of one of these fragments, which compiles fine and then causes SQL exceptions that don't show up until runtime.
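To make that failure concrete, here is a minimal sketch of the missing-space slip. The class name MissingSpaceDemo is made up for illustration, and the query is a shortened version of the one above:

```java
// Hypothetical demo class; table and column names are taken from the example above.
class MissingSpaceDemo {

    // The first fragment has no trailing space, so it runs straight into "WHERE".
    static String brokenQuery() {
        return "SELECT table_a.*,table_b.* FROM table_a,table_b,table_c" +
               "WHERE table_a.some_column = table_b.some_column " +
               "AND table_c.id = :idValue";
    }

    public static void main(String[] args) {
        // The compiler is perfectly happy; the database will not be.
        System.out.println(brokenQuery());
    }
}
```

The concatenated string contains "table_cWHERE", which nothing will complain about until the query is actually executed.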

Ruby, like many other scripting languages, supports the "here document," which basically says, "Treat the identifier following the double less-than signs as an opening quote, and pull in everything after it until you see that identifier again on a line by itself." In Ruby, you might write the query above as follows:

query = <<QUERY
SELECT
table_a.*,
table_b.*
FROM
table_a,
table_b,
table_c
WHERE
table_a.some_column = table_b.some_column
AND table_b.some_column = table_c.some_column
AND table_c.id = :idValue
QUERY


Which is more readable. I admit this is not a monumental problem, but it did offer an opportunity.

Enter JRuby. JRuby is a Ruby interpreter written in Java. Curiously, it now outperforms the C-based interpreter in lots of benchmarks, which is a testament not only to the maturity of the JVM but also to the dedicated open-source team that has devoted itself to improving JRuby. JRuby's main benefit is that you can access the sweeping Java API from within your Ruby scripts, but you can also invoke Ruby scripts from your Java code.

I made a new class called QueryContainer that would serve as a facade for managing the Ruby invocations and giving Hibernate Query objects back to the service layer. No other layer in the code would need to know about invoking Ruby: QueryContainer would translate the scripts into objects useful elsewhere in the system. Inside each Ruby script, I made a class to act as a namespace (because I opted for a singleton of the Ruby interpreter instead of multiple copies), and then inside each class defined hash literals that looked something like this:

QUERY_1 = {
:type=>AppConstants::SQL,
:query => <<QUERY
SELECT
table_a.*,
table_b.*
FROM
table_a,
table_b,
table_c
WHERE
table_a.some_column = table_b.some_column
AND table_b.some_column = table_c.some_column
AND table_c.id = :idValue
QUERY
}


What's that AppConstants::SQL thing? AppConstants is a Java class in our system that has some globally useful constants. Because it's JRuby, I can use constants from my Java classes. We have two query languages in our system: normal SQL and HQL, Hibernate's abridged SQL. QueryContainer needs to know which query language it is because Hibernate defines a createQuery method for HQL and a createSQLQuery method for SQL.

But it gets more complicated. If you have a SQL query that returns everything you need to construct a Hibernate object, you need to tell Hibernate what kind of object it is. (You don't need to do this for HQL.) I added an entityClass key to the SQL hash literals, and had it reference a Java class object (.java_class when you're in JRuby, since .class has meaning in the Ruby world). In other words:

QUERY_1 = {
:type => AppConstants::SQL,
:entityClass => BusinessObject.java_class,

}


Here's the final flow. Some method in the service layer wants to run a query. It calls a method in the base class called getQueryForKey, passing in the query key it wants. That base class method calls a similar method on a QueryContainer instance variable held by the base class. QueryContainer was initialized with the Ruby script that will act as a resource, and it reaches into the script to find the hash literal whose name matches the key that's been moving through the chain, then reads its entries, e.g., QuerySet1::QUERY_1[:query]. If it's an HQL query (according to QuerySet1::QUERY_1[:type]), QueryContainer just constructs a regular Query object. If it's a SQL query, QueryContainer constructs a SQLQuery object and calls addEntity on it, passing in the Java class from the :entityClass key of the hash literal.
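The branching at the end of that flow can be sketched roughly as follows. This is not the real QueryContainer. To keep it runnable without JRuby or Hibernate on the classpath, a plain Map stands in for the hash literals read out of the Ruby script, and the method returns a description of the Hibernate calls instead of making them. The class name, the define and describeQueryForKey methods, and the "SQL" and "HQL" constant values are all assumptions made for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the dispatch only; it does not touch JRuby or Hibernate.
class QueryContainerSketch {

    // Assumed stand-ins for the AppConstants values mentioned in the post.
    static final String SQL = "SQL";
    static final String HQL = "HQL";

    // Stand-in for the hash literals defined in the Ruby script.
    private final Map<String, Map<String, Object>> definitions = new HashMap<>();

    // Registers one definition, mirroring the :type, :query and :entityClass keys.
    void define(String key, String type, String query, Class<?> entityClass) {
        Map<String, Object> definition = new HashMap<>();
        definition.put("type", type);
        definition.put("query", query);
        definition.put("entityClass", entityClass);
        definitions.put(key, definition);
    }

    // Describes which Hibernate calls would be made for the given key.
    String describeQueryForKey(String key) {
        Map<String, Object> definition = definitions.get(key);
        if (definition == null) {
            throw new IllegalArgumentException("No query defined for key " + key);
        }
        String query = (String) definition.get("query");
        if (HQL.equals(definition.get("type"))) {
            // HQL already knows its entities, so a plain Query is enough.
            return "createQuery(" + query + ")";
        }
        // Native SQL: optionally tell Hibernate which entity the rows map to.
        StringBuilder calls = new StringBuilder("createSQLQuery(" + query + ")");
        Class<?> entityClass = (Class<?>) definition.get("entityClass");
        if (entityClass != null) {
            calls.append(".addEntity(").append(entityClass.getSimpleName()).append(")");
        }
        return calls.toString();
    }

    public static void main(String[] args) {
        QueryContainerSketch container = new QueryContainerSketch();
        // String.class is only a placeholder for a mapped business object.
        container.define("QUERY_1", SQL, "SELECT table_a.* FROM table_a", String.class);
        container.define("QUERY_2", HQL, "from BusinessObject", null);
        System.out.println(container.describeQueryForKey("QUERY_1"));
        System.out.println(container.describeQueryForKey("QUERY_2"));
    }
}
```

The point of the sketch is that the SQL-versus-HQL decision lives in exactly one place, so callers in the service layer only ever pass a key.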

So how does it work? Well, on the one hand, it accomplishes what I wanted. My queries have been factored out into new files, and they've been reproduced in a format that's easier to read and less error-prone. The entire rest of the system is ignorant of their source. It makes the case for adding languages that have strengths (in this case, the relatively minor advantage of string literals that can span multiple lines) to a deployment.

But on the other hand, JRuby seems to have added a sizable chunk of memory to our app. Shortly after I put in this system, our dev server started running into OutOfMemory errors on a regular basis, a problem that I've contained somewhat by disabling some other systems. And this is with a singleton of the interpreter. I've found little information about this, and so I'm wondering if JRuby is the way to go. I haven't hooked up a profiler yet to determine the real source, but it's the only thing that's changed.
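Short of a real profiler, one crude way to test a suspicion like this is to compare used heap before and after a suspect step. The sketch below uses only java.lang.Runtime. The HeapProbe class and its method names are made up, and the Runnable in main is a placeholder allocation, not actual interpreter startup. The numbers are rough, since System.gc() is only a hint and other threads allocate too:

```java
// Hypothetical helper for a rough before/after heap comparison.
class HeapProbe {

    // Heap currently in use by the JVM, in bytes.
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    // Runs the given work and returns the approximate change in used heap.
    static long heapDeltaBytes(Runnable work) {
        System.gc(); // only a request; the JVM may ignore it
        long before = usedHeapBytes();
        work.run();
        long after = usedHeapBytes();
        return after - before;
    }

    public static void main(String[] args) {
        // Placeholder for the step under suspicion, e.g. creating the interpreter.
        final byte[][] holder = new byte[1][];
        long delta = heapDeltaBytes(() -> holder[0] = new byte[8 * 1024 * 1024]);
        System.out.println("Approximate heap growth in bytes: " + delta);
    }
}
```

A large, repeatable jump around the suspect step would at least justify the time to hook up a proper profiler.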

I've started looking at Groovy as an alternative. At least if I go that route, only QueryContainer needs to change.
