Saturday, March 26, 2011

Combining Gradle with Antlr3

I've been going through a relatively painless process of converting Tapestry from Maven to Gradle, and am thrilled with the results. My biggest stumbling point so far was Tapestry's use of Antlr3 for its property expression language.

Gradle's built-in support for Antlr only went as far as Antlr2, whereas the Maven plugin I had been using understood Antlr3. After a bit of research and hacking, this is what I came up with as a solution for Tapestry:

description="Central module for Tapestry, containing all core services and components"

antlrSource = "src/main/antlr"
antlrOutput = "$buildDir/generated-sources/antlr"

configurations {
  antlr3
} 

sourceSets.main.java.srcDir antlrOutput

dependencies {
  compile project(':tapestry-ioc')
  compile project(':tapestry-json')
  
  provided project(":tapestry-test")
  provided "javax.servlet:servlet-api:$servletAPIVersion"

  compile "commons-codec:commons-codec:1.3"

  // Transitive will bring in the unwanted string template library as well
  compile "org.antlr:antlr-runtime:3.3", { transitive = false }

  // Antlr3 tool path used with the antlr3 task
  antlr3 "org.antlr:antlr:3.3"
}

// This may spin out as a plugin once we've got the details down pat

task generateGrammarSource {
  description = "Generates Java sources from Antlr3 grammars."
  inputs.dir file(antlrSource)
  outputs.dir file(antlrOutput)
} << {
  mkdir(antlrOutput)
  
  // Might have a problem here if the current directory has a space in its name
  
  def grammars = fileTree(antlrSource).include("**/*.g")
    
  ant.java(classname: 'org.antlr.Tool', fork: true, classpath: "${configurations.antlr3.asPath}") {
     arg(line: "-o ${antlrOutput}/org/apache/tapestry5/internal/antlr")
     arg(line: grammars.files.join(" "))
  }
}

compileJava.dependsOn generateGrammarSource

The essence here is to create a configuration (a kind of class path) just for running the Antlr Tool class. The new task finds the grammar files and feeds them to the tool. We also thread the output of the tool as a search path for the main Java compilation task. Finally, we define the inputs and outputs for the task, so that Gradle can decide whether it is necessary to even run the task.

Part of the fun of Gradle is that it is still a Groovy script, so there's a familiar and uniform syntax to defining variables and doing other non-declarative things, such as building up the list of grammar files for the Tool.

As you might guess from some of the comments, this is something of a first pass. The Maven plugin was a bit better at assembling the list of input file names in such a way that the Antlr3 Tool class knew where to write the output Java source files; if Tapestry used a number of grammars in a number of different locations, the solution above would be insufficient. It also seems roundabout to use Ant to launch a Java application ... I didn't see an easier way (though I have no doubt it's hidden inside the Gradle documentation).

My experience getting this working was mostly positive; there's a very large amount of documentation for Gradle that helped, though it can be a bit daunting, as the information you need is often scattered across a mix of the Gradle DSL reference, the User Guide, the Javadoc and the GroovyDoc. Too often, it feels like a solution is only understandable once finished, working backwards from some internal details of Gradle (such as which exact classes it chooses to instantiate in a given situation) back through the various interfaces, Java classes, and Groovy MetaObject extensions to those classes.

In fact, key parts of what I did ultimately accomplish were discovered through web searches, not in the documentation. But, that also means that the system works.

Of course, this is the pot calling the kettle black ... one criticism of Tapestry can be paraphrased as "we can customize it to do anything, and in just a few lines of code, but it can take three days to figure out where those lines of code go."

At the end of the day, I'm much happier with Gradle; the build process is faster, the build scripts are tiny and much, much easier to maintain, and the feedback from the tool is excellent. There are still many more issues to work out ... mostly in terms of Apache and Maven infrastructure:

  • Ensuring the Maven artifacts are created properly, with the right dependencies in the generated pom.xml
  • Generating a Maven archetype using Gradle
  • Generating JavaDoc and Tapestry component documentation with Gradle, along with a minimal number of pages to link it together (akin to the Maven site plugin)
  • Generating source and binary artifacts and getting everything uploaded to the Apache Nexus properly

Regardless, I think all of these things will come together in good time. I'm not going back, and dearly hope to never use Maven again!

Wednesday, March 16, 2011

Better Namespacing in JavaScript

In my previous post, I discussed some upcoming changes in Tapestry's client-side JavaScript. Here we're going to dive a little deeper into an important part of the overall package: using namespaces to keep client-side JavaScript from conflicting.
I'm not claiming to originate these ideas; they have been in use, in some variations, for several years on pages throughout the web.

Much as with Tapestry's Java code, it is high time there was a distinction between public JavaScript functions and private, internal functions. I've come to embrace modular JavaScript namespacing.

One of the challenges of JavaScript is namespacing: unless you go to some measures, every var and function you define gets attached to the global window object. This can lead to name collisions ... hilarity ensues.
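A minimal sketch of such a collision, written so it runs outside a browser. The fakeWindow object and the two loader functions are hypothetical stand-ins for the global window object and for two independently written scripts; none of these names come from Tapestry.

```javascript
// Hypothetical stand-in for the browser's global window object.
var fakeWindow = {};

// Two unrelated "scripts" that each define a global named validate.
function loadFormScript(global) {
  global.validate = function () { return "form validation"; };
}

function loadCartScript(global) {
  global.validate = function () { return "cart validation"; };
}

loadFormScript(fakeWindow);
loadCartScript(fakeWindow);

// The second definition silently replaced the first.
var result = fakeWindow.validate();
console.log(result);
```

Neither script did anything wrong on its own; the collision only appears once both are loaded into the same page.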

How do you avoid naming collisions? In Java you use packages ... but JavaScript doesn't have those. Instead, we define JavaScript objects to contain the variables and functions. Here's an example from Tapestry's built-in library:

Tapestry = {

  FORM_VALIDATE_EVENT : "tapestry:formvalidate",

  onDOMLoaded : function(callback) {
    document.observe("dom:loaded", callback);
  },

  ajaxRequest : function(url, options) {
    ...
  }, 

  ...
};

Obviously, this is just an edited excerpt ... but even here you can see the clumsy prototype for an abstraction layer. The limitations of this technique are twofold:

  • Everything is public and visible. There's no private modifier, no way to hide things.
  • You can't rely on using this to reference other properties in the same object, at least not inside event handler methods (where this is often the window object, rather than what you'd expect).
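The second limitation can be sketched as follows. The Example object and the fireEvent() function are hypothetical; fireEvent() merely imitates an event system that invokes a handler with this set to something other than the namespace object.

```javascript
// Hypothetical namespace object; not part of Tapestry.
var Example = {
  greeting: "hello",
  getGreeting: function () { return this.greeting; }
};

// Stand-in for an event system, which typically invokes a handler with
// `this` set to something other than the namespace object.
function fireEvent(handler) {
  var fakeElement = { tagName: "FORM" };
  return handler.call(fakeElement);
}

var direct = Example.getGreeting();            // this is Example
var viaEvent = fireEvent(Example.getGreeting); // this is fakeElement

console.log(direct, viaEvent);
```

Called directly, the function finds greeting; called as a handler, it looks for greeting on the wrong object and comes back empty-handed.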

These problems can be addressed using a key feature of JavaScript: functions can have embedded variables and functions that are only visible inside that function. We can start to recode Tapestry as follows:

Tapestry = { 
    FORM_VALIDATE_EVENT : "tapestry:formvalidate"
};

function initializeTapestry() {
  var aPrivateVariable = 0;

  function aPrivateFunction() { }

  Tapestry.onDOMLoaded = function(callback) {
      document.observe("dom:loaded", callback);
  };

  Tapestry.ajaxRequest = function(url, options) {
    ...
  };
}

initializeTapestry();

Due to the rules of JavaScript closures, aPrivateVariable and aPrivateFunction() can be referenced from the other functions with no need for the this prefix; they are simply values that are in scope. And they are only in scope to functions defined inside the initializeTapestry() function.

Further, there's no longer the normal weirdness with the this keyword. In this style of coding, this is no longer relevant, or used. Event handling functions have access to variables and other functions via scoping rules, not through the this variable, so it no longer matters that this is often not what you'd expect ... and none of the nonsense about binding this back to the expected object that you see in Prototype and elsewhere. Again, this is a more purely functional style of JavaScript programming.
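The two paragraphs above can be tried out with a small, self-contained sketch. The Example namespace, clickCount and recordClick() are illustrative names only, not anything from Tapestry.

```javascript
// Hypothetical namespace; the names below are illustrative only.
var Example = {};

function initializeExample() {
  var clickCount = 0;               // private: visible only inside this function

  function recordClick() {          // private helper
    clickCount += 1;
    return clickCount;
  }

  // Public function: reaches the private state through the closure, not `this`.
  Example.onClick = function () {
    return recordClick();
  };
}

initializeExample();

// Invoke the handler with an unrelated `this`, as an event system might.
var first = Example.onClick.call({ unrelated: true });
var second = Example.onClick.call(null);

console.log(first, second, typeof Example.clickCount);
```

The handler keeps working no matter what this happens to be, and clickCount cannot be read or modified from outside initializeExample().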

Often you'll see the function definition and evaluation rolled together:

Tapestry = { 
    FORM_VALIDATE_EVENT : "tapestry:formvalidate"
};

(function() {
  var aPrivateVariable = 0;

  function aPrivateFunction() { }

  Tapestry.onDOMLoaded = function(callback) {
      document.observe("dom:loaded", callback);
  };

  Tapestry.ajaxRequest = function(url, options) {
    ...
  };
})();

That's more succinct, but not necessarily more readable. I've been prototyping a modest improvement in TapX that will likely be migrated over to Tapestry 5.3.

Tapx = {

  extend : function(destination, source) {
    if (Object.isFunction(source))
      source = source();

    Object.extend(destination, source);
  },
  
  extendInitializer : function(source) {
    this.extend(Tapestry.Initializer, source);
  }
};

This function, Tapx.extend(), is used to modify an existing namespace object. It is passed a function that returns an object; the function is invoked and the properties of the returned object are copied onto the destination namespace object (the implementation of extend() is currently based on utilities from Prototype, but that will change). Very commonly, it is Tapestry.Initializer that needs to be extended, to support initialization for a Tapestry component.


Tapx.extendInitializer(function() {

  function doAnimate(element) {
    ...
  }

  function animateRevealChildren(element) {
    $(element).addClassName("tx-tree-expanded");

    doAnimate(element);
  }

  function animateHideChildren(element) {
    $(element).removeClassName("tx-tree-expanded");

    doAnimate(element);
  }

  function initializer(spec) {
    ...
  }

  return {
    tapxTreeNode : initializer
  };
});

This time, the function defines internal functions doAnimate(), animateRevealChildren(), animateHideChildren() and initializer(). It bundles up initializer() at the end, exposing it to the rest of the world as Tapestry.Initializer.tapxTreeNode.
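For anyone who wants to try the pattern without Prototype on the page, here is a minimal sketch of the same contract in plain JavaScript. It is not the TapX implementation: Namespace, exampleWidget and describe() are hypothetical names, and copying only the source's own properties is a choice made for this sketch.

```javascript
// Hypothetical stand-in for the Tapestry namespace.
var Namespace = { Initializer: {} };

// Same contract as described for Tapx.extend(), written without Prototype.
function extend(destination, source) {
  // A function is invoked first; its return value supplies the properties.
  if (typeof source === "function") {
    source = source();
  }
  // Copy only the source's own enumerable properties.
  for (var key in source) {
    if (Object.prototype.hasOwnProperty.call(source, key)) {
      destination[key] = source[key];
    }
  }
}

extend(Namespace.Initializer, function () {
  function describe(spec) {          // private helper, never exported
    return "initialized " + spec.id;
  }
  return {
    exampleWidget: function (spec) { return describe(spec); }
  };
});

var message = Namespace.Initializer.exampleWidget({ id: "tree1" });
console.log(message);
```

Only exampleWidget ends up on the namespace; describe() stays reachable solely through the closure.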

This is the pattern going forward as Tapestry's tapestry.js library is rewritten ... but the basic technique is applicable to any JavaScript application where lots of separate JavaScript files need to be combined together.

Rethinking JavaScript in Tapestry 5.3

I've always had a love/hate relationship with JavaScript; one of the earliest motivations for Tapestry was to "encapsulate that ugly JavaScript stuff so I don't have to worry about it again." However, as I've come to appreciate JavaScript over time as a powerful functional language, and not as an incompletely implemented object-oriented language, my revulsion for the language has disappeared ... even reversed.

Back around 2006, I started adding the client-side JavaScript features to Tapestry 5; this started with client-side form field validation, and grew to include a number of more sophisticated components. The good news is these features and components are fully encapsulated: they can be used freely throughout a Tapestry application without even knowing JavaScript. Tapestry includes the libraries (and related CSS documents) as needed, and encapsulates the necessary initialization JavaScript. The APIs for this were revamped a bit in Tapestry 5.2, but the core concept is unchanged.

The bad news is that the client-side is directly linked to Prototype and Scriptaculous (which are bundled right inside the Tapestry JAR file). These were great choices back in 2006, when jQuery was new and undocumented (or so my quite fallible memory serves). It seemed safe to follow Rails. Now, of course, jQuery rules the world. I've been talking for a couple of years about introducing an abstraction layer to break down the Prototype/Scriptaculous dependency; meanwhile I've recently seen that Rails and Grails are themselves moving to jQuery.

However, that abstraction layer is still important; I have clients that like MooTools; I have clients that are using YUI and ExtJS.

Certainly, it would have been too ambitious to try to start with such an abstraction layer from day 1. At the time, I had no real idea what the relationship between JavaScript on the client, and the application running on the server, would look like. Also, my JavaScript skills in 2006 were a fraction of what they are now. With several years of coding complex JavaScript and Ajax components for Tapestry, for TapX, and for clients, I think I have a much better understanding of what the APIs and abstraction layers should look like.

So suddenly, I have a number of goals:

  • Allow Tapestry to work on top of any JavaScript framework
  • Support Prototype/Scriptaculous and jQuery as substrate frameworks "out of the box"
  • Make the built-in Tapestry library first class: documented and release-on-release compatible
  • Keep backwards compatibility to Tapestry 5.2

What I'm proposing is a gradual transition, over Tapestry 5.3 and 5.4, where new, documented, stable JavaScript APIs are introduced, and Tapestry and 3rd party libraries can code to the new APIs rather than to Prototype/Scriptaculous. The goal is that, eventually, it will be possible to switch the default substrate from Prototype/Scriptaculous over to jQuery.
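To make the idea of a swappable substrate concrete, here is a purely hypothetical sketch; nothing in it is an actual or planned Tapestry API. Facade, useSubstrate(), addClass() and the two adapters are invented for illustration, and the adapters operate on plain objects so the sketch runs without either library loaded.

```javascript
// Hypothetical facade; Tapestry's real API may look nothing like this.
var Facade = (function () {
  var substrate = null;             // private: the currently selected adapter

  return {
    useSubstrate: function (adapter) { substrate = adapter; },
    addClass: function (element, name) { substrate.addClass(element, name); }
  };
})();

// Stand-in adapters working on plain objects so the sketch runs anywhere.
// A real Prototype adapter would presumably call $(element).addClassName(name);
// a real jQuery adapter would presumably call jQuery(element).addClass(name).
var adapterA = {
  addClass: function (element, name) { element.classNames.push(name); }
};
var adapterB = {
  addClass: function (element, name) {
    element.classNames = element.classNames.concat([name]);
  }
};

var element = { classNames: [] };

Facade.useSubstrate(adapterA);
Facade.addClass(element, "tx-tree-expanded");

Facade.useSubstrate(adapterB);
Facade.addClass(element, "selected");

console.log(element.classNames.join(" "));
```

Code written against the facade never mentions either substrate by name, which is what would allow the default to change later.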

Wednesday, March 09, 2011

Hibernate w/ transient objects

Am I missing something with Hibernate, or is it pretty darn hard to mix the following:

  • Session-per-request processing (the approach provided by the tapestry-hibernate module)
  • Transient objects (a wizard where a complex object is "built" across multiple request/response cycles)
  • Persistent objects (the transient keeps references to some persistent objects)

Hibernate seems to make it a bit tricky for me here. I get a lot of odd exceptions, because the new object has references and collections that ultimately point to persistent objects that are detached (their session is long gone).

I'm having to write a lot of code to reattach dependencies, just before I render the page (which traverses the transient object, eventually hitting persistent objects) and before persisting the transient object.

I'm having to iterate over various collections and a few fields and lock the object, to convert it back to a persistent object from a transient one:

    public static void reattach(Session session, Object transientObject) {
        if (transientObject != null) {
            session.buildLockRequest(LockOptions.NONE).lock(transientObject);
        }
    }

In other cases, where the transient object has a reference to an object that may already be present in the Session, I must use code like:

    category = (Category) session.get(Category.class, category.getId());

If Tapestry supported it, I suppose some of this would go away if we used a long-running Session that persisted between requests. However, that has its own set of problems, such as coordinating the lifecycle of such a session (when is it started? When is it discarded? What about in a cluster?)

My current solution feels kludgey, and not like idiomatic Java, more like appeasing the API Gods. I'd really like to see this happen more automatically or transparently ... for instance, when persisting a transient instance, there should be a way for Hibernate to "gloss over" these detached objects and just do what I want. Perhaps it's there and I'm missing it?