Tapestry Central: ANTLR and code generation

Sunday, December 07, 2008

ANTLR and code generation

In between packing (I'm moving across town), I'm doing a bit of work for Tapestry 5.1: TAP5-79, Improve Tapestry's property expression language to include OGNL-like features. People really miss being able to do a few cool things in OGNL, such as creating lists and maps on the fly ... something that is not uncommon when building a page activation context.

The 5.0 code was based on regular expressions and hand parsing because it only supported a very limited number of options. For 5.1, the grammar will grow considerably, adding options for list and map creation, method invocation (with parameters), and perhaps property projection and list filtering. Hand-tooled parsers aren't going to keep up, so it was time to switch to a more complete solution.

I ended up choosing ANTLR because it seems well supported, has a book and good online documentation, and a set of supporting tools. ANTLR is used elsewhere as well, for example by Hibernate to parse HQL.

There is even decent support for ANTLR in Maven (Tapestry does still build with Maven, though that's something I hope to address soon). Because of this, I only check my grammar files into SVN, not the generated files; on the continuous integration server, the ANTLR plugin generates the lexer and parser code fresh for each build.

The only real downside is the runtime dependency ... about 113K, and potentially problematic if Tapestry is ever combined with some other tool that depends on a different, incompatible version. Hibernate (for better or worse) uses the ANTLR2 runtime library, which uses different package names, so at least that combination shouldn't clash.

My first step was to re-create Tapestry 5.0's behavior on top of ANTLR. Because of some complexity in the lexical part of the grammar (that darn ".." operator!) it took quite a bit of head bashing. I did eventually figure it out, and did what any self-respecting coder should do ... leave a simple, useful, documented example for the next poor slob.
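To see why ".." makes the lexical side awkward, compare `1..10` with `1.5`: after reading `1.`, the lexer can't tell whether it is inside a decimal literal or has just hit the first half of a range operator until it looks one more character ahead. The following is a minimal hand-written Java sketch of that lookahead decision, assuming a tiny token set; it is not Tapestry's ANTLR grammar, and the class name `RangeLexerSketch` and the token names `INTEGER`, `DECIMAL`, `RANGEOP`, `DEREF` and `IDENTIFIER` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: tokenizes integers, decimals, identifiers, "." and "..". */
public class RangeLexerSketch {

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < n && Character.isDigit(input.charAt(i))) i++;
                // The tricky case: a '.' after digits is a decimal point only if the
                // NEXT character is a digit. In "1..10" it is another '.', so the
                // integer ends here and ".." is left for the range-operator branch.
                boolean decimal = i + 1 < n
                        && input.charAt(i) == '.'
                        && Character.isDigit(input.charAt(i + 1));
                if (decimal) {
                    i++; // consume the decimal point
                    while (i < n && Character.isDigit(input.charAt(i))) i++;
                    tokens.add("DECIMAL(" + input.substring(start, i) + ")");
                } else {
                    tokens.add("INTEGER(" + input.substring(start, i) + ")");
                }
            } else if (c == '.') {
                // One character of lookahead separates ".." from a single '.'.
                if (i + 1 < n && input.charAt(i + 1) == '.') {
                    tokens.add("RANGEOP");
                    i += 2;
                } else {
                    tokens.add("DEREF");
                    i++;
                }
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < n && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add("IDENTIFIER(" + input.substring(start, i) + ")");
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1..10"));     // [INTEGER(1), RANGEOP, INTEGER(10)]
        System.out.println(tokenize("1.5"));       // [DECIMAL(1.5)]
        System.out.println(tokenize("user.name")); // [IDENTIFIER(user), DEREF, IDENTIFIER(name)]
    }
}
```

A grammar tool handles the same ambiguity declaratively, but the underlying problem is identical: the decision about the first '.' depends on the character after it.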

Now I'm back on the code generation side. Tapestry's property expressions are converted directly into bytecode; the intermediate language is the Java-like source that Javassist compiles, which is a significant subset of Java. So I parse each property expression into an AST (abstract syntax tree), generate what looks like Java code from that, and have it compiled in-process and turned directly into instantiable classes.
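As a rough illustration of the "AST, then Java-like source" step, here is a minimal Java sketch that turns the AST for a property chain such as `user.address.city` into the source text of a getter method, the kind of text one could hand to a compiler such as Javassist (for example via `CtNewMethod.make`). The `PropertyNode` type, the method name `get`, the `root` parameter, and the null-check strategy are all assumptions for illustration, not Tapestry's actual generator; the compile-and-load step is left out so the sketch runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: generate Java-like source from a property-chain AST. */
public class CodeGenSketch {

    /** Hypothetical AST node: one property in a dotted chain, with its resolved type. */
    static final class PropertyNode {
        final String name;
        final String type; // fully qualified type of the property's value (assumed resolved earlier)

        PropertyNode(String name, String type) {
            this.name = name;
            this.type = type;
        }

        /** JavaBeans getter name, e.g. "city" -> "getCity". */
        String getterName() {
            return "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        }
    }

    /**
     * Emits a getter whose body dereferences each property in turn, storing each
     * intermediate value in a typed local and returning null as soon as one is null.
     */
    static String generateGetter(String rootType, List<PropertyNode> chain) {
        if (chain.isEmpty()) {
            throw new IllegalArgumentException("Empty property chain");
        }
        StringBuilder src = new StringBuilder("public Object get(Object root) {\n");
        String previous = "((" + rootType + ") root)";
        for (int i = 0; i < chain.size(); i++) {
            PropertyNode node = chain.get(i);
            String call = previous + "." + node.getterName() + "()";
            if (i == chain.size() - 1) {
                src.append("  return ").append(call).append(";\n");
            } else {
                String local = "p" + i;
                src.append("  ").append(node.type).append(' ').append(local)
                   .append(" = ").append(call).append(";\n");
                src.append("  if (").append(local).append(" == null) return null;\n");
                previous = local;
            }
        }
        return src.append("}\n").toString();
    }

    public static void main(String[] args) {
        // AST for "user.address.city", evaluated against a (hypothetical) com.example.Page.
        List<PropertyNode> chain = Arrays.asList(
                new PropertyNode("user", "com.example.User"),
                new PropertyNode("address", "com.example.Address"),
                new PropertyNode("city", "java.lang.String"));
        System.out.print(generateGetter("com.example.Page", chain));
    }
}
```

For the sample chain this prints a method that declares `p0` and `p1`, null-checks each, and ends with `return p1.getCity();`. Note that nothing here proves the text compiles or behaves correctly, which is exactly the gap discussed below.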

How would you test something like that? At one time, I would try to unit test that the generated code was correct. Eventually I hit some bugs where my tests passed, but the generated code was incorrect.

With code generation, there is no such thing as a unit test; it's always an integration test. You can try to limit the scope, but there are too many moving parts for a unit test to be useful or credible.

Instead, I test my parsing and code generation logic by testing the generated objects' behavior. I feed in a large number of expressions, along with objects to evaluate them against, and check that the results of reading and setting property expressions are correct. If I get the right results, I know the generated code is good.
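The behavioral style of testing can be sketched as a table of cases, each an expression plus a value to write through it and read back, run against whatever object the expression turns into. In the minimal Java sketch below, a small reflection-based evaluator stands in for the generated class, since the point is the shape of the test rather than the generator; the `Conduit` interface, `conduitFor`, and the `Person`/`Address` beans are hypothetical names invented for this example.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of behavior-level tests for compiled property expressions. */
public class BehaviorTestSketch {

    /** Hypothetical stand-in for a generated class: reads and writes one expression. */
    interface Conduit {
        Object get(Object root);
        void set(Object root, Object value);
    }

    public static class Address {
        private String city;
        public String getCity() { return city; }
        public void setCity(String city) { this.city = city; }
    }

    public static class Person {
        private String name;
        private Address address = new Address();
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Address getAddress() { return address; }
        public void setAddress(Address address) { this.address = address; }
    }

    /** Reflection-based placeholder; real tests would obtain the generated class instead. */
    static Conduit conduitFor(String expression) {
        final String[] terms = expression.split("\\.");
        return new Conduit() {
            public Object get(Object root) {
                Object current = root;
                for (String term : terms) {
                    if (current == null) return null; // null-safe navigation
                    current = invoke(current, "get" + cap(term));
                }
                return current;
            }

            public void set(Object root, Object value) {
                Object current = root;
                for (int i = 0; i < terms.length - 1; i++) {
                    current = invoke(current, "get" + cap(terms[i]));
                }
                String setter = "set" + cap(terms[terms.length - 1]);
                for (Method m : current.getClass().getMethods()) {
                    if (m.getName().equals(setter) && m.getParameterTypes().length == 1) {
                        try {
                            m.invoke(current, value);
                            return;
                        } catch (ReflectiveOperationException e) {
                            throw new IllegalStateException(e);
                        }
                    }
                }
                throw new IllegalArgumentException("No setter " + setter);
            }
        };
    }

    static Object invoke(Object target, String methodName) {
        try {
            return target.getClass().getMethod(methodName).invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot call " + methodName, e);
        }
    }

    static String cap(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    /** Writes each value through its expression, reads it back, and compares. */
    static int runRoundTrips() {
        Map<String, Object> cases = new LinkedHashMap<>();
        cases.put("name", "Ada");
        cases.put("address.city", "Springfield");
        int passed = 0;
        for (Map.Entry<String, Object> c : cases.entrySet()) {
            Person person = new Person();
            Conduit conduit = conduitFor(c.getKey());
            conduit.set(person, c.getValue());
            Object actual = conduit.get(person);
            if (!c.getValue().equals(actual)) {
                throw new AssertionError(c.getKey() + ": expected " + c.getValue() + ", got " + actual);
            }
            passed++;
        }
        return passed;
    }

    public static void main(String[] args) {
        System.out.println(runRoundTrips() + " round trips passed");
    }
}
```

Because the assertions only look at values read and written, the same table of cases would keep working if the reflective placeholder were swapped for a class produced by a real code generator.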

5 comments:

Anonymous said...: take a look at javacc, https://javacc.dev.java.net/, if you want a runtime-dep free option (there's also a book and maven plugin :); 4:45 PM
Yves Zoundi said...: JavaCC is nice but it lacks books and documentation compared to ANTLR.
ANTLR can target many programming languages, while JavaCC is Java-centric. However, JavaCC doesn't need any runtime dependencies. I also find it easier to work with the lexer state in JavaCC, for something similar to incremental lexing.

Testing can be problematic but you might be able to add basic unit tests to integration tests. You could test the contents of simple expected generated code strings. You could load the generated code using the Java Compiler API or maybe some Beanshell.

Maven has its annoyances and problems, but when I look at what it can do compared to many other tools, I am truly convinced that Maven doesn't suck too much :-). Startup time, runtime, IDE support and documentation improved a lot since Maven 1.x. Ant, Sant, Ivy, Forrest, etc. also have their issues and/or limitations.

Yves Zoundi
VFSJFileChooser : http://vfsjfilechooser.sourceforge.net
XPontus XML Editor : http://xpontus.sf.net
Blog : http://yveszoundi.blogspot.com; 4:50 AM
Howard Lewis Ship said...: OGNL isn't broken, and it isn't unmaintained (Jesse Kuhnert has taken over its development). However, OGNL has performance issues: it was built for JDK 1.2, so it can't take advantage of JDK 1.5 concurrency features, and it has a few choke points. It is also (despite Jesse's changes) still quite reflection-based. Finally, Tapestry needs access to the annotations of the bound property (this drives default validation and many other things).; 8:46 AM
Renat Zubairov said...: It would be even more interesting to see how well your unit tests cover the _generated_ code :)
I mean, if the AST-to-text transformation generates some if statements, then each of them should be tested multiple times, covering every condition inside. What do you think about running Cobertura on the code you submit to Javassist, to see how much of it your unit tests cover?; 11:49 AM
Howard Lewis Ship said...: Renat,

Code coverage is actually pretty good for the generated code; this is the current nightly coverage report.

I've been focusing on ensuring that all the code in PropertyConduitSourceImpl is tested.

A lot of the additional code concerns error recovery, something I haven't started on yet. I'm actually fine with making the parser very rigid, as long as it can report errors properly.; 12:26 PM
