Tapestry Central: ANTLR and code generation

Sunday, December 07, 2008

ANTLR and code generation

In between packing (I'm moving across town) I'm doing a bit of work for Tapestry 5.1, TAP5-79: Improve Tapestry's property expression language to include OGNL-like features. People really miss being able to do a few cool things in OGNL, such as create lists and maps on the fly ... this is not uncommon when creating a page activation context.

The 5.0 code was based on regular expressions and hand parsing because it only supported a very limited number of options. For 5.1, the grammar will grow considerably, adding options for list and map creation, method invocation (with parameters), and perhaps property projection and list filtering. Hand-tooled parsers aren't going to keep up, so it was time to switch to a more complete solution.

I ended up choosing ANTLR because it seems well supported, has a book and good online documentation, and a set of supporting tools. ANTLR is used elsewhere as well, for example by Hibernate to parse HQL.

There is even decent support for ANTLR with Maven (while Tapestry still builds with Maven, something I hope to address soon). Because of this, I only check my grammar files into SVN, not the generated files; on the continuous integration server, the ANTLR plugin generates the lexer and parser code fresh for each build.

The only real down-side is the runtime dependency ... about 113K and problematic if Tapestry is ever combined with some other tool that has a dependency on a different and incompatible version. Hibernate (for better or worse) uses the ANTLR2 runtime library, which uses different package names.

My first step was to re-create Tapestry 5.0's behavior on top of ANTLR. Because of some complexity in the lexical part of the grammar (that darn ".." operator!) it took quite a bit of head bashing. I did eventually figure it out, and did what any self-respecting coder should do ... leave a simple, useful, documented example for the next poor slob.
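To make the lexer problem concrete, here is a minimal hand-written sketch in Java. It is not the ANTLR grammar. It assumes the difficulty is the usual one: after reading digits, a `.` could begin a decimal fraction or the `..` range operator, so the lexer has to look one character further before committing. The class name `RangeAwareLexer` and the token spellings are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the lookahead needed to tell "1.5" from "1..10". */
final class RangeAwareLexer {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                // A '.' belongs to the number only when a digit follows it;
                // in "1..10" the dot must be left for the range operator.
                if (i + 1 < input.length() && input.charAt(i) == '.'
                        && Character.isDigit(input.charAt(i + 1))) {
                    i++;
                    while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                }
                tokens.add(input.substring(start, i));
            } else if (c == '.') {
                // Two dots form the range operator, one dot is property access.
                if (i + 1 < input.length() && input.charAt(i + 1) == '.') {
                    tokens.add("..");
                    i += 2;
                } else {
                    tokens.add(".");
                    i++;
                }
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < input.length() && Character.isJavaIdentifierPart(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1..10"));     // prints [1, .., 10]
        System.out.println(tokenize("1.5"));       // prints [1.5]
        System.out.println(tokenize("user.name")); // prints [user, ., name]
    }
}
```

In a generated lexer the same decision has to be expressed in the grammar itself, which is where the head bashing comes in.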

Now I'm back on the code generation side. Tapestry's property expressions are converted directly into bytecode; the intermediate language is Javassist's source dialect, which is a significant subset of Java. So I parse the property expressions into an AST (abstract syntax tree), then generate what looks like Java code from that, which gets compiled in-process and turned directly into instantiable classes.
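As a rough illustration of the AST-to-source step, the sketch below turns a dotted property expression into the text of a getter method body. It is a simplified stand-in, not Tapestry's actual implementation: the class `PropertyChainCodegen`, the `PropertyNode` type, the `split`-based "parser" and the example type `com.example.User` are all hypothetical. The `$1` in the output is Javassist's notation for the first method parameter; handing the string to Javassist for compilation is left out so the example stays dependency-free.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: expression -> AST nodes -> Java-like source text. */
final class PropertyChainCodegen {

    /** One AST node per term of an expression such as "address.city". */
    static final class PropertyNode {
        final String name;

        PropertyNode(String name) {
            this.name = name;
        }
    }

    /** Stand-in for a real parser: splits on dots and rejects empty terms. */
    static List<PropertyNode> parse(String expression) {
        List<PropertyNode> nodes = new ArrayList<>();
        for (String term : expression.trim().split("\\.")) {
            if (term.isEmpty()) {
                throw new IllegalArgumentException("Empty term in: " + expression);
            }
            nodes.add(new PropertyNode(term));
        }
        return nodes;
    }

    /** "city" -> "getCity", following the JavaBeans naming convention. */
    static String getterName(String property) {
        return "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    /** Walks the nodes and emits a chained getter call as source text. */
    static String generateGetterBody(String rootType, List<PropertyNode> nodes) {
        StringBuilder body = new StringBuilder("return ((" + rootType + ") $1)");
        for (PropertyNode node : nodes) {
            body.append('.').append(getterName(node.name)).append("()");
        }
        return body.append(';').toString();
    }

    public static void main(String[] args) {
        // prints: return ((com.example.User) $1).getAddress().getCity();
        System.out.println(generateGetterBody("com.example.User", parse("address.city")));
    }
}
```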

How would you test something like that? At one time, I would try to unit test that the generated code was correct. Eventually I hit some bugs where my tests passed, but the generated code was incorrect.

With code generation, there is no such thing as a unit test; it's always an integration test. You can try to limit the scope, but there are too many moving parts for a unit test to be useful or credible.

Instead, I test my parsing and code generation logic by testing the generated objects' behavior. So I feed in a large number of expressions, along with objects to evaluate them against, and check that the results I get by reading and setting property expressions are correct. If I get the right results, I know the generated code is good.
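The shape of such a behavior-level check can be sketched as follows. This is an assumption-laden miniature, not Tapestry's test suite: `ConduitBehaviorCheck`, `User` and `Address` are hypothetical, and the reflection-based `read` and `write` methods merely stand in for the generated classes so that the example runs without any bytecode library. The point is that the checks never inspect generated source; they only write a value through an expression and read it back.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of testing expressions through behavior only. */
final class ConduitBehaviorCheck {

    public static final class Address {
        private String city;
        public String getCity() { return city; }
        public void setCity(String city) { this.city = city; }
    }

    public static final class User {
        private final Address address = new Address();
        private String name;
        public Address getAddress() { return address; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    private static String accessor(String prefix, String property) {
        return prefix + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    /** Stand-in for a generated reader: follows getters along the expression. */
    static Object read(Object root, String expression) {
        try {
            Object current = root;
            for (String term : expression.split("\\.")) {
                Method getter = current.getClass().getMethod(accessor("get", term));
                current = getter.invoke(current);
            }
            return current;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read " + expression, e);
        }
    }

    /** Stand-in for a generated writer: navigates to the owner, then calls the setter. */
    static void write(Object root, String expression, Object value) {
        int lastDot = expression.lastIndexOf('.');
        Object target = lastDot < 0 ? root : read(root, expression.substring(0, lastDot));
        String property = expression.substring(lastDot + 1);
        try {
            Method setter = target.getClass().getMethod(accessor("set", property), value.getClass());
            setter.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot write " + expression, e);
        }
    }

    public static void main(String[] args) {
        User user = new User();
        Map<String, Object> cases = new LinkedHashMap<>();
        cases.put("name", "Alice");
        cases.put("address.city", "Springfield");
        for (Map.Entry<String, Object> c : cases.entrySet()) {
            write(user, c.getKey(), c.getValue());
            Object actual = read(user, c.getKey());
            if (!c.getValue().equals(actual)) {
                throw new AssertionError(c.getKey() + " returned " + actual);
            }
        }
        System.out.println("All expressions behaved as expected");
    }
}
```

In a real suite each table entry would be a test case in whatever framework the project uses, and the table would grow with every new grammar feature.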

5 comments:

Anonymous 4:45 PM
take a look at javacc, https://javacc.dev.java.net/, if you want a runtime-dep free option (there's also a book and maven plugin :)
Yves Zoundi 4:50 AM
JavaCC is nice but it lacks books and documentation compared to ANTLR.
ANTLR can target many programming languages while JavaCC is Java-centric. However, JavaCC doesn't need any runtime dependencies. I also find it easier to work with lexer state in JavaCC, for something similar to incremental lexing.

Testing can be problematic but you might be able to add basic unit tests to integration tests. You could test the contents of simple expected generated code strings. You could load the generated code using the Java Compiler API or maybe some Beanshell.

Maven has its annoyances and problems, but when I look at what it can do compared to many other tools, I am truly convinced that Maven doesn't suck too much :-). Startup time, runtime, IDE support and documentation improved a lot since Maven 1.x. Ant, Sant, Ivy, Forrest, etc. also have their issues and/or limitations.

Yves Zoundi
VFSJFileChooser : http://vfsjfilechooser.sourceforge.net
XPontus XML Editor : http://xpontus.sf.net
Blog : http://yveszoundi.blogspot.com
Howard Lewis Ship 8:46 AM
OGNL isn't broken and it isn't non-maintained (Jesse Kuhnert has taken over development of it). However, OGNL has performance issues: it was built for JDK 1.2, so it can't take advantage of JDK 1.5 concurrency features and it has a few choke points. And it is (despite Jesse's changes) still quite reflection based. Finally, Tapestry needs access to annotations of the bound property (this drives default validation and many other things).
Renat Zubairov 11:49 AM
It would be even more interesting to see how well your unit tests cover the _generated_ code :)
I mean, if the AST-to-text transformation generates some if statements, then it should be tested multiple times to cover all the conditions inside. What do you think about running Cobertura on the code you submit to Javassist to see how much of it your unit tests cover?
Howard Lewis Ship 12:26 PM
Renat,

Code coverage is actually pretty good for the generated code; this is the current nightly coverage report.

I've been focusing on ensuring that all the code in PropertyConduitSourceImpl is tested.

A lot of the additional code concerns error recovery, something I haven't started in on yet. I'm actually pretty good with making the parser very rigid, as long as it can report the error properly.
