Friday, November 11, 2005

Improving Tapestry performance

I just spent the week with a high profile client that is interested in potentially using Tapestry for a very large scale site ... millions of hits per hour. Their in-house framework is quite capable of operating at this scale, through a combination of draconian restrictions on database access and server-side state, and a total avoidance of any kind of reflective object access. These are people who literally cannot give an inch on performance.

One of the ideas that bounced around was something promised for some future release of OGNL: bytecode enhancement. That is, in some cases, OGNL 3 is expected to identify places where it can create a class on the fly to expedite access to a property, rather than always relying on reflective access as it does today.

Alas, that hasn't happened yet, and Tapestry is still using OGNL 2.6.7.

But, I thought, what if we created a new binding prefix to use instead of OGNL for this purpose? Because of HiveMind, this approach can be packaged separately from the framework proper and plug right in.

... and it works. I built a little performance test harness to figure out how many nanoseconds it takes to perform an operation; an operation involves a read and then an update. Here's one of the operations from the harness:

        Op op = new Op()
        {
            public void run(PropertyAccessor accessor)
            {
                Long value = (Long) accessor.readProperty();

                long primitive = value.longValue();

                accessor.writeProperty(new Long(primitive + 1));
            }
        };

The PropertyAccessor object is either created from bytecode, or implemented using OGNL (so that we can make the comparisons).
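To make the comparison concrete, here is a minimal sketch of what the two kinds of accessor might look like. Only the names PropertyAccessor, readProperty() and writeProperty() come from the harness above; the Counter bean, the class names and the constructors are hypothetical. Plain java.lang.reflect stands in for OGNL, and a hand-written class stands in for the one that would be generated as bytecode.

```java
import java.lang.reflect.Method;

public class AccessorSketch
{
    /** Assumed shape of the interface; only the two method names appear in the post. */
    interface PropertyAccessor
    {
        Object readProperty();

        void writeProperty(Object value);
    }

    /** Hypothetical bean with a single long property. */
    public static class Counter
    {
        private long count;

        public long getCount() { return count; }

        public void setCount(long count) { this.count = count; }
    }

    /** Reflective access (a stand-in for the OGNL-backed accessor). */
    static class ReflectiveAccessor implements PropertyAccessor
    {
        private final Object target;
        private final Method getter;
        private final Method setter;

        ReflectiveAccessor(Object target, String getterName, String setterName, Class<?> type)
        {
            try
            {
                this.target = target;
                this.getter = target.getClass().getMethod(getterName);
                this.setter = target.getClass().getMethod(setterName, type);
            }
            catch (NoSuchMethodException ex)
            {
                throw new IllegalArgumentException(ex);
            }
        }

        public Object readProperty()
        {
            try
            {
                return getter.invoke(target);
            }
            catch (ReflectiveOperationException ex)
            {
                throw new IllegalStateException(ex);
            }
        }

        public void writeProperty(Object value)
        {
            try
            {
                setter.invoke(target, value);
            }
            catch (ReflectiveOperationException ex)
            {
                throw new IllegalStateException(ex);
            }
        }
    }

    /** Direct access: roughly what a generated class would do, written by hand here. */
    static class DirectCountAccessor implements PropertyAccessor
    {
        private final Counter target;

        DirectCountAccessor(Counter target) { this.target = target; }

        public Object readProperty() { return Long.valueOf(target.getCount()); }

        public void writeProperty(Object value) { target.setCount(((Long) value).longValue()); }
    }

    /** The same read-then-update operation as in the harness. */
    static void increment(PropertyAccessor accessor)
    {
        Long value = (Long) accessor.readProperty();

        accessor.writeProperty(Long.valueOf(value.longValue() + 1));
    }

    public static void main(String[] args)
    {
        Counter counter = new Counter();

        increment(new ReflectiveAccessor(counter, "getCount", "setCount", long.class));
        increment(new DirectCountAccessor(counter));

        System.out.println(counter.getCount()); // prints 2
    }
}
```

Both accessors still box the long into a Long, because the interface deals in Object; the direct version only removes the Method.invoke() step.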

I did a number of test runs, with a number of operations:

              10000 iterations |  Direct ns |    OGNL ns
------------------------------ | ---------- | ----------
                 name - warmup |    4288.00 |  847364.00
                          name |    1777.00 |   18426.00
                  int - warmup |    2891.00 |   81127.00
                           int |     838.00 |    7497.00
                 long - warmup |    2969.00 |   28207.00
                          long |     617.00 |    7256.00

             100000 iterations |  Direct ns |    OGNL ns
------------------------------ | ---------- | ----------
                 name - warmup |    4282.00 |  819634.00
                          name |     972.00 |    7527.00
                  int - warmup |    2947.00 |   74425.00
                           int |     242.00 |    5955.00
                 long - warmup |    2910.00 |   27492.00
                          long |     209.00 |    6046.00

             500000 iterations |  Direct ns |    OGNL ns
------------------------------ | ---------- | ----------
                 name - warmup |    4182.00 |  852125.00
                          name |     857.00 |    6756.00
                  int - warmup |    2958.00 |   81820.00
                           int |     170.00 |    5724.00
                 long - warmup |    2793.00 |   34990.00
                          long |     215.00 |    5785.00

          2,000,000 iterations |  Direct ns |    OGNL ns
------------------------------ | ---------- | ----------
                 name - warmup |    4251.00 |  843185.00
                          name |     823.00 |    6553.00
                  int - warmup |    2927.00 |   48788.00
                           int |     144.00 |    5799.00
                 long - warmup |    2961.00 |   34945.00
                          long |     180.00 |    6173.00

The results show that direct access is around 10x faster than reflective access, which is in line with the general documentation about reflection in JDK 1.5. Still, I'm troubled that the cost per operation seems to keep going down as the number of operations increases. This could be the effect of HotSpot (though the elapsed time seems short for HotSpot to get very involved) ... or it could represent a problem in my performance test fixture.
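One common way to tell JIT warm-up apart from a fixture problem is to run the operation untimed for a while and only then time a fixed batch. The following is a minimal sketch of that idea, not the harness that produced the numbers above; the class name, the no-argument Op interface and the iteration counts are all assumptions made for illustration.

```java
public class TimingSketch
{
    /** Hypothetical, simplified operation (no accessor argument). */
    interface Op
    {
        void run();
    }

    /**
     * Runs the operation untimed for warmupIterations, then times
     * measuredIterations and returns the average nanoseconds per operation.
     */
    static double nanosPerOp(Op op, int warmupIterations, int measuredIterations)
    {
        for (int i = 0; i < warmupIterations; i++)
        {
            op.run();
        }

        long start = System.nanoTime();

        for (int i = 0; i < measuredIterations; i++)
        {
            op.run();
        }

        long elapsed = System.nanoTime() - start;

        return (double) elapsed / measuredIterations;
    }

    public static void main(String[] args)
    {
        final long[] counter = new long[1];

        Op op = new Op()
        {
            public void run()
            {
                counter[0]++;
            }
        };

        // Iteration counts are arbitrary; the result varies by machine and JVM.
        double nanos = nanosPerOp(op, 100000, 1000000);

        System.out.println("ns per op: " + nanos + " (counter = " + counter[0] + ")");
    }
}
```

If the per-operation figure stops falling once the warm-up count is large enough, the downward trend is more likely JIT compilation than a flaw in the fixture.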

To use this, you just use the prefix "prop:" instead of "ognl:". And, of course, it only works for simple properties, not property paths or the full kind of expressions used in OGNL.

I'm hosting the code on JavaForge and will make some kind of release available soon. Perhaps it will migrate into the framework proper at some point.

6 comments:

Anonymous said...

What about using janino or javassist to compile java expressions?
I've been using janino with great success to "compile" reflective code.

It's really kick ass when you have an AST and, instead of visiting nodes and using reflection, you generate some code that just does the trick.

Anonymous said...

Great thing, why not contribute it to tacos for the time being?

Unknown said...

Tacos: Perhaps eventually, but this is a chance for me to experiment with JavaForge and maybe Maven2. It's also not ready for primetime just yet, since I'm still investigating how to make it smarter about type conversions.

As I remember, Drew was working on a "master plan" for this kind of thing in OGNL and was going to be bytecode library agnostic (some kind of plugin system that would see which bytecode library, if any, was available). Instead, he got side-tracked into a new job and some other improvements to OGNL.

Anonymous said...

Maybe introduce a compiled page?
P.S. Does your client know about the rewind issues?

Anonymous said...

"The results show that the direct access is around 10x faster than reflective access, which is in line with the general documentation about reflection in JDK 1.5. "
The JVM has generated bytecode to optimize reflective calls since 1.4.2; you probably made a mistake in this test. Direct access is faster than reflection if you know the object type in advance and can avoid parameter wrapping, but you cannot avoid it in a dynamic expression language.

Anonymous said...

It does not look like the JDK does that on Windows. I measured: if a call to a member function takes 1 unit of time, an interface call takes about 2 units and an invocation through Method.invoke takes 10-15 units on JDK1.5/Windows.

Others have even worse results: http://www.jot.fm/issues/issue_2005_12/article3