Wednesday, February 18, 2009

Speeding up Tapestry 5.1

I'm very excited about some big changes coming in Tapestry 5.1: a significant increase in performance.

This may come as a surprise ... that performance changes are even needed! Tapestry's response time is nearly instantaneous for typical applications. I'm really talking about atypical applications ... applications that push the boundaries on performance and scalability.

This has been driven by a couple of clients who are building large applications on Tapestry: both large numbers of concurrent users, and very complex pages, containing hundreds of components that, due to loops, render thousands of times.

One particular client, Lithium, has been moving their network of community sites from a JSP solution to a Tapestry 5 solution. They've been a great help in tracking down performance problems and solutions.

Make it Work. Make it Right. Make it Fast.

Tapestry 5 is a complete rewrite of the aging Tapestry code base ... though it's hard for me to think of it as "new", since I've been working on it since 2006! Still, the first phase of Tapestry 5 was to make it work, and Tapestry has led a lot of innovation here, especially in terms of live class reloading and the new Inversion of Control container. By the time of the final release, Tapestry was "right" and it was fast enough ... but no effort had been made at any point to make it fast. Tapestry 5.0 was coasting along on good basic design and the raw speed of Java. With Lithium's input, I've been working to make the actual code live up to the performance goals.

Comparing with JavaServer Pages

Comparing Tapestry application performance can be an uphill battle, since Tapestry's memory usage and processing pattern are completely different from something simple, like a servlet or JavaServer Page (JSP). One hurdle that may never be surmounted is the intermediate representation: JSPs have none; the first bit of output streams directly to the client, which is brutally efficient.

Tapestry, by contrast, actually renders templates and components into a light-weight DOM (Document Object Model). Some post-processing of the DOM then occurs ... some new elements are added related to JavaScript and other features. Then the DOM is rendered to a character stream. Tapestry is always going to require more memory per request and take longer to begin rendering content.
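
As a rough sketch of that pipeline (this uses the MarkupWriter API; the concrete MarkupWriterImpl class is an internal one, so treat the exact names here as an assumption rather than a recipe):

    import java.io.PrintWriter;

    import org.apache.tapestry5.MarkupWriter;
    import org.apache.tapestry5.internal.services.MarkupWriterImpl;

    public class DomSketch
    {
        public static void main(String[] args)
        {
            // Markup is accumulated into a lightweight DOM, not streamed directly.
            MarkupWriter writer = new MarkupWriterImpl();

            writer.element("html");
            writer.element("body");
            writer.write("Hello from the intermediate DOM");
            writer.end(); // body
            writer.end(); // html

            // Post-processing (say, adding <script> elements) can still happen
            // here, because nothing has been written to the client yet.

            // Only at the end is the DOM serialized to a character stream.
            PrintWriter out = new PrintWriter(System.out);
            writer.toMarkup(out);
            out.flush();
        }
    }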

Further, Tapestry pages are full, stateful objects, rather than the singleton objects that JSPs compile down to. This too represents more overhead: to create them in the first place, to manage (and reset) their internal state, and to pool them between requests.

Interestingly, for smaller pages, Tapestry is as fast as or faster than JSPs. You can see why if you look at the Jasper JSP implementation: it spends a lot of time managing pools of JSP tag instances, constantly checking them out, configuring them for a single use, then cleaning them up and putting them back into the pool. Tapestry uses a much coarser caching policy (entire pages, not components within pages) and has more opportunities to optimize the flow of data within the page, between components.

Speeding up Page Rendering

My first pass at improving performance was to optimize Tapestry's rendering state machine. Tapestry breaks the rendering of each individual component up into a series of states:

[Diagram: Tapestry Rendering States]

A component may provide a render phase method for each state, but most components will supply methods for just one or two of these phases. Some phases only make sense for components with templates (not all components have one). Tapestry 5.0.18 grinds through each state for each component regardless ... a bit of thrash in the middle of rendering. By analyzing which render phase methods a component actually implements, Tapestry 5.1 is able to optimize away large chunks of the state machine. This results in considerably fewer render operations: less work to accomplish the exact same output. In my sample test application, the number of render operations declined by 25-30%. More importantly, combined with a few spot optimizations, the Tapestry application came to within spitting distance of the JSP application in terms of performance, even though it had vastly more complex behavior.
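
As a rough illustration (this isn't code from Tapestry itself, just a sketch using the standard render phase annotations), a component that participates in only two phases might look like this:

    import org.apache.tapestry5.MarkupWriter;
    import org.apache.tapestry5.annotations.AfterRender;
    import org.apache.tapestry5.annotations.BeginRender;

    public class Badge
    {
        // Only BeginRender and AfterRender are implemented; with the 5.1
        // optimization, the remaining phases (SetupRender, CleanupRender,
        // the template-related phases, ...) can be skipped for this component
        // instead of being visited and found empty on every render.

        @BeginRender
        void begin(MarkupWriter writer)
        {
            writer.element("span", "class", "badge");
        }

        @AfterRender
        void after(MarkupWriter writer)
        {
            writer.end();
        }
    }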

I actually experimented with a number of other options which I won't detail here. An important part was to measure performance, using JMeter and YourKit. I spent several days analyzing hot spots and trying out different theories.

Speeding up Page Assembly

This was a much more interesting change for real applications and real development. In Tapestry, pages are objects with internal state (not singletons the way servlets and Struts Actions are); this means we need to create multiple instances of pages when two or more requests reference the same page simultaneously.

Instantiating a page, the function of the PageLoader service, is rather involved; it must coordinate the contents of the page's template with the page's components and their templates; it must match up attributes in templates to parameters of components and hook those up as well. One thing I noticed is that there was a lot of duplicated effort when the same component is used multiple times, either on the same page, or on multiple pages.

My approach here was to split page instantiation into two phases: a one time analysis phase to figure out what needs to be done to actually construct a page, and a second repeatable phase to perform the construction. The goal here was to limit the amount of computation necessary when constructing a page (especially the second time).

The end result is a lot of code like this:

    private void expansion(AssemblerContext context)
    {
        // One-time analysis phase: consume the template token once, when the
        // page's structure is first analyzed.
        final ExpansionToken token = context.next(ExpansionToken.class);

        // Repeatable construction phase: this action is recorded and replayed
        // each time a new instance of the page is assembled.
        context.add(new PageAssemblyAction()
        {
            public void execute(PageAssembly pageAssembly)
            {
                ComponentResources resources = pageAssembly.activeElement.peek().getComponentResources();

                RenderCommand command = elementFactory.newExpansionElement(resources, token);

                pageAssembly.addRenderCommand(command);
            }
        });
    }

This is a big win for development in large projects with many large and complex shared components, cutting down on the refresh time after a change to a Java component class.

Speeding Up The Client

Improvements to the server side are one thing, but in many cases, a bigger win is optimizing the client side. Web applications are more than just individual HTML pages: all the JavaScript, stylesheets, images and other assets are just as important to a user's perception of performance.

Tapestry 5.1 addresses this in two ways: versioning and compression.

Versioning

In Tapestry 5.0, classpath assets (stored in JARs) are exposed through the Tapestry filter. These assets end up with a URL that includes a version number, and are provided to the client with a far-future expires header. This means that the client web browser will aggressively cache the file. This doesn't help with the first hit to a web site, but it makes a big difference when navigating through the site or making a return visit, since most of what the browser needs to get content up on the screen will already be present without an HTTP request.

Tapestry 5.1 extends this: context assets (files stored in the web application context) can now be exposed with an alternate URL that includes an application-defined version number, and they also get the far-future expires header.
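
A minimal sketch of what that might look like in an application module, assuming the version is contributed via the SymbolConstants.APPLICATION_VERSION symbol (check the 5.1 documentation for the exact name):

    import org.apache.tapestry5.SymbolConstants;
    import org.apache.tapestry5.ioc.MappedConfiguration;

    public class AppModule
    {
        public static void contributeApplicationDefaults(MappedConfiguration<String, String> configuration)
        {
            // Bumping this version changes the generated asset URLs, so browsers
            // re-download assets after a deployment while still honoring the
            // far-future expires header between releases.
            configuration.add(SymbolConstants.APPLICATION_VERSION, "1.0.1");
        }
    }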

Compression

Tapestry 5.1 also adds content compression: if the client supports it, then rendered pages and static assets that exceed a configurable size can be sent to the client as a GZIP compressed stream. This does help with the first visit to the application.

There's an advantage to letting Tapestry do the compression; Tapestry will cache the compressed bytestream, so that later requests (from other clients) for the same content will get the compressed version without paying the server-side cost to compress the content. The more traditional approach, using a servlet filter, doesn't have the ability to determine what content is dynamic and what is static, so it has to compress and re-compress the same content blindly.
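
If you need to tune the compression behavior, here is a hedged sketch of the kind of configuration involved; treat the symbol names (GZIP_COMPRESSION_ENABLED, MIN_GZIP_SIZE) as assumptions and verify them against the 5.1 release notes:

    import org.apache.tapestry5.SymbolConstants;
    import org.apache.tapestry5.ioc.MappedConfiguration;

    public class AppModule
    {
        public static void contributeApplicationDefaults(MappedConfiguration<String, String> configuration)
        {
            // Compression can be turned off (for example, when a front-end proxy
            // already handles it) ...
            configuration.add(SymbolConstants.GZIP_COMPRESSION_ENABLED, "false");

            // ... or the size threshold below which responses are left
            // uncompressed can be adjusted.
            configuration.add(SymbolConstants.MIN_GZIP_SIZE, "2048");
        }
    }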

Backwards Compatibility

The crowning achievement of all these changes is compatibility: the only things that changed were internal implementations and some internal interfaces. Existing Tapestry applications can upgrade and immediately benefit from all the changes with, at most, a recompile.

One of my goals for Tapestry is to ensure that your application can start small and grow with you. These performance improvements will not be visible in a small in-house application or a niche application ... but once your application is successful and grows in scope and popularity, the performance you need is already in there by default.

6 comments:

Unknown said...

Wow, great news!
Do you have an expectation as for when we'll have a 5.1 release available?

Unknown said...

Hi Howard, great news. Are there any performance improvements within Tapestry IoC? There is the PerformanceComparison test class within the Guice distribution (or via SVN) that tests the speed of Spring, Guice, and manual object creation. I added Tapestry IoC and found that it was 10 times faster than Spring but still 3-4 times slower than Guice. I know Tapestry IoC offers more functionality than Guice, but it's still interesting, especially as Guice only uses Java proxies.

g,
kris

Unknown said...

A link to the performance comparison would be nice. I'd be interested in that comparison in terms of runtime speed (once all services have been instantiated), but Guice may win there; I don't think it uses proxies as aggressively as T5 IoC (which is why I find its approach to service scope very limiting). A method invocation of a Tapestry service goes to the proxy, which passes through a quick synchronized block to obtain the core service implementation (or the outermost decorator or advice object). In any case, even without the synchronized block, you are paying for an extra method invocation. Further, Hotspot can optimize better when a given interface has exactly one implementation, but that's never the case in T5 IoC because of the runtime-generated proxies.

Still, if services talking to services is 1/1000th of the runtime overhead, then the fact that in Guice it is 1/10000th isn't going to make a measurable difference.

Unknown said...

I'm going to be starting a vote on 5.1 this week. This will just be to release an alpha snapshot (rather than relying on a nightly snapshot). I hope to progress through to a 5.1 final release in short order, however.

Massimo said...

One sentence caught my attention:

There's an advantage to letting Tapestry do the compression; Tapestry will cache the compressed bytestream, so that later requests (from other clients) for the same content will get the compressed version without paying the server-side cost to compress the content.

I didn't have a chance to look at the code yet, but that's a very interesting feature indeed. How can it deal with cookie-dependent content, though?

Unknown said...

If it's not clear, the caching of GZIPed content is for static assets, not rendered pages.