Tapestry Training -- From The Source

Let me help you get your team up to speed in Tapestry ... fast. Visit howardlewisship.com for details on training, mentoring and support!

Wednesday, July 14, 2010

Everyone out of the Pool! Tapestry goes singleton!

Tapestry applications are inherently stateful: during and between requests, information in Tapestry components, value stored in fields, stick around. This is a great thing: it lets you program a web application in a sensible way, using stateful objects full of mutable properties and methods to operate on those properties.

It also has its downside: Tapestry has to maintain a pool of page instances. And in Tapestry, page instances are big: a tree of hundreds or perhaps thousands of interrelated objects: the tree of Tapestry structural objects that forms the basic page structure, the component and mixin objects hanging off that tree, the binding objects that connect parameters of components to properties of their containing component, the template objects that represents elements and content from component templates, and many, many more that most Tapestry developers are kept unawares of.

This has proven to be a problem with biggest and busiest sites constructed using Tapestry. Keeping a pool of those objects, checking them in and out, and discarded them when no longer needed is draining needed resources, especially heap space.

So that seems like an irreconcilable problem eh? Removing mutable state from pages and components would turn Tapestry into something else entirely. On the other hand, allowing mutable state means that applications, especially big complex applications with many pages, become memory hogs.

I suppose one approach would be to simply create a page instance for the duration of a request, and discard it at the end. However, page construction in Tapestry is very complicated and although some effort was expended in Tapestry 5.1 to reduce the cost of page construction, it is still present. Additionally, Tapestry is full of small optimizations that improve performance ... assuming a page is reused over time. Throwing away pages is a non-starter.

So we're back to square one ... we can't eliminate mutable state, but (for large applications) we can't live with it either.

The best solution would be to require that all those mutable fields be, instead, ThreadLocal objects, and to change all the logic that accesses them to instead read and write values to the ThreadLocal. Oh, and clean up each and every one at the end of the request, so that information doesn't bleed through to the next request. That would be an incredible imposition on Tapestry developers.

Fortunately, Tapestry has lots of options for meta-programming Tapestry component classes.

Tapestry has already been down this route: the way persistent fields are handled gives the illusion that the page is kept around between requests. You might think that Tapestry serializes the page and stores the whole thing in the HttpSession. In reality, Tapestry is shuffling just the individual persistent field values in to and out of the session. To both the end user and the Tapestry developer, it feels like the entire page is live between requests, but it's really a bit of a shell game, providing an equivalent page instance that has the same values in its fields.

What's going on in trunk (Tapestry 5.2 alpha) right now is extrapolating that concept from just persistent fields to all mutable fields. Every access to every mutable field in a Tapestry page is converted, as part of the class transformation process, into an access against a per-thread Map of keys and values. Each field gets a unique identifying key. The Map is discarded at the end of the request.

The end result is that a single page instance can be used across multiple threads without any synchronization issues and without any field value conflicts.

This idea was suggested in years past, but the APIs to accomplish it (as well as the necessary meta-programming savvy) just wasn't available. However, as a side effect of rewriting and simplifying the class transformation APIs in 5.2, it became very reasonable to do this.

Let's take an important example: the handling of typical, mutable fields. This is the responsibility of the UnclaimedFieldWorker class, part of Tapestry component class transformation pipeline. UnclaimedFieldWorker finds fields that have not be "claimed" by some other part of the pipeline and converts them to read and write their values to the per-thread Map. A claimed field may store an injected service, asset or component, or be a component parameter.

public class UnclaimedFieldWorker implements ComponentClassTransformWorker
{
    private final PerthreadManager perThreadManager;

    private final ComponentClassCache classCache;

    static class UnclaimedFieldConduit implements FieldValueConduit
    {
        private final InternalComponentResources resources;

        private final PerThreadValue<Object> fieldValue;

        // Set prior to the containingPageDidLoad lifecycle event
        private Object fieldDefaultValue;

        private UnclaimedFieldConduit(InternalComponentResources resources, PerThreadValue<Object> fieldValue,
                Object fieldDefaultValue)
        {
            this.resources = resources;

            this.fieldValue = fieldValue;
            this.fieldDefaultValue = fieldDefaultValue;
        }

        public Object get()
        {
            return fieldValue.exists() ? fieldValue.get() : fieldDefaultValue;
        }

        public void set(Object newValue)
        {
            fieldValue.set(newValue);

            // This catches the case where the instance initializer method sets a value for the field.
            // That value is captured and used when no specific value has been stored.

            if (!resources.isLoaded())
                fieldDefaultValue = newValue;
        }

    }

    public UnclaimedFieldWorker(ComponentClassCache classCache, PerthreadManager perThreadManager)
    {
        this.classCache = classCache;
        this.perThreadManager = perThreadManager;
    }

    public void transform(ClassTransformation transformation, MutableComponentModel model)
    {
        for (TransformField field : transformation.matchUnclaimedFields())
        {
            transformField(field);
        }
    }

    private void transformField(TransformField field)
    {
        int modifiers = field.getModifiers();

        if (Modifier.isFinal(modifiers) || Modifier.isStatic(modifiers))
            return;

        ComponentValueProvider<FieldValueConduit> provider = createFieldValueConduitProvider(field);

        field.replaceAccess(provider);
    }

    private ComponentValueProvider<FieldValueConduit> createFieldValueConduitProvider(TransformField field)
    {
        final String fieldName = field.getName();
        final String fieldType = field.getType();

        return new ComponentValueProvider<FieldValueConduit>()
        {
            public FieldValueConduit get(ComponentResources resources)
            {
                Object fieldDefaultValue = classCache.defaultValueForType(fieldType);

                String key = String.format("UnclaimedFieldWorker:%s/%s", resources.getCompleteId(), fieldName);

                return new UnclaimedFieldConduit((InternalComponentResources) resources,
                        perThreadManager.createValue(key), fieldDefaultValue);
            }
        };
    }
}

That seems like a lot, but lets break it down bit by bit.

    public void transform(ClassTransformation transformation, MutableComponentModel model)
    {
        for (TransformField field : transformation.matchUnclaimedFields())
        {
            transformField(field);
        }
    }

    private void transformField(TransformField field)
    {
        int modifiers = field.getModifiers();

        if (Modifier.isFinal(modifiers) || Modifier.isStatic(modifiers))
            return;

        ComponentValueProvider<FieldValueConduit> provider = createFieldValueConduitProvider(field);

        field.replaceAccess(provider);
    }

The transform() method is the lone method for this class, as defined by ComponentClassTransformWorker. It uses a method on the ClassTransformation to locate all the unclaimed fields. TransformField is the representation of a field of a component class during the transformation process. As we'll see it is very easy to intercept access to the field.

Some of those fields are final or static and are just ignored. A ComponentValueProvider is a callback object: when the component (whatever it is) is first instantiated, the provider will be invoked and the return value stored into a new field. A FieldValueConduit is an object that takes over responsibility for access to a TransformField: internally, all read and write access to the field is passed through the conduit object.

So, what we're saying is: when the component is first created, use the callback to create a conduit, and change any read or write access to the field to pass through the created conduit. If a component is instantiated multiple times (either in different pages, or within the same page) each instance of the component will end up with a specific FieldValueConduit.

Fine so far; it comes down to what's inside the createFieldValueConduitProvider() method:

    private ComponentValueProvider<FieldValueConduit> createFieldValueConduitProvider(TransformField field)
    {
        final String fieldName = field.getName();
        final String fieldType = field.getType();

        return new ComponentValueProvider<FieldValueConduit>()
        {
            public FieldValueConduit get(ComponentResources resources)
            {
                Object fieldDefaultValue = classCache.defaultValueForType(fieldType);

                String key = String.format("UnclaimedFieldWorker:%s/%s", resources.getCompleteId(), fieldName);

                return new UnclaimedFieldConduit((InternalComponentResources) resources,
                        perThreadManager.createValue(key), fieldDefaultValue);
            }
        };
    }

Here we capture the name of the field and its type (expressed as String). Inside the get() method we determine the initial default value for the field: typically just null, but may be 0 (for a primitive numeric field) or false (for a primitive boolean field).

Next we build a unique key used to store and retrieve the field's value inside the per-thread Map. The key includes the complete id of the component and the name of the field: thus two different component instances, in the same page or across different pages, will have their own unique key.

We use the PerthreadManager service to create a PerThreadValue for the field. You can think of a PerThreadValue as a specialized kind of ThreadLocal that automatically cleans itself up at the end of the request.

Lastly, we create the conduit object. Let's look at the conduit in more detail:

    static class UnclaimedFieldConduit implements FieldValueConduit
    {
        private final InternalComponentResources resources;

        private final PerThreadValue<Object> fieldValue;

        // Set prior to the containingPageDidLoad lifecycle event
        private Object fieldDefaultValue;

        private UnclaimedFieldConduit(InternalComponentResources resources, PerThreadValue<Object> fieldValue,
                Object fieldDefaultValue)
        {
            this.resources = resources;

            this.fieldValue = fieldValue;
            this.fieldDefaultValue = fieldDefaultValue;
        }

We use the special InternalComponentResources interface because we'll need to know if the page is loading, or in normal operation (that's coming up). We capture our initial guess at a default value for the field (remember: null, false or 0) but that may change.

        public Object get()
        {
            return fieldValue.exists() ? fieldValue.get() : fieldDefaultValue;
        }

Whenever code inside the component reads the field, this method will be invoked. It checks to see if a value has been stored into the PerThreadValue object this request; if so the stored value is returned, otherwise the field default value is returned.

Notice the distinction here between null and no value at all. Just because the field is set to null doesn't mean we should switch over the the default value (assuming the default is not null).

The last hurdle is updates to the field:

      public void set(Object newValue)
        {
            fieldValue.set(newValue);

            // This catches the case where the instance initializer method sets a value for the field.
            // That value is captured and used when no specific value has been stored.

            if (!resources.isLoaded())
                fieldDefaultValue = newValue;
        }

The basic logic is just to stuff the value assigned to the component field into the PerThreadValue object. However, there's one special case: a field initialization (whether it's in the component's constructor, or at the point in the code where the field is first defined) turns into a call to set(). We can differentiate the two cases because that update occurs before the page is marked as fully loaded, rather than in normal use of the page.

And that's it! Now, to be honest, this is much more detail than a typical Tapestry developer ever needs to know. However, it's a good demonstration of how Tapestry's class transformation APIs make Java code fluid; capable of being changed dynamically (under carefully controlled circumstances).

Back to pooling: how is this going to affect performance? That's an open question, and putting together a performance testing environment is another task at the top of my list. My suspicion is that the new overhead will not make a visible difference for small applications (dozens of pages, reasonable number of concurrent users) ... but for high end sites (hundreds of pages, large numbers of concurrent users) the avoidance of pooling and page construction will make a big difference!

4 comments:

Kalle Korhonen said...

:) Had Tapestry suddenly gone the "create a page instance for the duration of a request, and discard it at the end" route, it could have given up on the "static structure, dynamic behavior" as well... I'm afraid it's too long and too technical of a post for people to really appreciate the careful balance between performance and resource consumption you get out of this. In the end, "performance almost always matters".

ccordenier said...

What a great news ! Once again, Tapestry 5 demonstrates how natural it is to handle this kind of consideration with its powerful class transformation mechanism. Small units of work for big changes!

Howard said...

@Kalle

I don't think so; you could maybe change the structure for the current request, but how to you restore the structure in a later request? That requires some kind of information beyond Tapestry's ability to weave into URLs.

Kalle Korhonen said...

@Howard

Right, restoring the previous (dynamic) view tree (besides just the state of it) is the rather complex problem that both JSF and Wicket are wrestling with, with more or less success. The problem demonstrably can be solved, but not without higher memory consumption. I'm happy that Tapestry is going the opposite direction in regards to memory consumption.