Thursday, August 19, 2010

Groovin' on the Testin'

I'm at the point now where I'm writing Groovy code for (virtually) all my unit and integration tests. Tapestry's testing code is pretty densely written ... courtesy of all those explicit types and all the boilerplate EasyMock code.

With Groovy, that code condenses down nicely, and the end result is more readable. For example, here's an integration test:

    @Test
    void basic_links() {
        clickThru "ActivationRequestParameter Annotation Demo"
        
        assertText "click-count", ""
        assertText "click-count-set", "false"
        assertText "message", ""
        
        clickAndWait "link=increment count"
        
        assertText "click-count", "1"
        assertText "click-count-set", "true"
        
        clickAndWait "link=set message"
        
        assertText "click-count", "1"
        assertText "click-count-set", "true"
        assertText "message", "Link clicked!"        
    }

That's pretty code; the various assert methods are simple enough that we can strip away the unnecessary parentheses.

Where Groovy really pays off, though, is in the use of closures. A lot of the unit and integration tests have a big setup phase where, often, several mock objects are being created and trained, followed by some method invocations on the subject, followed by some assertions.

With Groovy, I can easily encapsulate that as template methods, with a closure that gets executed to supply the meat of the test:

class JavaScriptSupportAutofocusTests extends InternalBaseTestCase
{
    private autofocus_template(expectedFieldId, cls) {
        def linker = mockDocumentLinker()
        def stackSource = newMock(JavaScriptStackSource.class)
        def stackPathConstructor = newMock(JavaScriptStackPathConstructor.class)
        def coreStack = newMock(JavaScriptStack.class)
        
        // Adding the autofocus will drag in the core stack
        
        expect(stackSource.getStack("core")).andReturn coreStack
        
        expect(stackPathConstructor.constructPathsForJavaScriptStack("core")).andReturn([])
        
        expect(coreStack.getStacks()).andReturn([])
        expect(coreStack.getStylesheets()).andReturn([])
        expect(coreStack.getInitialization()).andReturn(null)
        
        JSONObject expected = new JSONObject("{\"activate\":[\"$expectedFieldId\"]}")
        
        linker.setInitialization(InitializationPriority.NORMAL, expected)
        
        replay()
        
        def jss = new JavaScriptSupportImpl(linker, stackSource, stackPathConstructor)
        
        cls jss
        
        jss.commit()
        
        verify()
    }
    
    @Test
    void simple_autofocus() {
        
        autofocus_template "fred", { 
            it.autofocus FieldFocusPriority.OPTIONAL, "fred"
        }
    }
    
    @Test
    void first_focus_field_at_priority_wins() {
        autofocus_template "fred", {
            it.autofocus FieldFocusPriority.OPTIONAL, "fred"
            it.autofocus FieldFocusPriority.OPTIONAL, "barney"
        }
    }
    
    @Test
    void higher_priority_wins_focus() {
        autofocus_template "barney", {
            it.autofocus FieldFocusPriority.OPTIONAL, "fred"
            it.autofocus FieldFocusPriority.REQUIRED, "barney"
        }
    }
}

That's starting to get neat; with closures as a universal adapter interface, it's really easy to write readable test code, where you can see what's actually being tested.

I've been following some of the JDK 7 closure work and it may make me more interested in coding Java again. Having a syntax nearly as concise as Groovy (but still typesafe) is intriguing. Further, they have an eye towards efficiency as well ... in many cases, the closure is turned into a synthetic method of the containing class rather than an entire standalone class (the way inner classes are handled). This is good news for JDK 7 ... and I can't wait to see it tame the class explosion in languages like Clojure and Scala.

Tuesday, August 17, 2010

Tapestry Frequently Asked Questions

I'm taking some time to work on the Tapestry documentation ... starting with the FAQ. It's great fun, though this could get to be quite large. I'm just spewing out content right now; over time we'll clean it up, reorganize it, and add further hyperlinks and annotations.

In fact, as I'm working on the FAQ, I'm thinking this might be the best way to document open source projects in general. User's guides and reference documents are rarely read; everyone just Googles their question, so put those questions in their most findable format. Also, it's hard to write a consistent user guide start to finish ... but more reasonable to document one tidbit at a time.

Also, I'm reminded of The Little Schemer, a book that teaches the entire Scheme language (a Lisp variant) via a series of questions of ever broadening scope.

Feel free to suggest additional FAQ topics on the Tapestry Users mailing list.

Monday, August 09, 2010

Tapestry 5.2 leaves the gate

It's been a long time coming. Originally, I had thought we'd be producing Tapestry 5.2 six to eight months after Tapestry 5.1 ... instead, it's been more like 14 months just to get to the alpha release. Why? Well, in that time, I've personally changed jobs (back to an independent consultant), and I've been actively using the nightly snapshots of Tapestry 5.2 in two different projects for two different clients. I've had a lot of chances to see Tapestry in practice and, as always, identify the rough edges and smooth them out.

This new release enhances one of Tapestry's secret strengths: meta-programming. It is now ridiculously easy to extend the behavior of components, or of methods or fields within components, using annotations ... without getting mixed up in all that Javassist business. I'm using that now just about everywhere you might think about using a base class: everything from securing page access, to caching, to integration with Google Analytics.

The big change here is the switch from pooled pages to singletons: In Tapestry 5.1 and earlier, Tapestry kept a pool for page instances. On each request, a localized page instance was pulled from the pool, used exclusively by the one request thread, then returned to the pool. The pool had to be able to expand dynamically, and shrink to release memory.

Starting with Tapestry 5.2, the page pool is deprecated (and only enabled with extra configuration). Instead, a single page instance is created and shared between threads. That may raise your red alert flag ... doesn't that make Tapestry non-thread-safe?

Nope. Tapestry now reworks your simple POJO classes, changing access to all local mutable fields to instead store the value in a per-thread Map. It's an extrapolation of how Tapestry already managed persistent fields (storing the persistent field values in the Session between requests) ... but it now applies to all request-scoped state.

It's an interesting trade off: a lot less memory (just a single instance of each page and all its components) for a bit more work during each request. Part of the reason for this alpha release is to get this code into more hands and get more performance analysis on the result. I'm confident that these changes will not noticeably affect small applications and reasonable request loads but will make a big difference in handling larger applications with heavy request loads.

Meanwhile, the goal is to keep the APIs stable, address a bunch of bugs, and get another release out soon, then vote that up as a beta release. Preferably before JavaOne!

Friday, July 30, 2010

Choosing the Right Web Framework

Thank you Google Alerts, for pointing out this article on choosing a Java web framework. It's over a year old, but I think the things that make Tapestry special have only gotten stronger in the intervening time.

Wednesday, July 28, 2010

Git on Mac OS X: Don't ignore case!

By default, Mac OS X uses a case insensitive file system, and Git seems to honor that. The problem is, most programming languages, especially Java, are case sensitive. Class "JavaScriptSupport" needs to be in file "JavaScriptSupport.java" and not "JavascriptSupport.java". This is even worse when sharing code via a repository since some other developers may check out code on a case sensitive file system.

I was just renaming some classes, from things like "JavascriptStack" to "JavaScriptStack" (because the language is called "JavaScript" not "Javascript") ... and I was dismayed that Git saw that as an in-place update to a file, not a rename of the file.

Unfortunately, it's not as simple as git config core.ignorecase false to make Git do the right thing. That's an essential part of it, but Git still sees changes to the original naming of the file as a change, not a deletion.

I had to use the trick of one commit renaming JavascriptStack.java --> JSStack.java, then a second commit renaming JSStack.java --> JavaScriptStack.java.

Monday, July 26, 2010

Tapestry 5 Training in London: Oct 5 - 8

I'm once again partnering with SkillsMatter to teach my full Tapestry workshop. This is an expanded version of the class, which is growing from three days up to four; the additional day will ensure that we have time for all the existing materials, and add a new section on testing using TestNG, Selenium and Groovy. It will also give us more time to explore student-directed ideas, such as security and meta-programming.

The class will be taught at SkillsMatter's offices in London, from October 5th through the 8th.

Wednesday, July 14, 2010

Everyone out of the Pool! Tapestry goes singleton!

Tapestry applications are inherently stateful: during and between requests, information in Tapestry components (values stored in fields) sticks around. This is a great thing: it lets you program a web application in a sensible way, using stateful objects full of mutable properties and methods to operate on those properties.

It also has its downside: Tapestry has to maintain a pool of page instances. And in Tapestry, page instances are big: a tree of hundreds or perhaps thousands of interrelated objects: the tree of Tapestry structural objects that forms the basic page structure, the component and mixin objects hanging off that tree, the binding objects that connect parameters of components to properties of their containing component, the template objects that represent elements and content from component templates, and many, many more that most Tapestry developers are kept unaware of.

This has proven to be a problem with the biggest and busiest sites constructed using Tapestry. Keeping a pool of those objects, checking them in and out, and discarding them when no longer needed drains needed resources, especially heap space.

So that seems like an irreconcilable problem, eh? Removing mutable state from pages and components would turn Tapestry into something else entirely. On the other hand, allowing mutable state means that applications, especially big complex applications with many pages, become memory hogs.

I suppose one approach would be to simply create a page instance for the duration of a request, and discard it at the end. However, page construction in Tapestry is very complicated and although some effort was expended in Tapestry 5.1 to reduce the cost of page construction, it is still present. Additionally, Tapestry is full of small optimizations that improve performance ... assuming a page is reused over time. Throwing away pages is a non-starter.

So we're back to square one ... we can't eliminate mutable state, but (for large applications) we can't live with it either.

The best solution would be to require that all those mutable fields be, instead, ThreadLocal objects, and to change all the logic that accesses them to instead read and write values to the ThreadLocal. Oh, and clean up each and every one at the end of the request, so that information doesn't bleed through to the next request. That would be an incredible imposition on Tapestry developers.
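
To make that imposition concrete, here is a rough sketch of what a hand-written page might look like under such a rule. The class and method names (HandWrittenCounterPage, cleanupAfterRequest, and so on) are invented for illustration; this is not Tapestry code, just plain Java showing the bookkeeping each developer would have to repeat for every field.

```java
// Hypothetical page class (not Tapestry code): every mutable field is
// wrapped in a ThreadLocal by hand.
class HandWrittenCounterPage {
    private final ThreadLocal<Integer> clickCount = ThreadLocal.withInitial(() -> 0);
    private final ThreadLocal<String> message = new ThreadLocal<>();

    // Every read and write has to go through the ThreadLocal.
    void increment() { clickCount.set(clickCount.get() + 1); }
    int getClickCount() { return clickCount.get(); }
    void setMessage(String value) { message.set(value); }
    String getMessage() { return message.get(); }

    // Must be called at the end of every request, for every field,
    // or values bleed through to the next request on the same thread.
    void cleanupAfterRequest() {
        clickCount.remove();
        message.remove();
    }
}

class ManualThreadLocalDemo {
    // Simulates one request against the single shared page instance.
    static int handleRequest(HandWrittenCounterPage page, int clicks) {
        try {
            for (int i = 0; i < clicks; i++) {
                page.increment();
            }
            page.setMessage("Link clicked!");
            return page.getClickCount();
        } finally {
            page.cleanupAfterRequest();
        }
    }

    public static void main(String[] args) {
        HandWrittenCounterPage page = new HandWrittenCounterPage();
        System.out.println(handleRequest(page, 3)); // 3
        System.out.println(handleRequest(page, 2)); // 2, not 5: state was cleaned up
        System.out.println(page.getMessage());      // null
    }
}
```

Forget a single remove() call and state leaks between requests, which is exactly the kind of mistake you don't want to push onto application developers.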

Fortunately, Tapestry has lots of options for meta-programming Tapestry component classes.

Tapestry has already been down this route: the way persistent fields are handled gives the illusion that the page is kept around between requests. You might think that Tapestry serializes the page and stores the whole thing in the HttpSession. In reality, Tapestry is shuffling just the individual persistent field values in to and out of the session. To both the end user and the Tapestry developer, it feels like the entire page is live between requests, but it's really a bit of a shell game, providing an equivalent page instance that has the same values in its fields.

What's going on in trunk (Tapestry 5.2 alpha) right now is extrapolating that concept from just persistent fields to all mutable fields. Every access to every mutable field in a Tapestry page is converted, as part of the class transformation process, into an access against a per-thread Map of keys and values. Each field gets a unique identifying key. The Map is discarded at the end of the request.

The end result is that a single page instance can be used across multiple threads without any synchronization issues and without any field value conflicts.
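
Conceptually, the transformed component behaves something like the sketch below. This is only an approximation under my own assumptions: PerThreadFieldStore and SharedCounter are made-up names, and the real transformation happens at the bytecode level rather than in source code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-thread Map of keys and values.
class PerThreadFieldStore {
    private static final ThreadLocal<Map<String, Object>> VALUES =
            ThreadLocal.withInitial(HashMap::new);

    static Object read(String key, Object defaultValue) {
        Map<String, Object> map = VALUES.get();
        return map.containsKey(key) ? map.get(key) : defaultValue;
    }

    static void write(String key, Object value) {
        VALUES.get().put(key, value);
    }

    // The whole Map is discarded at the end of the request.
    static void endOfRequest() {
        VALUES.remove();
    }
}

// Roughly what a component with a mutable "count" field turns into:
// the field is gone, and each access goes through the store under a unique key.
class SharedCounter {
    private final String key;

    SharedCounter(String completeId) {
        this.key = completeId + "/count";
    }

    int getCount() { return (Integer) PerThreadFieldStore.read(key, 0); }
    void increment() { PerThreadFieldStore.write(key, getCount() + 1); }
}

class PerThreadMapDemo {
    // Runs a simulated request on its own thread against the shared instance.
    static int countOnNewThread(SharedCounter counter, int clicks) {
        int[] result = new int[1];
        Thread thread = new Thread(() -> {
            for (int i = 0; i < clicks; i++) {
                counter.increment();
            }
            result[0] = counter.getCount();
            PerThreadFieldStore.endOfRequest();
        });
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        SharedCounter counter = new SharedCounter("Index:counter");
        System.out.println(countOnNewThread(counter, 3)); // 3
        System.out.println(countOnNewThread(counter, 1)); // 1
        System.out.println(counter.getCount());           // 0 on the main thread
    }
}
```

One instance, several threads, and each thread only ever sees the values it wrote during its own request.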

This idea was suggested in years past, but the APIs to accomplish it (as well as the necessary meta-programming savvy) just weren't available. However, as a side effect of rewriting and simplifying the class transformation APIs in 5.2, it became very reasonable to do this.

Let's take an important example: the handling of typical, mutable fields. This is the responsibility of the UnclaimedFieldWorker class, part of Tapestry's component class transformation pipeline. UnclaimedFieldWorker finds fields that have not been "claimed" by some other part of the pipeline and converts them to read and write their values to the per-thread Map. A claimed field may store an injected service, asset or component, or be a component parameter.

public class UnclaimedFieldWorker implements ComponentClassTransformWorker
{
    private final PerthreadManager perThreadManager;

    private final ComponentClassCache classCache;

    static class UnclaimedFieldConduit implements FieldValueConduit
    {
        private final InternalComponentResources resources;

        private final PerThreadValue<Object> fieldValue;

        // Set prior to the containingPageDidLoad lifecycle event
        private Object fieldDefaultValue;

        private UnclaimedFieldConduit(InternalComponentResources resources, PerThreadValue<Object> fieldValue,
                Object fieldDefaultValue)
        {
            this.resources = resources;

            this.fieldValue = fieldValue;
            this.fieldDefaultValue = fieldDefaultValue;
        }

        public Object get()
        {
            return fieldValue.exists() ? fieldValue.get() : fieldDefaultValue;
        }

        public void set(Object newValue)
        {
            fieldValue.set(newValue);

            // This catches the case where the instance initializer method sets a value for the field.
            // That value is captured and used when no specific value has been stored.

            if (!resources.isLoaded())
                fieldDefaultValue = newValue;
        }

    }

    public UnclaimedFieldWorker(ComponentClassCache classCache, PerthreadManager perThreadManager)
    {
        this.classCache = classCache;
        this.perThreadManager = perThreadManager;
    }

    public void transform(ClassTransformation transformation, MutableComponentModel model)
    {
        for (TransformField field : transformation.matchUnclaimedFields())
        {
            transformField(field);
        }
    }

    private void transformField(TransformField field)
    {
        int modifiers = field.getModifiers();

        if (Modifier.isFinal(modifiers) || Modifier.isStatic(modifiers))
            return;

        ComponentValueProvider<FieldValueConduit> provider = createFieldValueConduitProvider(field);

        field.replaceAccess(provider);
    }

    private ComponentValueProvider<FieldValueConduit> createFieldValueConduitProvider(TransformField field)
    {
        final String fieldName = field.getName();
        final String fieldType = field.getType();

        return new ComponentValueProvider<FieldValueConduit>()
        {
            public FieldValueConduit get(ComponentResources resources)
            {
                Object fieldDefaultValue = classCache.defaultValueForType(fieldType);

                String key = String.format("UnclaimedFieldWorker:%s/%s", resources.getCompleteId(), fieldName);

                return new UnclaimedFieldConduit((InternalComponentResources) resources,
                        perThreadManager.createValue(key), fieldDefaultValue);
            }
        };
    }
}

That seems like a lot, but let's break it down bit by bit.

    public void transform(ClassTransformation transformation, MutableComponentModel model)
    {
        for (TransformField field : transformation.matchUnclaimedFields())
        {
            transformField(field);
        }
    }

    private void transformField(TransformField field)
    {
        int modifiers = field.getModifiers();

        if (Modifier.isFinal(modifiers) || Modifier.isStatic(modifiers))
            return;

        ComponentValueProvider<FieldValueConduit> provider = createFieldValueConduitProvider(field);

        field.replaceAccess(provider);
    }

The transform() method is the lone method for this class, as defined by ComponentClassTransformWorker. It uses a method on the ClassTransformation to locate all the unclaimed fields. TransformField is the representation of a field of a component class during the transformation process. As we'll see it is very easy to intercept access to the field.

Some of those fields are final or static and are just ignored. A ComponentValueProvider is a callback object: when the component (whatever it is) is first instantiated, the provider will be invoked and the return value stored into a new field. A FieldValueConduit is an object that takes over responsibility for access to a TransformField: internally, all read and write access to the field is passed through the conduit object.

So, what we're saying is: when the component is first created, use the callback to create a conduit, and change any read or write access to the field to pass through the created conduit. If a component is instantiated multiple times (either in different pages, or within the same page) each instance of the component will end up with a specific FieldValueConduit.

Fine so far; it comes down to what's inside the createFieldValueConduitProvider() method:

    private ComponentValueProvider<FieldValueConduit> createFieldValueConduitProvider(TransformField field)
    {
        final String fieldName = field.getName();
        final String fieldType = field.getType();

        return new ComponentValueProvider<FieldValueConduit>()
        {
            public FieldValueConduit get(ComponentResources resources)
            {
                Object fieldDefaultValue = classCache.defaultValueForType(fieldType);

                String key = String.format("UnclaimedFieldWorker:%s/%s", resources.getCompleteId(), fieldName);

                return new UnclaimedFieldConduit((InternalComponentResources) resources,
                        perThreadManager.createValue(key), fieldDefaultValue);
            }
        };
    }

Here we capture the name of the field and its type (expressed as String). Inside the get() method we determine the initial default value for the field: typically just null, but may be 0 (for a primitive numeric field) or false (for a primitive boolean field).

Next we build a unique key used to store and retrieve the field's value inside the per-thread Map. The key includes the complete id of the component and the name of the field: thus two different component instances, in the same page or across different pages, will have their own unique key.

We use the PerthreadManager service to create a PerThreadValue for the field. You can think of a PerThreadValue as a specialized kind of ThreadLocal that automatically cleans itself up at the end of the request.
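
If it helps to picture that, here is a minimal sketch of the idea. The names SketchPerThreadManager and SketchPerThreadValue are mine, and this is an illustration of the concept rather than Tapestry's actual implementation: the manager owns one per-thread Map, hands out typed handles into it, and resets everything for the current thread with a single cleanup call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the manager service: owns the per-thread storage.
class SketchPerThreadManager {
    private final ThreadLocal<Map<Object, Object>> store =
            ThreadLocal.withInitial(HashMap::new);

    <T> SketchPerThreadValue<T> createValue(Object key) {
        return new SketchPerThreadValue<>(store, key);
    }

    // Invoked once at the end of each request; every value created by this
    // manager is reset for the current thread in one step.
    void cleanup() {
        store.remove();
    }
}

// Hypothetical typed handle onto a single entry in the per-thread Map.
class SketchPerThreadValue<T> {
    private final ThreadLocal<Map<Object, Object>> store;
    private final Object key;

    SketchPerThreadValue(ThreadLocal<Map<Object, Object>> store, Object key) {
        this.store = store;
        this.key = key;
    }

    boolean exists() {
        return store.get().containsKey(key);
    }

    @SuppressWarnings("unchecked")
    T get() {
        return (T) store.get().get(key);
    }

    T set(T newValue) {
        store.get().put(key, newValue);
        return newValue;
    }
}

class PerThreadValueDemo {
    public static void main(String[] args) {
        SketchPerThreadManager manager = new SketchPerThreadManager();
        String key = String.format("UnclaimedFieldWorker:%s/%s", "Index:form", "message");
        SketchPerThreadValue<String> message = manager.createValue(key);

        message.set("hello");
        System.out.println(message.exists()); // true
        manager.cleanup();                    // end of request
        System.out.println(message.exists()); // false
    }
}
```

The point of the design is that the code holding the value never has to remember to clean up; the manager does it for every value at once.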

Lastly, we create the conduit object. Let's look at the conduit in more detail:

    static class UnclaimedFieldConduit implements FieldValueConduit
    {
        private final InternalComponentResources resources;

        private final PerThreadValue<Object> fieldValue;

        // Set prior to the containingPageDidLoad lifecycle event
        private Object fieldDefaultValue;

        private UnclaimedFieldConduit(InternalComponentResources resources, PerThreadValue<Object> fieldValue,
                Object fieldDefaultValue)
        {
            this.resources = resources;

            this.fieldValue = fieldValue;
            this.fieldDefaultValue = fieldDefaultValue;
        }

We use the special InternalComponentResources interface because we'll need to know if the page is loading, or in normal operation (that's coming up). We capture our initial guess at a default value for the field (remember: null, false or 0) but that may change.

        public Object get()
        {
            return fieldValue.exists() ? fieldValue.get() : fieldDefaultValue;
        }

Whenever code inside the component reads the field, this method will be invoked. It checks to see if a value has been stored into the PerThreadValue object this request; if so the stored value is returned, otherwise the field default value is returned.

Notice the distinction here between null and no value at all. Just because the field is set to null doesn't mean we should switch over to the default value (assuming the default is not null).

The last hurdle is updates to the field:

        public void set(Object newValue)
        {
            fieldValue.set(newValue);

            // This catches the case where the instance initializer method sets a value for the field.
            // That value is captured and used when no specific value has been stored.

            if (!resources.isLoaded())
                fieldDefaultValue = newValue;
        }

The basic logic is just to stuff the value assigned to the component field into the PerThreadValue object. However, there's one special case: a field initialization (whether it's in the component's constructor, or at the point in the code where the field is first defined) turns into a call to set(). We can differentiate the two cases because that update occurs before the page is marked as fully loaded, rather than in normal use of the page.
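
Here's a stripped-down simulation of that behavior, including the null versus no-value distinction. It is a sketch under simplifying assumptions: RequestSlot is a single-threaded stand-in for the per-thread value, and a plain boolean replaces the check against the component resources. None of these names come from Tapestry.

```java
// Single-threaded stand-in for a per-request value
// (assumption: threading is left out to keep the sketch short).
class RequestSlot {
    private boolean stored;
    private Object value;

    boolean exists() { return stored; }
    Object get() { return value; }
    void set(Object newValue) { stored = true; value = newValue; }
    void endOfRequest() { stored = false; value = null; }
}

// Hypothetical conduit mirroring the get()/set() logic described above.
class DefaultCapturingConduit {
    private final RequestSlot slot = new RequestSlot();
    private Object defaultValue; // initial guess: null, 0 or false
    private boolean loaded;      // stands in for the "is the page loaded?" check

    DefaultCapturingConduit(Object initialDefault) {
        this.defaultValue = initialDefault;
    }

    Object get() {
        // A stored null still counts as a stored value.
        return slot.exists() ? slot.get() : defaultValue;
    }

    void set(Object newValue) {
        slot.set(newValue);
        // Writes that happen while the page is still loading come from field
        // initializers, so they become the new default.
        if (!loaded) {
            defaultValue = newValue;
        }
    }

    void markLoaded() { loaded = true; }
    void endOfRequest() { slot.endOfRequest(); }
}

class ConduitDemo {
    public static void main(String[] args) {
        DefaultCapturingConduit conduit = new DefaultCapturingConduit(null);
        conduit.set("initial");            // field initializer, during page load
        conduit.markLoaded();
        conduit.endOfRequest();

        System.out.println(conduit.get()); // initial (captured default)
        conduit.set(null);                 // normal use: explicitly store null
        System.out.println(conduit.get()); // null, not the default
        conduit.endOfRequest();
        System.out.println(conduit.get()); // initial again on the next request
    }
}
```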

And that's it! Now, to be honest, this is much more detail than a typical Tapestry developer ever needs to know. However, it's a good demonstration of how Tapestry's class transformation APIs make Java code fluid; capable of being changed dynamically (under carefully controlled circumstances).

Back to pooling: how is this going to affect performance? That's an open question, and putting together a performance testing environment is another task at the top of my list. My suspicion is that the new overhead will not make a visible difference for small applications (dozens of pages, reasonable number of concurrent users) ... but for high end sites (hundreds of pages, large numbers of concurrent users) the avoidance of pooling and page construction will make a big difference!
