Friday, August 28, 2009

Keeping track of Clojure dependencies

I've been coding more of Cascade and I'm tending to do lots of small namespaces. I'm concerned about circular dependencies, so I've been using OmniGraffle to keep track of what uses what:

I end up keeping this open in the background and updating it manually as I add new namespaces or change dependencies. I'm pretty sure it's accurate.

It does raise the question ... am I using the correct level of granularity? I think I am, and the individual files are pretty short:

$ find src/main/clojure -name \*.clj | xargs wc -l
      51 src/main/clojure/cascade/config.clj
     115 src/main/clojure/cascade/dispatcher.clj
      92 src/main/clojure/cascade/dom.clj
     107 src/main/clojure/cascade/filter.clj
      51 src/main/clojure/cascade/internal/parse_functions.clj
      94 src/main/clojure/cascade/internal/parser.clj
     205 src/main/clojure/cascade/internal/utils.clj
     113 src/main/clojure/cascade/internal/viewbuilder.clj
      36 src/main/clojure/cascade/jetty.clj
      39 src/main/clojure/cascade/logging.clj
      29 src/main/clojure/cascade/map_utils.clj
      56 src/main/clojure/cascade/mock.clj
      73 src/main/clojure/cascade/path_map.clj
      41 src/main/clojure/cascade/pipeline.clj
      30 src/main/clojure/cascade/urls.clj
      81 src/main/clojure/cascade.clj
    1213 total

... and that includes the ASL header comment in each file. Short and sweet.

Maintaining this chart seems like something I should automate at some point, however: it should be possible to use the reader to parse a Clojure source file without evaluating it; the first form should be the (ns), from which the used and required namespaces can be extracted (with a bit of effort, because of Clojure's very concise syntax). One more project for the back burner ...

Thursday, August 27, 2009

Return to Independent Consulting

I'd like to announce to the Tapestry community that I've returned to independent consulting. As an independent consultant, I'll have more opportunities to pursue training, mentoring, and project work that did not fit with Formos' overall goals.

Formos continues to be committed to Tapestry, and to maintaining the Tapestry360 web site. I'd like to thank Matt Tunnel, President of Formos, for the opportunities he's provided: a "dream job" that let me focus on completing Tapestry 5.0 and 5.1, with a scope of features far beyond what I had originally envisioned when I started Tapestry 5 over four years ago.

This is a new chapter: I'm starting the search for my next dream job while actively seeking out new Tapestry training, mentoring, and support projects, as well as working with my existing clients. In addition, I'm using my newfound freedom to pursue other important technologies beyond Tapestry, such as Clojure, Cappuccino, and CouchDB. I expect to be able to offer the same kind of compelling training and project work in these technologies as I currently provide for Tapestry.

I'm also taking this time to pursue one of the opportunities I could not take on while at Formos: a Tapestry 5 book. I'm currently contacting a number of different publishers to find the best home for a new book specifically about Tapestry 5.

I'd also like to thank the Tapestry community for all the enthusiasm and dedication that you've given to Tapestry. I'm looking forward to helping you create even more insanely great applications!

Sunday, August 16, 2009

Article: Meta-Programming Java

In the last couple of years, if you mention the term meta-programming, people's ears perk up ... and they start looking around for Ruby. That's fair; Ruby makes a lot of meta-programming concepts very, very easy. However, that doesn't mean you can't do any meta-programming in Java; you're just a bit more limited and need a lot more infrastructure.

Tapestry 5, both the web framework and the underlying Inversion of Control container, is rife with meta-programming options. Let's talk about one of the most versatile: the thunk.

Thunks and Laziness

A thunk is a placeholder for a value to be computed as needed. The Haskell programming language makes great use of these; thunks are the essence of lazy programming: each thunk represents a set of parameters to a function1 and the function itself.

The upshot of this is that when you see a function call (or other expression) in Haskell code, what really happens is that a thunk is created for the invocation of that function, capturing the values to be passed in (some of which may themselves be thunks of other expressions). It's only when the value is needed, when the result of the expression is used in some other expression that is evaluated, that the thunk itself gets evaluated: the function is invoked, and the return value is cached in the thunk and returned. This makes the order in which things happen in Haskell very difficult to predict, especially from the outside. Because of thunks, algorithms that look tail recursive aren't (the recursive call is just another thunk, evaluated serially). Further, algorithms that appear to be infinite aren't: the thunks ensure that only the values that are actually needed are ever computed.

It's an elegant and powerful approach, and it's even fast, because the fastest code is the code that is never executed in the first place.

Other languages have this feature; Clojure reflects its Lisp heritage in that almost everything operates in terms of accessing, iterating, and transforming collections ... and all of those collection operations are lazy as well. Unlike in Haskell, this laziness is more a product of a carefully crafted standard library than a direct offshoot of the language, but the end result is quite similar.

But what happens when you want to accomplish some of these features (such as lazy evaluation) within the tight constraints of standard Java? That's when you need to get creative!
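
As a warm-up, the core idea can be sketched in a few lines of plain Java. This is a hypothetical Lazy class, not anything from Tapestry: the computation is captured up front, performed only when the value is first demanded, and the result cached for later calls.

public class Lazy<T>
{
    /** The deferred computation; a made-up single-method callback interface. */
    public interface Computation<V>
    {
        V compute();
    }

    private Computation<T> computation;
    private T value;
    private boolean computed;

    public Lazy(Computation<T> computation) { this.computation = computation; }

    public synchronized T get()
    {
        if (!computed)
        {
            value = computation.compute();
            computed = true;
            computation = null; // allow the captured computation to be garbage collected
        }

        return value;
    }
}

Nothing happens when a Lazy is constructed; the (possibly expensive) compute() call is deferred until the first get(), and every later get() returns the cached value. Tapestry applies this same idea, but at the level of entire service interfaces.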

Thunks in Tapestry 5

Tapestry 5 uses thunks in many different places; the most common one is the use of proxies for Tapestry 5 IoC services. In Tapestry 5 every service has an interface2. Let's take a peek at a typical service in Tapestry 5, to illustrate the typed-thunk concept.

Listing 1: ComponentMessagesSource.java

public interface ComponentMessagesSource
{
    Messages getMessages(ComponentModel componentModel, Locale locale);

    InvalidationEventHub getInvalidationEventHub();
}

The purpose of the ComponentMessagesSource service is to provide a Messages object representing a particular component's message catalog. This is part of Tapestry's localization support: every page and component has easy access to its own message bundle, which includes messages inherited from base components and from a global message catalog.

A central tenet of Tapestry 5 is that service instantiation is lazy: services are only constructed as needed. What does "as needed" mean? It means the first time any method of the service is invoked. This kind of lazy instantiation is accomplished by using thunks. So for a service such as ComponentMessagesSource, there will be a class somewhat like ComponentMessagesSourceThunk to handle the lazy instantiation:

Listing 2: ComponentMessagesSourceThunk.java

public class ComponentMessagesSourceThunk implements ComponentMessagesSource
{
    private final ObjectCreator creator;

    public ComponentMessagesSourceThunk(ObjectCreator creator) { this.creator = creator; }

    private ComponentMessagesSource delegate() { return (ComponentMessagesSource) creator.createObject(); }

    public Messages getMessages(ComponentModel componentModel, Locale locale)
    {
        return delegate().getMessages(componentModel, locale);
    }

    public InvalidationEventHub getInvalidationEventHub()
    {
        return delegate().getInvalidationEventHub();
    }
}

You won't find the above class in the Tapestry source code: it is generated on-the-fly by Tapestry. That's great, because I know I'd hate to have to supply a service interface, a service implementation, and a thunk class for each service; the interface and implementation are already plenty! One of the reasons that Tapestry all but requires that services have a service interface is to support the automatic creation of thunks or other proxies around the interface.

However, you can see the pattern: every method of the interface is, of course, implemented in the thunk; that's what it means to implement an interface. Each method obtains the delegate and then re-invokes the same method, with the same parameters, on the delegate. The trick is that the first time any of these methods is invoked, the delegate does not yet exist. The ObjectCreator will create the delegate object during that first invocation and keep returning the same instance on subsequent invocations. That's the essence of lazy instantiation.
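
Tapestry supplies an ObjectCreator with exactly this "create once, reuse forever" behavior; the CachingObjectCreator that shows up later (in Listing 6) plays this role. A minimal sketch of the idea (my own sketch, not the framework's actual source) might look like this:

public class CachingObjectCreator implements ObjectCreator
{
    private final ObjectCreator delegate;

    private Object cached;

    public CachingObjectCreator(ObjectCreator delegate) { this.delegate = delegate; }

    public synchronized Object createObject()
    {
        // Delegate exactly once; every later call returns the cached instance.
        if (cached == null)
            cached = delegate.createObject();

        return cached;
    }
}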

The point here is that for any interface, you can create a typed-thunk that can stand in for the real object, hiding the real object's lifecycle: it gets created on demand by the ObjectCreator. Code that uses the thunk has no way of telling the thunk from the real object ... the thunk implements all the methods of the interface and performs the right behaviors when those methods get invoked.

Creating Thunks Dynamically

Before we can talk about using thunks, we need to figure out how to create them dynamically, at runtime. Let's start by specifying the interface for a service that can provide thunks on demand, then figure out the implementation of that service.

Listing 3: ThunkCreator.java

public interface ThunkCreator
{
    /**
     * Creates a Thunk of the given proxy type.
     *
     * @param proxyType     type of object to create (must be an interface)
     * @param objectCreator provides an instance of the same type on demand (may be invoked multiple times)
     * @param description   to be returned from the thunk's toString() method
     * @param <T>           type of thunk
     * @return thunk of given type
     */
    <T> T createThunk(Class<T> proxyType, ObjectCreator objectCreator, String description);
}

Remember that this is just an automated way of producing instances of classes similar to ComponentMessagesSourceThunk. A simple implementation of this service is possible using JDK Proxies:

Listing 4: ThunkCreatorImpl.java

public class ThunkCreatorImpl implements ThunkCreator
{
    public <T> T createThunk(Class<T> proxyType, final ObjectCreator objectCreator, final String description)
    {
        InvocationHandler handler = new InvocationHandler()
        {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable
            {
                if (method.getName().equals("toString") && method.getParameterTypes().length == 0)
                    return description;

                return method.invoke(objectCreator.createObject(), args);
            }
        };

        Object proxy = Proxy.newProxyInstance(Thread.currentThread().getContextClassLoader(),
                                              new Class[] { proxyType },
                                              handler);

        return proxyType.cast(proxy);
    }
}

JDK Proxies were introduced way back in JDK 1.3 and caused a real flurry of activity because they are so incredibly useful. A call to Proxy.newProxyInstance() will create an object conforming to the provided interfaces (here specified as the proxyType parameter). Every method invocation is routed through a single InvocationHandler object. The InvocationHandler simply re-routes method invocations to the object returned from objectCreator.createObject().
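
To make the mechanics concrete, here is a hypothetical usage example of the JDK-proxy-based ThunkCreatorImpl above. The Greeter interface is made up for this example; in practice Tapestry creates thunks for you, and wraps the creator in a caching creator so the real object is only built once.

public class ThunkDemo
{
    public interface Greeter
    {
        String greet(String name);
    }

    public static void main(String[] args)
    {
        ThunkCreator thunkCreator = new ThunkCreatorImpl();

        ObjectCreator creator = new ObjectCreator()
        {
            public Object createObject()
            {
                System.out.println("Constructing the real Greeter");

                return new Greeter()
                {
                    public String greet(String name) { return "Hello, " + name; }
                };
            }
        };

        Greeter greeter = thunkCreator.createThunk(Greeter.class, creator, "<Greeter thunk>");

        System.out.println(greeter);                // prints "<Greeter thunk>"; nothing has been constructed yet
        System.out.println(greeter.greet("world")); // the first real call builds the Greeter, then delegates to it
    }
}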

Tapestry's implementation of ThunkCreator uses the Javassist bytecode manipulation library to generate a custom class at runtime. The generated class is much closer to the example ComponentMessagesSourceThunk; it doesn't use JDK proxies or reflection. This means that Java's Hotspot compiler can do a better job optimizing the code. In reality, you'll be hard pressed to spot a difference in performance unless you use these thunks inside a very tight loop.

Great so far; now let's think about how we could use this in another way. What if you have a service that returns an object that is expensive to construct and may not even get used? An example of this in Tapestry is the Messages object, obtained from the ComponentMessagesSource service. Building a Messages instance for a component involves a lot of hunting around the classpath looking for properties files, not just for the component but for its base class and for application-wide message bundles. That means a lot of I/O and a lot of blocking, waiting for the disk drive to catch up. In many cases, these Messages objects are injected into components but aren't used immediately. In terms of getting markup into the user's browser faster, avoiding all of those file lookups and file reads until absolutely necessary is an appreciable win.

Our goal is to intercept the call to ComponentMessagesSource.getMessages() and capture the parameters to the method. Instead of invoking the method, we want to return a thunk that encapsulates the method call. This is where we can really start to talk about meta-programming, not just programming: we aren't going to change the ComponentMessagesSource service implementation to accomplish this; we are going to meta-program the service. This is a key point: a Tapestry service is the sum of its interface, its implementation, and all the other parts provided by Tapestry. We can use Tapestry to augment the behavior of a service without changing the implementation of the service itself.

This approach is in stark contrast to, say, Ruby. When meta-programming Ruby you often end up writing and rewriting the methods defined by the class in place. In Java, you will instead layer on new objects implementing the same interface to provide the added behavior.

Accomplishing all this is surprisingly easy ... given the infrastructure that Tapestry 5 IoC already provides.

Lazy Advice

The goal with lazy advice is that invoking a method on a service short-circuits the method invocation: a thunk is returned as a replacement for the return value of the method. Invoking a method on the thunk first invokes the deferred service method, then re-invokes that same method on the value the service method returns.

Image 1: Lazy Advice Thunk

This is shown in Image 1. The service method call is represented by the blue line. The advice intercepts the call (remembering the method parameters) and returns a thunk. Later, the caller invokes a method on the thunk (the green line). The thunk will invoke the service method using the saved parameters (this is the lazy part), then re-invoke the method on the returned value.

To the caller, there is no evidence that the thunk even exists; the service method just returns faster than it should, and the first method invocation on the return value takes a little longer than it should.

Now we know what the solution is going to look like ... but how do we make it actually happen? How do we get "in there" to advise service methods?

Advising Service Methods

Tapestry's Inversion of Control Container is organized around modules: classes that define services. This is in contrast to Spring, which relies on verbose XML files. Tapestry uses a naming convention to figure out what methods of a module class do what. Methods whose name starts with "build" define services (and are ultimately used to instantiate them). Other method name prefixes have different meanings.
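
For instance, a module class with a single service builder method might look like this; the Indexer service and the code inside it are made up for illustration:

public class AppModule
{
    // Illustrative only: Indexer is a made-up service interface for this example.
    public interface Indexer
    {
        void index(String path);
    }

    // The "build" prefix tells Tapestry that this method defines the Indexer service
    // and knows how to instantiate it.
    public static Indexer buildIndexer()
    {
        return new Indexer()
        {
            public void index(String path) { /* do the real work here */ }
        };
    }
}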

Module method names prefixed with "advise" act as a hook for a limited amount of Aspect Oriented Programming. Tapestry provides an easy way to supply "around" advice on method invocations. A more intrusive system, such as AspectJ, can also intercept access to fields or even the construction of classes, and has more facilities for limiting the scope of advice so that it only applies to invocations in specific classes or packages; of course, it works by significantly rewriting the bytecode of your classes, whereas Tapestry's IoC container aims for a lighter touch.

Being able to advise service methods was originally intended to support logging of method entry and exit, or other cross-cutting concerns such as managing transactions or enforcing security access constraints. However, the same mechanism can go much further, controlling when method invocations occur, in much the same way that the lazy thunk described above operates.

Listing 5 shows the method advice for the ComponentMessagesSource service.

Listing 5: TapestryModule.java

    @Match("ComponentMessagesSource")
    public static void adviseLazy(LazyAdvisor advisor, MethodAdviceReceiver receiver)
    {
        advisor.addLazyMethodInvocationAdvice(receiver);
    }

This method is used to advise a specific service, identified by the service's unique id, here "ComponentMessagesSource". An advisor method may advise many different services; we could use glob names or regular expressions to match a wider range of services. An advisor method receives a MethodAdviceReceiver as a parameter; additional parameters are injected services. The intent is for module classes to contain a minimal amount of code, so it makes sense to move the real work into a service, especially because it is so easy to inject services directly into the advisor method.

The LazyAdvisor service, built into Tapestry, does most of the work:

Listing 6: LazyAdvisorImpl.java

public class LazyAdvisorImpl implements LazyAdvisor
{
    private final ThunkCreator thunkCreator;

    public LazyAdvisorImpl(ThunkCreator thunkCreator)
    {
        this.thunkCreator = thunkCreator;
    }

    public void addLazyMethodInvocationAdvice(MethodAdviceReceiver methodAdviceReceiver)
    {
        for (Method m : methodAdviceReceiver.getInterface().getMethods())
        {
            if (filter(m))
                addAdvice(m, methodAdviceReceiver);
        }
    }

    private void addAdvice(Method method, MethodAdviceReceiver receiver)
    {
        final Class thunkType = method.getReturnType();

        final String description = String.format("<%s Thunk for %s>",
                                                 thunkType.getName(),
                                                 InternalUtils.asString(method));

        MethodAdvice advice = new MethodAdvice()
        {
            /**
             * When the method is invoked, we don't immediately proceed. Instead, we return a thunk instance
             * that defers its behavior to the lazily evaluated invocation.
             */
            public void advise(final Invocation invocation)
            {
                ObjectCreator deferred = new ObjectCreator()
                {
                    public Object createObject()
                    {
                        invocation.proceed();

                        return invocation.getResult();
                    }
                };

                ObjectCreator cachingObjectCreator = new CachingObjectCreator(deferred);

                Object thunk = thunkCreator.createThunk(thunkType, cachingObjectCreator, description);

                invocation.overrideResult(thunk);
            }
        };

        receiver.adviseMethod(method, advice);
    }

    private boolean filter(Method method)
    {
        if (method.getAnnotation(NotLazy.class) != null) return false;

        if (!method.getReturnType().isInterface()) return false;

        for (Class extype : method.getExceptionTypes())
        {
            if (!RuntimeException.class.isAssignableFrom(extype)) return false;
        }

        return true;
    }
}

The core of the LazyAdvisor service is in the addAdvice() method. A MethodAdvice inner class is defined; the MethodAdvice interface has only a single method, advise(). The advise() method will be passed an Invocation that represents the method being invoked. The Invocation captures parameters passed in as well as the return value or any checked exceptions that are thrown. Invoking the proceed() method continues on to the original method of the service3.

At this point, the thunk encapsulates the original method invocation; we even have an object for that: the Invocation instance originally passed to the advise() method. Invoking any method on the thunk will cause the ObjectCreator.createObject() method to be triggered: this is where we finally invoke proceed() and return the value for the lazily invoked method.

Other Uses for Thunks

In essence, this thunk approach gives you the ability to control the context in which a method is executed: is it executed right now, or only when needed? It is only a little jump from that to executing the method in a background thread. In fact, Tapestry includes a ParallelExecutor service that can be used for just that.
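
To show how little changes, here is a rough sketch of that variation using a plain java.util.concurrent ExecutorService rather than Tapestry's actual ParallelExecutor API: the real computation is submitted to a background thread immediately, and the thunk blocks only if the value is demanded before the computation finishes.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class BackgroundObjectCreator implements ObjectCreator
{
    private final Future<Object> future;

    public BackgroundObjectCreator(ExecutorService executor, final ObjectCreator delegate)
    {
        // Start the real work immediately, on a pooled thread.
        this.future = executor.submit(new Callable<Object>()
        {
            public Object call() { return delegate.createObject(); }
        });
    }

    public Object createObject()
    {
        try
        {
            // Block only if the caller needs the value before it is ready.
            return future.get();
        }
        catch (Exception ex)
        {
            throw new RuntimeException(ex);
        }
    }
}

A thunk built around such a creator returns from the service method immediately, while the expensive work proceeds on another thread.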

Conclusion

Type-safe thunks are a powerful and flexible technique for controlling when (or even if) a method is invoked without sacrificing type safety. Unlike more intrusive techniques that rely on manipulating the bytecode of existing classes, type-safe thunks can be easily and safely introduced into existing code bases. More than that, this exercise opens up many exciting possibilities: these techniques (coding to interfaces, multiple objects with the same interface, delegation) open up a path to a more fluid, more responsive, more elegant approach to coding complex behaviors and interactions ... while reducing the total line count and complexity of your code.

One of the things I am most happy about in Tapestry is the way in which we can build up complex behaviors from simple pieces. Everything stacks together, concisely and with minimum fuss:

  • We can create a thunk around an ObjectCreator, to defer the instantiation of an object
  • We can capture a method invocation and convert that into an ObjectCreator and a lazy thunk
  • We can advise a method without changing the actual implementation, to provide the desired laziness
  • Tapestry can call an advisor method of our module when constructing the ComponentMessagesSource service
  • We can inject services that do the advising right into advisor methods

Footnotes

1 Actually, all functions in Haskell take exactly one parameter, which is both mind-blowing and not relevant to this discussion.

2 Services can be based on classes rather than interfaces, but then you lose a lot of these interface-based features, such as lazy proxies.

3 Or, if the method has been advised multiple times, invoking proceed() may invoke the next piece of advice. For example, you may have added advice to a method for logging method entry and exit, and for managing database transactions, as well as for lazy evaluation.

Saturday, August 15, 2009

Detailed analysis of Tapestry 5

Sebastian Hennebrueder has just finished a detailed analysis of Tapestry 5. He comes at it from a few odd angles (for instance, he likes PicoContainer and shows how to integrate it). After a few missteps, he reaches these conclusions:

Once I overcame the first hurdles, I became more and more impressed. Building CRUD (create, read, update, delete) dialogs is incredible fast. The form component renders a form for a model, adding labels, input fields and validations. All this information is extracted from the model and its annotation and you don't have to write a single line of code. Here is the code for a complete form.

<t:beaneditform object="person"/>

You have control over the generated form and the possibility to change whatever you need either application wide or just in a single form. As a consequence, you get even less code than in a Ruby on Rails application. The learning curve is of course steeper than the one of the Stripes framework, but this is naturally. Stripes is a thin layer above the underlying technologies. Tapestry abstracts from the underlying technology in order to provide a lot of powerful functionality.

After having explored the functionality of the framework, writing my own components, writing mixins to extend existing components, I came to the conclusion that Tapestry is one of the most innovative frameworks and probably even the best candidate for enterprise applications.

To be honest, I think he makes the initial steps slightly more complicated than they need to be, and he properly criticizes the current state of the documentation. But he reaches the above conclusions, then goes into more detail, and finally outlines some performance data.

Quickie video from OSCON

In this video (from Greg Pollack), I have a very short segment discussing Clojure. Oddly, the fun part of what I said was clipped, which can be summarized as "Clojure is fast and I haven't had this much fun programming since I first learned Object Oriented programming (Objective-C, in 1995) or first started programming at age 13." I'm surprised they cut it because a few people in the speaker's lounge actually clapped after I was done!