Thursday, July 31, 2003

HiveMind going strong

HiveMind stuff is going strong; just brought the documentation up to date and continued cleaning up things and simplifying interfaces.

Something that just went was the class attribute on the <module> element. The idea was to allow subclassing of the module to provide a common location for code. This kind of parallels what you would do in an Eclipse plugin. In HiveMind, it isn't necessary ... you'd just create a service and put the common logic there. So there's no need to allow subclasses of the base Module class (was BaseModule, now it's ModuleImpl).

I'm definitely looking forward to merging Tapestry and HiveMind together (that will form Tapestry 3.1 and HiveMind 1.0-beta).

Wednesday, July 30, 2003

Whew!

Humpty dumpty is back together again. I've reformulated/refactored/reinvented HiveMind as discussed in the previous few 'blog entries.

Still need to catch up on documentation and the XSL for creating HiveMind registry documentation.

Code coverage is around 84% and I haven't even tried to fill in the gaps yet.

I was able to streamline the key interfaces (Module, ServiceExtensionPoint and ExtensionPoint) quite a bit by moving the logic that caches service implementations into ServiceExtensionPointImpl, and the logic that caches extension point elements into ExtensionPointImpl. Looks better.

HiveMind seems a bit faster now ... it's hard to say, because the test suite has changed quite a bit, but I think the new SAX parser is faster than Digester ... one less layer: a lot less logic to figure out which "rule" to fire in response to a given SAX startElement() call. Also, most of the time, the DescriptorParser can completely ignore character content provided via characters() ... doesn't have to stuff it into a StringBuffer just to throw it away later.
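To make the "one less layer" point concrete, here is a minimal sketch of a hand-written SAX handler. It is not the actual DescriptorParser; the class name ServiceIdHandler and the idea of collecting service ids are assumptions made up for illustration. It dispatches directly on the element name in startElement() and never overrides characters(), so character content is dropped by DefaultHandler's empty implementation without being buffered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler (not HiveMind code): collects the id of every <service> element. */
class ServiceIdHandler extends DefaultHandler {
    final List<String> serviceIds = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Direct dispatch on the element name; no rule-matching layer in between.
        if ("service".equals(qName)) {
            serviceIds.add(attributes.getValue("id"));
        }
    }

    // characters() is deliberately not overridden: the inherited version
    // does nothing, so character content is never copied into a buffer.

    /** Parses the given XML text and returns the service ids in document order. */
    static List<String> parse(String xml) {
        ServiceIdHandler handler = new ServiceIdHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            // Wrap, keeping the root cause.
            throw new IllegalStateException("Unable to parse descriptor: " + ex.getMessage(), ex);
        }
        return handler.serviceIds;
    }
}
```

Calling ServiceIdHandler.parse() on a small descriptor returns just the ids; any text between the tags is never seen by the handler.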

Version Numbering

Here's an article about how some projects mis-use version numbering: The Fishbowl: Version Numbers and You

Tapestry and HiveMind follow a different cycle, based on a major version number and a minor version number, e.g., 3.1: major version 3, minor version 1. Major versions don't happen very often, since they represent a near-total rewrite of the framework. Minor versions change in an orderly fashion, representing incremental improvements. In general, we want to maintain backwards compatibility from one minor version to its predecessor, but all bets are off when the major version changes.

A change to minor version represents a new release, which has its own lifecycle (and takes several months). Releases go through stages:

  • alpha - code coming together, features being added, removed, documentation nowhere, too chaotic to use for real work. Additionally, deprecated code from previous releases may be stripped out.
  • beta - things settling down, the goal is to stop adding features, stabilize and document
  • rc - release candidate; exposed to a wide audience so as to find any final bugs
  • GA - General Availability - the final version that people should use.

So for each release, there will be a number of incremental versions, as the code goes through these cycles. For example: 2.3, 3.0-alpha-1, 3.0-alpha-2, 3.0-beta-1, 3.0-beta-2, 3.0-beta-3, 3.0-rc-1, 3.0, 3.1-alpha-1.
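The ordering in that example sequence can be sketched as a comparator. This is only an illustration: the ReleaseName class is hypothetical (neither Tapestry nor HiveMind ships anything like it), and it assumes names always look like major.minor or major.minor-stage-n.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: orders names such as "3.0-alpha-1" by release, then stage, then increment. */
final class ReleaseName implements Comparable<ReleaseName> {
    // Stages in the order a release passes through them; GA (no suffix) sorts after all of them.
    private static final List<String> STAGES = Arrays.asList("alpha", "beta", "rc");

    private static final Comparator<ReleaseName> ORDER =
            Comparator.comparingInt((ReleaseName r) -> r.major)
                    .thenComparingInt(r -> r.minor)
                    .thenComparingInt(r -> r.stage)
                    .thenComparingInt(r -> r.increment);

    final int major;
    final int minor;
    final int stage;      // index into STAGES, or STAGES.size() for GA
    final int increment;  // the 2 in "3.0-beta-2"; 0 for GA

    private ReleaseName(int major, int minor, int stage, int increment) {
        this.major = major;
        this.minor = minor;
        this.stage = stage;
        this.increment = increment;
    }

    static ReleaseName parse(String name) {
        String[] parts = name.split("-");
        String[] numbers = parts[0].split("\\.");
        if (numbers.length != 2 || (parts.length != 1 && parts.length != 3)) {
            throw new IllegalArgumentException("Unrecognized release name: " + name);
        }
        int major = Integer.parseInt(numbers[0]);
        int minor = Integer.parseInt(numbers[1]);
        if (parts.length == 1) {
            return new ReleaseName(major, minor, STAGES.size(), 0); // GA
        }
        int stage = STAGES.indexOf(parts[1]);
        if (stage < 0) {
            throw new IllegalArgumentException("Unknown stage in release name: " + name);
        }
        return new ReleaseName(major, minor, stage, Integer.parseInt(parts[2]));
    }

    @Override
    public int compareTo(ReleaseName other) {
        return ORDER.compare(this, other);
    }
}
```

With this ordering, 3.0-rc-1 sorts before 3.0, and 3.0 sorts before 3.1-alpha-1, which matches the sequence above.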

It's possible, though we've avoided it so far, to "double dip"; for example, drop from beta back into alpha.

This system works well; the older numeric system (from the 1.0 and early 2.0 days of Tapestry) made it hard to tell how stable the code was. Now the stability (alpha, beta, rc) is part of the name (and thus incorporated into the distribution and even the names of the library JARs).

I adapted this naming sequence from somewhere, can't remember exactly where. I find it much more manageable than a fully numeric system (quick: how stable is 3.7.9.2.1?). The code goes through known stages (we do a vote on each stage transition). I've been encouraging other projects (such as OGNL) to make use of this same naming system.

Monday, July 28, 2003

Improving HiveMind naming conventions

Because I'm working to make HiveMind feel a bit more like Eclipse plugins, I'm beginning to rename things to be more consistent.
  • <configuration> --> <extension-point>
  • <contribute-configuration> --> <extension>
  • <contribute-service> --> <extend-service>

In addition, I'm scrapping "internal contributions" to extension-points. Instead, we'll allow <extension> to use an unqualified name for an extension point that's within the same module.

ApacheCon US 2003

Per Andrew C. Oliver's suggestion, I've submitted to present a session on Tapestry at ApacheCon US 2003. It'll be a challenge doing anything meaningful in 50 minutes ... I may need to request two consecutive sessions. I already have most of the materials I'll need, since I've been working on an actual multi-day Tapestry course.

Thursday, July 24, 2003

Thoughts on Exceptions

Everyone has their own approach to exceptions. Here are a few of mine.

Runtime vs. Checked

Checked exceptions should be used only when they will be explicitly caught and code paths will diverge based on the type of exception. This is, in fact, extremely rare (and some JVM code violates this pattern). For example:

try
{
  foo.bar();
}
catch (FooFrobbedException ex)
{
  // recover

  gnipGnip(foo);
}
That's where you want to use a checked exception. Generally, these situations only occur along software boundaries, usually between the code-you-write and the code-you-got (from a library). Questions to ask: can this be recovered from? and will anyone catch this?

Catch and Wrap

This again occurs along boundaries; often you'll catch someone else's checked exception, and (since there is no recovery), rethrow it (in a runtime exception). What's essential is that you keep a reference to the original exception. I tend to call this a "root cause". Tapestry exceptions have been doing this forever; JDK 1.4 has added the ability to attach root cause exceptions consistently (in a somewhat ugly way to maintain backwards compatibility). In JDK 1.3 and earlier it was hit and miss.

Why is this important? Because as you catch and throw, you are losing information; a SAXParseException may identify the public id, line and column but as it goes up the stack, that information is lost; we lose the line and column when we say "can't parse foo.xml", and we may lose even that amount of context when we throw an exception that says "can't accomplish task foo".

Add information

If you are going to catch an exception and rethrow it, add information along the way. Extend the exception message with more information, to provide more context. Add additional read-only properties to the exception to identify more information. The point of this wrapping and rethrowing is that an uncaught, top-level exception provides you with the tools to identify the bug and fix the problem.
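Here is a minimal sketch of catch-and-wrap combined with adding information. The names DescriptorParseException and DescriptorReader are hypothetical, invented for this example; only the JDK's SAX classes and the JDK 1.4 cause-chaining constructor are real. The wrapper keeps the original exception as its root cause, extends the message with the resource name, and exposes the line and column as read-only properties so they survive the trip up the stack.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical runtime exception: keeps the root cause and adds read-only context. */
class DescriptorParseException extends RuntimeException {
    private final String resource;
    private final int line;    // -1 when unknown
    private final int column;  // -1 when unknown

    DescriptorParseException(String resource, int line, int column, Throwable rootCause) {
        // Extend the message with context, and chain the original exception.
        super("Unable to parse " + resource + " (line " + line + ", column " + column + "): "
                + rootCause.getMessage(), rootCause);
        this.resource = resource;
        this.line = line;
        this.column = column;
    }

    public String getResource() { return resource; }
    public int getLine() { return line; }
    public int getColumn() { return column; }
}

/** Hypothetical boundary between our code and the XML library. */
class DescriptorReader {
    static void parse(String resource, String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        } catch (SAXParseException ex) {
            // Carry the location forward instead of losing it.
            throw new DescriptorParseException(resource, ex.getLineNumber(), ex.getColumnNumber(), ex);
        } catch (SAXException | IOException | ParserConfigurationException ex) {
            throw new DescriptorParseException(resource, -1, -1, ex);
        }
    }
}
```

A caller further up can wrap this again ("can't accomplish task foo") and the line and column are still reachable through the chain of causes.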

Log it or not?

In some applications I've worked on, every time an exception occurs, it gets logged as an ERROR. Of course, that exception gets wrapped and rethrown ... and caught, and logged and rethrown. The end result is thousands of lines of meaningless stack trace scrolling by, which only muddles things ... it doesn't help track down the problem or even the source.

The right approach: catch the exception, and log it, as DEBUG (or at most WARN) before wrapping and rethrowing it ... but only at code boundaries. These are those same code boundaries between, typically, your application's code and the library and framework code you've got or bought. That's the only interesting place. Why not ERROR? Because you have top level reporting (next item).

Top level exception reporting

This is the most important part of exception handling ... what to do when nobody catches and handles the exception. It all comes together here, something Tapestry has excelled at since before 1.0. When a top-level exception makes it to the top loop (in Tapestry's case, to the Engine's service() method) you want to produce as much information as possible.

Tapestry's approach has been to identify the stack of exceptions, starting with the outer-most one and working inwards. Each exception's class name and message is displayed, along with all readable properties. Along the way, it finds the next innermost exception and does the same thing ... down, down, down to the deepest exception. That's where the stack trace is displayed.

Tapestry includes a utility class, ExceptionAnalyzer, for just this purpose (it may move to HiveMind in the future).
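As a rough sketch of the idea (this is not Tapestry's ExceptionAnalyzer; the ExceptionReport class below is hypothetical and skips the display of readable properties), a top-level reporter can walk the chain of causes from the outermost exception inward, print each class name and message, and show the stack trace only for the deepest exception:

```java
/** Hypothetical top-level reporter; a simplified stand-in, not Tapestry's ExceptionAnalyzer. */
final class ExceptionReport {
    private ExceptionReport() {
    }

    /** Describes each exception from the outermost inward; the stack trace comes from the deepest one. */
    static String describe(Throwable outermost) {
        StringBuilder out = new StringBuilder();
        Throwable current = outermost;
        while (true) {
            out.append(current.getClass().getName())
                    .append(": ")
                    .append(current.getMessage())
                    .append('\n');
            Throwable next = current.getCause();
            if (next == null || next == current) {
                break; // reached the root cause
            }
            current = next;
        }
        // Only the innermost exception's stack trace is shown.
        for (StackTraceElement frame : current.getStackTrace()) {
            out.append("    at ").append(frame).append('\n');
        }
        return out.toString();
    }
}
```

The report then reads top-down from "can't accomplish task foo" to the original failure, with a single stack trace at the end instead of one per wrapper.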

More JSF propaganda and Tapestry counter-propaganda

OnJava article: Why Web Developers Need Java Server Faces [Jul. 23, 2003]

As usual, the article doesn't bother to make the case for JSF beyond vague allusions to events and MVC.

Tapestry collaborator Mind Bridge has told me he thinks JSF has gone from threat to friend; JSF validates the concept of web application components (in the minds of the majority who are ready to drink whatever special Kool-Aid comes out of Sun and the JCP), but delivers so dismally that people will be forced to find an alternative: Tapestry.

Kind of thrashing

I've been kind of thrashing on this whole schema/DFA/validator thing. I decided (in the shower, where I do all my best thinking) to temporarily abandon formal validation, and concentrate on everything else.

Over the last couple of days, I've been using the existing HiveMind with Vista (work project), where I've been setting up all the startup, post-startup and shutdown tasks for the server as HiveMind services and contributions.

It's working like a charm, but is too verbose; contributions look something like:

<new>
  <set property="title" value="Cache Initialization"/>
  <set property="order" value="100"/>
  <set-create property="runnable" class="com.webct.vista.framework.cache.CacheStartup"/>
</new>

Under the new system, this would look more like:

<task order="100" title="Cache Initialization">
  <runnable class="com.webct.vista.framework.cache.CacheStartup"/>
</task>

And it's only that complex because I need a few different ways to define what to execute (the existing code is based on invoking static methods on classes, so I currently have a syntax for accomplishing that using reflection).

Sunday, July 20, 2003

More thoughts on improved HiveMind

Part of this new design for HiveMind is a change to implementation factory parameters. These will also be a schema ... basically, each implementation factory will have a kind of implicit configuration extension point to define its parameters.

This is important, because all the module deployment descriptor tags related to setting of properties are going to fade away; if they come back, they will be reimplemented as specific to an implementation factory service.

I'm actually struggling a little bit with the idea of processing the XML of configuration contributions (and factory service parameters). I'll be emulating a subset of W3C schema give-or-take, and dealing with that is a bit of work. Just need a little block of time to think about how to approach it properly, with all the kinds of variations that are possible (in terms of sequences, choices and setting bounds on the number of occurrences for an element, choice or sequence). It's a little frustrating because it's certainly a problem that has been solved repeatedly by others ... and, I can look at their code, but it may just be easier to hash it out myself.

Friday, July 18, 2003

Philosophy of configuration contributions to HiveMind

A lot of the fancy stuff in HiveMind is in HiveMind services, but the meat-and-potatoes are in HiveMind configurations.

A configuration extension point in HiveMind is a container to which Java objects are contributed. The extension point defines the "flavor" of objects that may be contributed in terms of a Java class or interface that contributions must be assignable to. It may also define either a factory service to create new contribution instances, or a class to be instantiated (this is to support <new> in contributions).

This is a very Java-object centric approach. Unlike Eclipse plugin contributions, which are very XML based (you use a subset of XML schema to define the "flavor" of contributions in Eclipse), you are really just providing instructions on how to instantiate and configure/assemble contribution objects.

A debate here at WebCT (I'm developing HiveMind as general purpose, but to fit the needs of my employer) is whether this is appropriate.

One contrary suggestion is to move closer to Eclipse plugins; that contributions should be expressed as XML and the configuration extension point should be responsible for converting them from XML to Java objects.

If you look at the Eclipse code, plugins have to waste a good amount of code walking XML elements and attributes to deduce what the content of the extension point is. HiveMind client code, the code that gets the contents of an extension point, just has to cast the elements from the List to the right type. I really don't want to have to change that ... that's a lot of potentially buggy code to write, just to access the elements of the configuration.

But what if the extension point included a kind of schema for contributions, and the schema mixed in some ideas from Digester about how to process contributions? We'd have to write our own souped-up digester capable of adapting its rules on the fly. Anyway, I can then imagine a configuration extension point looking something like:

<configuration id="SymbolSource" element-type="org.apache.commons.hivemind.SymbolSourceContribution">
  <description> ... </description>
  <schema>
    <element name="symbol-source">
      <rules>
        <create class="org.apache.commons.hivemind.SymbolSourceElement"/>
      </rules>

      <attribute name="order" type="int" required="false">
        <rules>
          <set-property property-name="order"/>
        </rules>
      </attribute>

      <attribute name="class" type="org.apache.commons.hivemind.SymbolSource">
        <rules>
          <instantiate-instance property-name="source"/>
        </rules>
      </attribute>

      
    </element>
  </schema>
</configuration>

Obviously, this is a lot more work for the person defining a service extension point, but it means that contributions to the extension point are more succinct and readable:

<contribute-configuration id="org.apache.commons.hivemind.SymbolSource">
  <symbol-source order="100" class="foo.bar.Source"/>
</contribute-configuration>

The <rules> element contains Digester-like rules for constructing objects, setting properties, connecting parent and child objects, and so forth.

In addition, we can attach a description to individual elements and attributes, which is nice.

Of course, there would be analogs for more of the simple XML Schema elements, such as sequence and choice. This is just a bit of a sketch; there are many details to be filled out and it's quite a programming challenge.

Other artifacts within a module specification will also need a <schema>, such as a service implementation factory (to define what may be contributed inside a <parameters> element).

There are a lot of outstanding issues, though:

  • How to cleanly handle contributing a reference to a service (much like <set-service-ref> in the current design)?
  • Likewise, other special types: localized messages and OGNL expressions
  • Even more difficult to generate documentation from this style, since we won't know the types of elements ahead of time.
  • Does the idea of configuring an object instance (say, a core service implementation) make sense? Will we also have to go through a special factory service?
  • Will this significantly affect performance?
  • We will have to keep more of the content read from a module descriptor in a DOM format; we can't go right to descriptor objects anymore, because we need the contribution and the schema for the configuration extension point at the same time, which we don't necessarily have until all module descriptors have been parsed.

I want to keep noodling on this; it feels like it is a step in the right direction. It means, though, that we have to abandon having an XML Schema for HiveMind, since the content of a HiveMind module descriptor will be very free form. I suppose that this could be addressed using namespaces, but the only advantage to using a schema is the use of a validating SAX parser, and we'll have to be doing our own validating anyway, so we can at least generate some better error messages for failures!

Thursday, July 17, 2003

HiveMind updates

HiveMind continues to come along nicely; I'm quite proud of it.

The latest addition was a revision to how parameters to service implementation factories work. A service implementation factory is a HiveMind service used to construct the implementation of some other HiveMind service. Down the rabbit hole yet? This is occasionally useful ... the example built into the HiveMind framework is creating a HiveMind service that fronts for a stateless service bean. The implementation that is generated (in this case, using JDK dynamic proxies) looks up the home interface in a naming service (a HiveMind service wrapping around JNDI), invokes the create() method (using reflection), and forwards all the remaining interface methods to it.

The factory needs to know the JNDI name. The new approach is that the factory will create a parameters object and the <parameters> element of the descriptor will configure the properties of the parameters object, which is then passed back to the factory so that it can construct the final service.

Why not just set properties on the factory service implementation itself? Because the factory service implementation is a HiveMind service, and therefore it is multi-threaded. Two different threads could be trying to create implementations for two different session EJBs at the same time.
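A small sketch of why the parameters object helps. Everything here is hypothetical: the interface, the class names and the String standing in for the generated proxy are assumptions for illustration, not the real HiveMind API. The factory itself holds no per-service state; each construction gets its own parameters object, so two threads building two different services never share a mutable field.

```java
/** Hypothetical per-construction parameters; one instance per service being built. */
class EJBProxyParameters {
    private String jndiName;

    public String getJndiName() { return jndiName; }

    public void setJndiName(String jndiName) { this.jndiName = jndiName; }
}

/** Hypothetical factory contract (an assumption, not the actual HiveMind interface). */
interface ImplementationFactory<P, S> {
    /** Creates a fresh parameters object for the descriptor to configure. */
    P createParameters();

    /** Builds the service implementation from the configured parameters. */
    S createImplementation(P parameters);
}

/** The shared, multi-threaded factory service: note that it has no instance fields. */
class EJBProxyFactory implements ImplementationFactory<EJBProxyParameters, String> {
    @Override
    public EJBProxyParameters createParameters() {
        return new EJBProxyParameters();
    }

    @Override
    public String createImplementation(EJBProxyParameters parameters) {
        // A real factory would build a dynamic proxy; a String stands in for it here.
        return "proxy for " + parameters.getJndiName();
    }
}
```

If jndiName were instead a property of the factory, a second thread could overwrite it between the first thread's set and create calls.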

It ends up looking something like:

<service id="SimpleRemote" interface="hivemind.test.services.SimpleRemote">
  <service-factory service-id="org.apache.commons.hivemind.EJBProxyFactory">
    <parameters>
      <set property="jndiName" value="hivemind.test.services.Simple"/>
    </parameters>	
  </service-factory>			
</service>

Ah! Blogger seems to have come back to life after a long absence.

This is very much good news; the Tapestry Blog at Freeroller is just too unstable to use. I guess from here on in, I'm back home in Blogger land.

I'm using this blog for both Tapestry and HiveMind because eventually, much of the Tapestry code base will be moved into HiveMind, and much of Tapestry will be re-architected to take advantage of it.