Tapestry Training -- From The Source

Let me help you get your team up to speed in Tapestry ... fast. Visit howardlewisship.com for details on training, mentoring and support!

Monday, June 05, 2006

Whoa ... Spring doesn't lazily instantiate beans?

Just stumbled across a blog about lazy bean instantiation in Spring 2.0. This is kind of funny to me ... lazy instantiation is so important, so much a part of the baseline of IoC container functionality, that I just assumed Spring already did this.

Update: Spring has had lazy instantiation since at least 1.2, but it isn't on by default.

For the record: this is one of the essential services provided by HiveMind since day 1. HiveMind lazily instantiates everything it can, and is smart about it, using a pair of proxies for efficiency.

Why a pair? For threading efficiency. The outer proxy is visible to the world and delegates (initially) to the inner proxy. The inner proxy has its methods synchronized as it is responsible for thread-safe lazy instantiation. Once the service (and its interceptors) are instantiated, the inner proxy replaces itself inside the outer proxy. That is, once the service is instantiated, the outer proxy delegates directly to the service implementation, and the inner proxy is no longer needed (it is released to the garbage collector).

Doing things this way is a great idea, but not my great idea. It was suggested a long way back by someone in the HiveMind community. I'd like to extend proper credit for this, but my memory is weak.

Using double proxies is a very powerful technique, since it ensures thread-safe, just-in-time instantiation of the service, without paying the cost of synchronizing every method in the outer proxy. It's an example of what an IoC container buys you ... I don't think you'd want to code three implementations of every service interface manually, but if you skimp, you undercut the performance and scalability of your app. Using HiveMind, two of the three implementations (the outer and inner proxies) are just provided for you.

So, that's a new answer to one of the more frequent questions posed to me: why does Tapestry use HiveMind and not Spring? A: Spring does not provide all the capabilities that Tapestry requires, such as lazy instantiation. Tapestry consists of about 200 services, but very few of those are needed at startup, and a fair percentage will never be used in most applications. Lazy instantiation is a huge win for Tapestry.

Further, HiveMind's approach to lazy instantiation and proxies means that you can have mutually dependent services ... services that are injected into each other. The proxies, and the lazy instantiation, mean that you bypass the normal chicken-and-egg problem of which service to instantiate first. I've used this technique to break untestable, monolithic code into two halves that can each be properly unit tested.
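
Purely to illustrate the idea (this is a hand-rolled sketch with made-up service names, using plain JDK dynamic proxies rather than HiveMind's Javassist-generated classes): both proxies are created up front, each real implementation receives the other service's proxy, and the real objects are only built on first use.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface Greeter { String greet(String name); }

interface Formatter { String format(String name); }

class GreeterImpl implements Greeter
{
    private final Formatter _formatter;

    GreeterImpl(Formatter formatter) { _formatter = formatter; }

    public String greet(String name) { return "Hello, " + _formatter.format(name); }
}

class FormatterImpl implements Formatter
{
    private final Greeter _greeter; // injected to show the cycle; never called here, to avoid infinite recursion

    FormatterImpl(Greeter greeter) { _greeter = greeter; }

    public String format(String name) { return name.toUpperCase(); }
}

interface Builder { Object build(); }

public class MutualDemo
{
    static Greeter greeterProxy;
    static Formatter formatterProxy;

    public static void main(String[] args)
    {
        // Both proxies exist before either implementation; each implementation is
        // handed the *proxy* for the other, so there is no "which one first?" problem.
        greeterProxy = (Greeter) lazyProxy(Greeter.class, new Builder()
        {
            public Object build() { return new GreeterImpl(formatterProxy); }
        });
        formatterProxy = (Formatter) lazyProxy(Formatter.class, new Builder()
        {
            public Object build() { return new FormatterImpl(greeterProxy); }
        });

        System.out.println(greeterProxy.greet("world")); // prints "Hello, WORLD"
    }

    static Object lazyProxy(Class<?> type, final Builder builder)
    {
        return Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] { type },
            new InvocationHandler()
            {
                private Object _impl;

                public synchronized Object invoke(Object proxy, Method m, Object[] args) throws Throwable
                {
                    if (_impl == null)
                        _impl = builder.build(); // real implementation built on first use
                    return m.invoke(_impl, args);
                }
            });
    }
}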

Back to Spring ... now that it's clear Spring has lazy instantiation, I have to question the fact that it defaults to OFF.

17 comments:

Anonymous said...

I posted a correcting comment on the other site, but I would also like to add clarification here too...

Spring has supported lazy-initialization for quite some time. It is nothing new in 2.0.

True, lazy initialization is off by default, but you can change the default behavior to lazy quite easily.
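
(For readers who haven't seen it, the switches look roughly like this in the XML configuration; the bean id and class below are made up for the example:)

<!-- make every bean defined in this file lazy by default -->
<beans default-lazy-init="true">

    <!-- or opt in for individual beans -->
    <bean id="expensiveService" class="com.example.ExpensiveService" lazy-init="true"/>

</beans>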

I have never used HiveMind and am not trying to say that Spring is better; I just want to make sure users are not misled by the posting that this entry addresses.

Unknown said...

I've updated the blog entry to reflect this new information. It would be interesting to see how Spring's approach to thread safety compares with HiveMind's.

Anonymous said...

Times are changing, so it may turn out that lazy initialization is a modern urban Java performance legend. The people developing modern-day JVMs are clever, too.

Never forget the old saying, that "premature optimization is the root of all evil".

Anonymous said...

Howard -- It's hard to say, without seeing the code, but it sounds like your "double proxy to avoid synchronization" is in danger of suffering from exactly the same synch problem as double-checked locking.

It is essentially impossible to share any mutable field across threads without *always* synchronizing on it:

http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
http://www-128.ibm.com/developerworks/java/library/j-dcl.html

Probably you're already aware of the issue, and have your outer proxy synchronizing on access to the inner one, but I thought I'd point it out just in case.
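
(For anyone who hasn't run into the issue before, the classic broken pattern described in those articles looks roughly like this -- Helper here is just a made-up stand-in:)

class Helper { int value = 42; }

public class BrokenLazyHelper
{
    private Helper _instance;

    public Helper getInstance()
    {
        if (_instance == null)                 // unsynchronized read
        {
            synchronized (this)
            {
                if (_instance == null)
                    _instance = new Helper();  // another thread may see a non-null _instance
                                               // before the Helper constructor's writes are visible
            }
        }
        return _instance;
    }
}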

Anonymous said...

Can't believe you posted such an ill-informed post without checking your facts. I note you've now updated the post to correct this, but it's kind of tucked away at the bottom (some readers might not get that far). Given the amount of humble pie you ought to be eating, the update should be more prominent (and, ideally, right at the beginning, in bold, with profuse apologies).

Unknown said...

"lazy initialization is a modern urban java performance legend"

Lazy initialization of short-lived objects is a net loss; the modern GC is very capable of handling those. However, for long-lived objects with potentially expensive initialization, lazy initialization is still a win.

Unknown said...

Paul -- good point, and it depends on the fact that updating a variable containing an object reference is an atomic operation. On any code path through the outer proxy, the code will either go to the fully instantiated service, or will go (first) to the inner proxy, where the method for obtaining the fully instantiated service is synchronized. Here's some pseudo code:

public class FooOuter implements Foo
{
    private Foo _delegate;

    public void doFoo()
    {
        _delegate.doFoo();
    }

    void setDelegate(Foo newDelegate)
    {
        _delegate = newDelegate;
    }
}

public class FooInner implements Foo
{
    private FooOuter _outerProxy;
    private Foo _service;

    public void doFoo()
    {
        getService().doFoo();
    }

    private synchronized Foo getService()
    {
        if (_service == null)
        {
            _service = ... ;
            _outerProxy.setDelegate(_service);
        }

        return _service;
    }
}

I left out a bunch of minor details, but the key thing is FooInner's getService() method. You can see that FooOuter's _delegate will either be the FooInner proxy (initially), or the actual service.

Anonymous said...

Howard -- It's not the atomicity of object assignment that's the problem with double-checked locking; it's the fact that the compiler is free to reorder statements in such a way that a field can be visible before the object it points to is fully initialized -- unless all accesses to that field are synchronized.

In your code, I believe it is theoretically possible for FooOuter to see _delegate before _delegate has been fully constructed. I'm not an expert on this, but I do think you're in trouble.

Give a careful read to the cs.umd.edu link above.

Unknown said...

Paul -- I'm very interested in concurrency issues. The pseudo-code I presented is out of context; there's more synchronization going on inside HiveMind that ensures that OuterProxy and InnerProxy are pointing at each other before the "outside world" sees OuterProxy. Further, the HiveMind test suite includes a few tests where I fire hundreds of threads in parallel at this, and similar, code to try to sniff out synchronization issues.
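
(Not the actual HiveMind test suite, but the shape of such a hammer test is roughly this: park a few hundred threads on a latch, release them against the proxy at the same instant, and fail if any of them observes a half-initialized service. Foo, buildLazyFooProxy() and the expected value 42 are all made up for the sketch.)

import java.util.concurrent.CountDownLatch;

public class ProxyHammerTest
{
    // Foo and buildLazyFooProxy() are stand-ins for the real proxy under test.
    interface Foo { int doFoo(); }

    public static void main(String[] args) throws InterruptedException
    {
        final Foo proxy = buildLazyFooProxy();
        final int threadCount = 500;
        final CountDownLatch start = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(threadCount);
        final boolean[] failed = new boolean[1];

        for (int i = 0; i < threadCount; i++)
        {
            new Thread(new Runnable()
            {
                public void run()
                {
                    try
                    {
                        start.await();            // line every thread up at the same gate
                        if (proxy.doFoo() != 42)  // a half-built service would return the wrong value (or blow up)
                            failed[0] = true;
                    }
                    catch (Exception e)
                    {
                        failed[0] = true;
                    }
                    finally
                    {
                        done.countDown();
                    }
                }
            }).start();
        }

        start.countDown(); // release all threads at once
        done.await();
        System.out.println(failed[0] ? "FAILED" : "OK");
    }

    static Foo buildLazyFooProxy()
    {
        // stand-in: in the real test this would be the generated outer/inner proxy pair
        return new Foo()
        {
            public int doFoo() { return 42; }
        };
    }
}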

Anonymous said...

Howard - after looking at the SingletonServiceModel code, I believe that Paul is correct. The double proxy implementation used in HiveMind is just a slight variation of DCL, which is not thread safe and will not work on some JVMs or SMP architectures.

The problem is that read access to the OuterProxy._inner field is not synchronized, and thus the actual service implementation may be seen and used by other threads before it is properly initialized.

1. The JVM is free to inline and reorder code, so that the atomic assignment to OuterProxy._inner may happen before the service implementation's constructor is executed.

2. A thread running on another processor may see and use the OuterProxy._inner field before the state (instance variables) of the service implementation is flushed from the creating processor's cache to main memory, or before that state is loaded from main memory into the second processor's cache.

There are three ways to fix this:
1. synchronize read access to _inner
2. Only for 1.5 JVMs (implementing JSR133 memory model): declare _inner as volatile
3. Use static initializers

As outer and inner proxies are javassist-generated, I suggest the static initializer approach, as follows:

// just one proxy
class FooProxy implements Foo, RegistryShutdownListener
{
    void doFoo()
    {
        FooImplementationHolder.INSTANCE.doFoo();
    }

    void registryDidShutdown()
    {
        FooImplementationHolder.INSTANCE = FooShutdownImplementation.INSTANCE;
    }
}

// instead of inner proxy:
class FooImplementationHolder
{
    // static initializer is guaranteed to complete before any thread sees INSTANCE
    static Foo INSTANCE = (Foo) SingletonServiceModel
        .getActualServiceImplementation(FooImplementationHolder.class);
}

// instead of checking _shutdown on every method invocation:
class FooShutdownImplementation implements Foo
{
    static Foo INSTANCE = new FooShutdownImplementation();

    void doFoo()
    {
        throw HiveMind.createRegistryShutdownException();
    }
}
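
(For comparison, option 2 above -- making the field volatile, which is only sufficient on 1.5+ JVMs with the JSR 133 memory model -- would look roughly like this. The Foo interface and the _setInner() name reuse the hypothetical names from earlier in the thread.)

class OuterProxy implements Foo
{
    // volatile: once another thread sees the new value of _inner, it also sees
    // the fully constructed service behind it (happens-before, JSR 133 / Java 5+)
    private volatile Foo _inner; // initially the inner proxy, later the real service

    OuterProxy(Foo inner)
    {
        _inner = inner;
    }

    public void doFoo()
    {
        _inner.doFoo();
    }

    void _setInner(Foo service)
    {
        _inner = service;
    }
}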

Anonymous said...

There is possibly another synchronization problem. Scenario: the first thread replaces the inner proxy with the service and flushes this change to main memory. The garbage collector runs right then and releases the inner proxy object, because there are no further references to it in main memory. Without synchronization, a second thread on another processor could still have its own "copy" of the outer proxy in its processor cache, containing a reference to the now nonexistent inner proxy. Calling a method on the outer proxy which delegates the call to the inner proxy may then cause unexpected results.
It's likely a rare scenario, I grant, but not impossible, unless the JVM specification (which I don't know in that much depth) doesn't allow such behaviour.

Btw, since threading behaviour depends on hardware architecture and JVM implementation, it is possible that your tests run very well on one machine and (occasionally) fail on another with a different architecture and/or JVM.

Unknown said...

Karsten --

Looking at the code paths, it seems unlikely to me that the compiler would have much opportunity to inline things.

Inside the inner proxy (generated by SingletonServiceModel) we see the following code:

private final synchronized MyServiceInterface _service()
{
    if (_service == null)
    {
        _service = (MyServiceInterface) _serviceModel.getActualServiceImplementation(); // 1
        _deferredProxy._setInner(_service); // 2
    }

    return _service;
}

The memory model for Java is a tricky thing in concurrent code.

The first line of code (// 1) may not completely execute before (// 2) does. That leaves a tiny window where _service is not fully instantiated and yet is exposed (via the OuterProxy, aka _deferredProxy) to some other thread. Further, the nature of these two statements means that // 2 can't happen before // 1, only in parallel with it. It's a pretty damn tight window.

Personally, I don't see it. If (// 1) were just creating a new instance, there might be something to it. However, getActualServiceImplementation() does a lot of work, much of it delegated to other objects. The secondary objects (service and interceptor factories) are hidden behind interfaces, interfaces with multiple implementations (and therefore not likely to be inlined by the magic of HotSpot). The window shrinks from merely remotely possible to zero.

Further, the scenario necessary for even that remotely possible window is not particularly likely. If multiple threads all hit _service() at the same time, they will be serialized by the synchronized lock on the method. What's necessary is that, while // 1 and // 2 are both running, yet another thread arrives inside the OuterProxy and sees the full service implementation (via // 2) before the initialization code (// 1) has finished executing.

I'd love for every bit of code I write to be provably correct under any circumstances. For this to be an actual concern, we'd need the intersection of many different unlikely scenarios to line up: threads arriving inside the OuterProxy and InnerProxy at just the right times (highly unlikely), along with HotSpot reordering the code (distributed across at least a dozen different classes) so that // 2 occurs before // 1 completes. Likelihood falls close enough to zero to not be an issue.

I really like the fact that, once the service implementation is fully instantiated, the inner proxy goes away, and the outer proxy can delegate to the service implementation without needing any synchronization.

Unknown said...

Horst --

Synchronized or no, I'm very sure that the Java Memory Model takes into account object references inside registers and will not GC objects that have live references anywhere.

Anonymous said...

Changing the subject for a second: I actually think that having lazy-load off by default is the better approach. The only time (at least for what I do) when startup time really matters is during development, and there you are free to enable lazy-loading at the top-level element in Spring. Once you deploy your code, I would much rather have the singleton services fully instantiated and configured from the start, for two reasons.

1) It's an extra check for errors in the configuration file.

2) Performing initialization as needed might result in slow responses for the first users who need those services to handle their requests.

Anonymous said...

Howard,

first, I didn't talk about references in registers. Code can control what's in registers, but not what a processor cache contains. With synchronization it is possible to control what's not in the cache, or, more precisely, which data is the same in the cache and in main memory. Sharing data between threads without synchronization is incorrect programming, and I doubt that the Java Memory Model takes incorrect programming into account.
Caching could also be a problem with what Karsten said. Maybe reordering and inlining can't break your code, but a cache can. Imagine that, for whatever reason, the reference to the newly created service is written from cache to main memory but the initialisation values are not. That's possible without synchronization. Other threads will then see an uninitialized service.

And second, I think it's very dangerous to rely on a probability above 0 - even if it's very near 0 - in things you give to others, when you know how to remove the improbability. It should at least be documented, and if it's as easy as it seems with HiveMind, you could make it configurable, letting the user choose between performance and correctness.

Anonymous said...

Hi Howard,

I think Horst's last post has a point: even if instruction reordering does not happen, the data could still be written out of order when viewed from another processor, due to cache/main-memory flushing or other issues. Here is the quote from http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html:

"Even if the compiler does not reorder those writes, on a multiprocessor the processor or the memory system may reorder those writes, as perceived by a thread running on another processor."

Anonymous said...

Sorry for the kerfuffle - as someone pointed out, it was my mistake for saying that this functionality was new in Spring 2.0 - I just hadn't seen it in Spring 1.0...