Tapestry Training -- From The Source

Let me help you get your team up to speed in Tapestry ... fast. Visit howardlewisship.com for details on training, mentoring and support!

Sunday, July 02, 2006

Synchronization Costs

I've been doing a bit of work on the Tapestry 5 code base. I'm really interested in making Tapestry 5 screaming fast, and since the code is based on JDK 1.5, we can use concurrency support. Previously, I've blogged about using an aspect to enforce read and write locks. I decided to write a simple benchmark to see what the relative costs were.

As with any benchmark, it's only an approximation. I tried enough tricks to ensure that HotSpot wouldn't get in there and over-optimize things, but you can never tell. HotSpot is a devious piece of software.

I got interesting, and strange, results:

For a baseline, I executed the code with no synchronization whatsoever (simple). The cost of synchronization (synched) shows that synchronization is pretty darn cheap, just an increment on top of the baseline code. The aspect graph shows the cost of using the @Synchronized aspect to maintain a reentrant read/write lock (that is, a shared read lock combined with an exclusive write lock). Finally, the rw graph shows the cost of writing code that maintains the read/write lock in normal code (rather than having it added via the aspect).

Synchronization has some overhead. Using the @Synchronized aspect is about 4x as expensive as just using the synchronized keyword on a method. Strangely, the aspect version operates faster than the pure-code version, for reasons I can't explain, except that it must have something to do with how AspectJ weaves my code (a lot of the code I write ends up as private static methods after weaving, which may have some runtime performance advantage).
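For reference, here's a minimal sketch (not the actual benchmark code; the counter classes are purely illustrative) of the two hand-written variants being compared: the synchronized keyword versus an explicit ReentrantReadWriteLock:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// "synched" style: the synchronized keyword takes an exclusive
// monitor lock on every call, reads and writes alike.
class SynchedCounter {
    private long value;

    public synchronized long get() { return value; }

    public synchronized void increment() { value++; }
}

// "rw" style: a shared read lock for readers, an exclusive
// write lock for the occasional writer.
class RwCounter {
    private long value;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public long get() {
        lock.readLock().lock(); // many threads may hold this at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public void increment() {
        lock.writeLock().lock(); // blocks readers and other writers
        try {
            value++;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The lock()/try/finally/unlock() ceremony is exactly the boilerplate the @Synchronized aspect weaves in for you.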

These results demonstrate an important tradeoff: if your application only occasionally has multiple threads hitting the same methods, then you might want to choose synchronized, since you aren't in danger of serializing your threads. By serializing, I mean that only one thread is running and all other threads are blocked, waiting for that thread to complete a synchronized block. Serialized threads are what cause throughput for a web site to be bad even though the CPU isn't maxed out ... it's basically Moe, Larry and Curly fighting to get through a single, narrow door all at the same time (they race to claim the single, exclusive lock).

Tapestry, on the other hand, will have a number of choke points where many threads will try to simultaneously access the same resource (without modifying it). In those cases, a shared read lock (with the occasional exclusive write lock) costs a little more per thread, but allows multiple threads to operate simultaneously ... and that leads to much higher throughput. Here, Moe, Larry and Curly get to walk through their own individual doors (that is, each of them has a non-exclusive read lock of their own).
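In code, the individual-doors picture looks something like this (a toy demonstration of my own, not Tapestry code): while one thread holds the shared read lock, a second reader is admitted immediately, but a would-be writer is turned away:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Demonstrates that the read lock is shared and the write lock
// is exclusive. Class and method names are illustrative.
public class SharedDoors {

    // returns { secondReaderAdmitted, writerAdmitted }
    static boolean[] demo() {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final boolean[] results = new boolean[2];

        lock.readLock().lock(); // main thread is inside as a reader

        Thread other = new Thread(new Runnable() {
            public void run() {
                // a second reader gets his own door immediately ...
                results[0] = lock.readLock().tryLock();
                if (results[0]) lock.readLock().unlock();
                // ... but a writer is blocked while any reader is inside
                results[1] = lock.writeLock().tryLock();
                if (results[1]) lock.writeLock().unlock();
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        lock.readLock().unlock();
        return results;
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("second reader admitted: " + r[0]);
        System.out.println("writer admitted: " + r[1]);
    }
}
```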

As with any benchmark, my little test bench is far, far from a simulation of real life. But I think I can continue to make use of @Synchronized without worrying about tanking the application. In fact, just as I predicted Tapestry 4 would outperform Tapestry 3, I believe Tapestry 5 will outperform Tapestry 4, by at least as much.


jesse said...

Very interesting. It's been a while since I delved into this area, but the last time I was doing these kinds of things I found that the performance benefits associated with ReadWrite-style locking mechanisms can really only be realized in true multiprocessor environments.

For something like this, I might even try a simpler locking mechanism like a dumb Semaphore (http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Semaphore.html). I think you'll find that a Semaphore will still be faster than synchronized (in your tests, at least).

Of course, my direct experience with the new concurrent api's (or the library they were derived from ) is limited, as I had written my own library when I needed these things. (not knowing the original concurrent library existed, it may very well have not existed at the time)

jesse said...

Oops....Semaphore was a stupid suggestion for your purposes.

This looks like the guy you want. http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/locks/ReentrantLock.html.

Just be sure fairness is turned off/false (the default, no-args constructor).

If synchronized is faster than using a combination of this guy + the supportive Condition objects it provides I would be very surprised.
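A minimal sketch of that combination (illustrative names; a non-fair lock via the no-args constructor, plus a Condition for waiting):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// A one-element hand-off slot built on ReentrantLock + Condition.
// The class itself is just an example, not anything from Tapestry.
public class OneSlot {
    private final ReentrantLock lock = new ReentrantLock(); // fairness off by default
    private final Condition filled = lock.newCondition();
    private Object value; // null means empty

    public void put(Object v) {
        lock.lock();
        try {
            value = v;
            filled.signal(); // wake one waiting taker
        } finally {
            lock.unlock();
        }
    }

    public Object take() throws InterruptedException {
        lock.lock();
        try {
            while (value == null) {
                filled.await(); // releases the lock while waiting
            }
            Object v = value;
            value = null;
            return v;
        } finally {
            lock.unlock();
        }
    }
}
```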

Unknown said...

Yes, definitely using ReentrantReadWriteLock, which supports multiple readers and an exclusive writer. For the various pools and caches Tapestry 5 will use, most operations are read-only, with the occasional short write, such as after reading and parsing an HTML template, or when a cache invalidation event is propagated around.
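Something like this sketch (names invented here, not the actual Tapestry 5 code): reads take only the shared lock, and a cache miss or an invalidation briefly takes the exclusive write lock:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-mostly cache guarded by a ReentrantReadWriteLock.
// Note the read lock is released before the write lock is taken;
// ReentrantReadWriteLock does not allow upgrading read -> write.
public class TemplateCache {
    private final Map<String, String> cache = new HashMap<String, String>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String path) {
        lock.readLock().lock(); // the common case: shared access only
        try {
            String template = cache.get(path);
            if (template != null) return template;
        } finally {
            lock.readLock().unlock();
        }

        // miss: take the exclusive lock, re-check, then parse and store
        lock.writeLock().lock();
        try {
            String template = cache.get(path);
            if (template == null) {
                template = parse(path); // stand-in for reading and parsing the HTML template
                cache.put(path, template);
            }
            return template;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // an invalidation event takes the write lock as well
    public void invalidate() {
        lock.writeLock().lock();
        try {
            cache.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    private String parse(String path) {
        return "<parsed " + path + ">"; // placeholder for real parsing
    }
}
```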

Anonymous said...

If you're using 1.5.0_06, try running with -XX:+UseBiasedLocking. It dramatically improves synchronization performance.

Anonymous said...

Howard, have you thought of using NIO to read optimistically from large buffers/disk caches?

I'm sure you've heard about the Glassfish Grizzly connector on 10 threads competing with ARP/Http11 Tomcat on 500 threads.


Unknown said...

Huh? Reading large buffers has very little to do with this, it's more a general question involving untold numbers of small method invocations.

Henrik Vendelbo said...

I seem to recall that Commons Collections has versions that differentiate between readers and writers.

Anonymous said...

Commons Collections has some dangerously not-thread-safe code. It may have been expunged by now, but I'd stick with java.util.concurrent if possible.

Anonymous said...

Just some adds from my experience ...

a) Timeouts are the single most valuable feature from my perspective, and why in many cases I can't go back to synchronization constructs. I work on a network application, and the ability to provide another failure path for some things is invaluable. It always puts me in mind of 'the eight fallacies'.

b) Second most, the ability to replace the lock implementation with a debug version is very, very handy. I would think with the aspect nature of your implementation it would be very easy to replace your version with one that prints out lock statistics (waiters, time to acquire, etc)

The concurrency is nice, but something I rarely think about these days. The overhead of a single synchronized call vs. the overhead of the construct is relatively trivial in the scheme of things performance wise, I have many bigger fish to fry.

jesse said...

Heh ... synchronization of threads may seem trivial, until you try to deploy an application that thinks this way on a multi-CPU box. People tend to get mad when they find out that their application isn't able to properly execute in a concurrent-CPU environment.

I've seen/played with what Howard is doing first hand on single/multi CPU boxes and the performance gain is dramatic. It won't matter to a lot, but for the people it does it will be greatly appreciated and well worth the effort.

Ken Yee said...

Cool...the tapestry sites I've been on have felt a little sluggish (just MHO) compared to PHP sites, so this will probably help :-)

Anonymous said...

I'm still troubled by Tapestry's use of HiveMind. I know it was started around the time that Spring was (a little after Pico container, et al.), but at this point Spring is pretty much a de facto standard.

Tapestry is (luckily + happily) gaining ground, but people still have to get frustrated with the complexity and rigidity of JSF, so Tapestry still has a long and possibly challenging road in the marketplace ahead. I think having HiveMind be part of that adoption curve makes Tapestry's road more difficult. In some cases maybe twice as hard -- vs. if it were seen as using Spring. (To make the case extreme suppose Tap. used Cayenne instead of easily integrating with Hibernate...)

Anonymous said...

I'm still troubled by Tapestry's use of HiveMind. I know it was started around the time that Spring was (a little after Pico container, et al.), but at this point Spring is pretty much a de facto standard.

Integrating spring with tapestry is so trivial that this statement is really pointless. Just because Tap is powered under the hood by Hivemind does NOT mean it forces you to use Spring as your DI of choice.

The same logic as you have here can be applied to JSF. JSF mandates a proprietary (per-impl) DI container which is neither Hivemind, spring nor pico. Does this mean JSF will have a tough time being adopted by spring users? Does it even mean that JSF is somehow difficult to integrate with spring?
Hell no.

Simply bemoaning the fact that Tapestry uses a competing technology to spring under the hood "does not a valid argument make."

In point of fact, the tapestry-spring library is more robust than the JSF-spring delegating variable resolver system because there is a namespace collision between JSF and spring beans which has to be resolved kludgily. There is no such collision between hivemind services and spring bean ids.

The analogy about Cayenne vs. Hibernate is also inapt. Tapestry is not a master-framework like spring. It does not aim to solve all problems--it is VERY specifically a view tier technology. Therefore the integration between tap and hibernate (or ANY persistence framework) is a trivial issue. I could very easily write a hibernate-backed tap-app without using a single tapernate component. In fact I prefer to, because my persistence concerns are kept separate from presentation (e.g. in an EJB container it is unavoidable).

Most of the derision against Tap as being backed by hivemind and such is purely associative, usually by people who have never actually written a Tapestry + spring app.

Sorry to sound annoyed, but let's get over our pettiness-of-names already.

Go go devil boy!


Anonymous said...

Just because Tap is powered under the hood by Hivemind does NOT mean it forces you to use Spring as your DI of choice.

That should read: ...does NOT mean it forces you to use Hivemind as your DI of choice.