Friday, June 29, 2012

Gradle CoffeeScript Compilation

As I'm working on rebooting JavaScript support in Tapestry, one of my goals is to be able to author Tapestry's JavaScript as CoffeeScript. Tapestry will ultimately have a runtime option to dynamically compile CoffeeScript to JavaScript, but I don't want that as a dependency of the core library; that means the compilation (transpilation?) needs to occur at build time.

Fortunately, this is the kind of thing Gradle does well! My starting point for this is the Web Resource Optimizer for Java project, which includes a lot of code, often leveraging Rhino (for JavaScript) or JRuby, to perform a number of common web resource processing tasks, including compiling CoffeeScript to JavaScript. WRO4J has an Ant task, but I wanted to build something more idiomatic to Gradle.

Here's what I've come up with so far. This is an external build script, separate from the project's main build script:
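
The sketch below captures the approach; it assumes WRO4J's CoffeeScriptProcessor from the wro4j-extensions module, and the artifact version, default directories, and exact processor API are assumptions you may need to adjust:

    buildscript {
        repositories {
            mavenCentral()
        }
        dependencies {
            // wro4j-extensions pulls in Rhino and the CoffeeScript compiler
            classpath "ro.isdc.wro4j:wro4j-extensions:1.4.6"
        }
    }

    import ro.isdc.wro.extensions.processor.js.CoffeeScriptProcessor

    class CompileCoffeeScript extends DefaultTask {

        def srcDir = "src/main/coffeescript"

        def outputDir = "build/compiled-coffeescript"

        @InputDirectory
        File getSrcDir() { project.file(srcDir) }

        @OutputDirectory
        File getOutputDir() { project.file(outputDir) }

        @TaskAction
        void doCompile() {
            logger.info "Compiling CoffeeScript sources in ${getSrcDir()} to ${getOutputDir()}"

            // Brute force: wipe the output directory and recompile everything
            project.delete getOutputDir()

            def tree = project.fileTree(getSrcDir()) {
                include "**/*.coffee"
            }

            tree.visit { visit ->
                if (!visit.directory) {
                    // visit is a FileVisitDetails; its relativePath makes it
                    // easy to compute the corresponding .js output file
                    def name = visit.relativePath.lastName.replaceAll('\\.coffee$', '.js')
                    def outputFile = visit.relativePath.replaceLastName(name).getFile(getOutputDir())

                    outputFile.parentFile.mkdirs()

                    visit.file.withReader { reader ->
                        outputFile.withWriter { writer ->
                            new CoffeeScriptProcessor().process(reader, writer)
                        }
                    }
                }
            }
        }
    }

    // Export the task class to any project that applies this script
    ext.CompileCoffeeScript = CompileCoffeeScript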

The task finds all .coffee files in the input directory (ignoring everything else) and generates a corresponding .js file in the output directory.

The @InputDirectory and @OutputDirectory annotations allow Gradle to decide when the task's action is needed: if any file in the directories provided by these methods has changed, the task must be rerun. Gradle doesn't tell us exactly what changed, or create the directories, or anything ... that's up to us.

Since I only expect to have a handful of CoffeeScript files, the easiest thing to do was to simply delete the output directory and recompile all CoffeeScript input files on any change. The @TaskAction annotation directs Gradle to invoke the doCompile() method when inputs (or outputs) have changed since the previous build.

It's enlightening to note that the visit object passed to the tree-walking closure is, in fact, a FileVisitDetails, which makes it really easy to, among other things, work out the output file based on the relative path from the source directory to the input file.

One of my stumbling points was setting up the classpath to pull in WRO4J and its dependencies; the buildscript configuration at the top of the script is specific to this single build script, which is very obvious once you've worked it out. This is actually excellent for re-use, as it means that minimal changes are needed to the build.gradle that makes use of this build script. Earlier I had, incorrectly, assumed that the main build script had to set up the classpath for any external build scripts.

Also note the final line of the script: the CompileCoffeeScript task class must be exported from this build script to the project, so that the project can actually make use of it.

The changes to the project's build.gradle build script are satisfyingly small:
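
Assuming the external script is saved as gradle/coffeescript.gradle (both the file name and location are arbitrary), the additions look like this:

    apply from: "gradle/coffeescript.gradle"

    task compileCoffeeScript(type: CompileCoffeeScript)

    processResources.from compileCoffeeScript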

The apply from: brings in the CompileCoffeeScript task class. We then use that class to define a new task, using the defaults for srcDir and outputDir.

The last part is really interesting: We are taking the existing processResources task and adding a new input directory to it ... but rather than explicitly talk about the directory, we simply supply the task. Gradle will now know to make the compileCoffeeScript task a dependency of the processResources task, and add the task's output directory (remember that @OutputDirectory annotation?) as another source directory for processResources. This means that Gradle will seamlessly rebuild JavaScript files from CoffeeScript files, and those JavaScript files will then be included in the final WAR or JAR (or on the classpath when running test tasks).

Notice the things I didn't have to do: I didn't have to create a plugin, or write any XML, or seed my Gradle extension into a repository, or even come up with a fully qualified name for the extension. I just wrote a file, and referenced it from my main build script. Gradle took care of the rest.

There's room for some improvement here; I suspect I could do a little hacking to change the way compilation errors are reported (right now it is just a big ugly stack trace). I could use a thread pool to do compilation in parallel. I could even back away from the delete-the-output-directory-and-recompile-all approach ... but for the moment, what I have works and is fast enough.

Luke Daly leaked that there will be an experimental CoffeeScript compilation plugin coming in 1.1 so I probably won't waste more cycles on this. For only a couple of hours of research and experimentation I was able to learn a lot, and get something really useful put together, and the lessons I've learned mean that the next one of these I do will be even easier!

Getting CrashPlan to work on Mac after JDK 1.7 upgrade

After upgrading my Mac to JDK 1.7, I noticed that CrashPlan stopped working. Given just how much trouble I got into a while back when I lost a month's worth of pictures and videos of my son, I'm now a belt-and-suspenders kind of backup guy ... the belt may be Time Machine, but the suspenders are CrashPlan.

After working it out with their technical staff, it turns out CrashPlan is not compatible with JDK 1.7 on Mac, which is rather unfortunate. And, having installed JDK 1.7, it was trying to use that version!

Getting it working is pretty easy, however; most of the information is available in their recipe for starting and stopping the CrashPlan engine. The key part is that the com.crashplan.engine.plist file includes an entry where you can specify the path to the Java executable; with a bit of experimentation to find the right path, it appears to be working.

The only change I made to the file was the path in the first entry for the ProgramArguments key:
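
For illustration, pointing CrashPlan at Apple's bundled JDK 1.6 (the exact path is the part you may need to experiment with on your machine):

    <key>ProgramArguments</key>
    <array>
        <!-- The first entry is the full path to the Java executable;
             this is Apple's JDK 1.6 location, yours may differ. -->
        <string>/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin/java</string>
        <!-- ... remaining arguments unchanged ... -->
    </array>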

You will, of course, need sudo to edit the file and to stop and relaunch the CrashPlan engine. Now, having set this up (and, coincidentally, renewed my annual subscription), my computer is happily chugging away, making the backups.

Friday, June 08, 2012

Latency numbers every programmer should know

This is interesting stuff; Jonas Bonér organized some general latency data from Peter Norvig as a Gist, and others expanded on it. What's interesting is how scaling time up by a billion converts a CPU instruction cycle into approximately one heartbeat, and yields a disk seek time of "a semester in university".

This is easier to read on the original Gist page. Sorry about my blog's narrow formatting.

Another view of this data is as an animated presentation.

Wednesday, June 06, 2012

Synchronized Considered Harmful

News flash: concurrency is hard. Any time you have mutable data and multiple threads, you are just asking for abuse, and synchronized is simply not going to be your savior ... but the concurrency facilities of JDK 1.5 just might be.

I was recently contacted by a client who was load testing their Tapestry 5.3.3 application; they were using Tomcat 6.0.32 with 500 worker threads, on a pretty beefy machine: Intel Xeon X7460 @ 2.66GHz, OpenJDK 64-Bit Server VM (14.0-b16, mixed mode). That's a machine with six cores, and 16 MB of L2 cache.

For all that power, they were tapping out at 450 requests per second. That's not very good when you have 500 worker threads ... it means that you've purchased memory and processing power just to see all those worker threads block, and you get to see your CPU utilization stay low. When synchronization is done properly, increasing the load on the server should push CPU utilization to 100%, and response time should be close to linear with load (that is to say, all the threads should be equally sharing the available processing resources) until the hard limit is reached.

Fortunately, these people approached me not with a vague performance complaint, but with a detailed listing of thread contention hotspots.

The goal with Tapestry has always been to build the code right initially, and optimize the code later if needed. I've gone through several cycles of this over the past couple of years, optimizing page construction time, or memory usage, or throughput performance (as here). In general, I follow Brian Goetz's advice: write simple, clean, code and let the compiler and Hotspot figure out the rest.

Another piece of advice from Brian is that "uncontested synchronized calls are very cheap". Many of the hotspots located by my client were, in fact, simple synchronized methods that did some lazy initialization. Here's an example:
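
The shape of those hotspots was along these lines (a simplified sketch with invented names, not Tapestry's actual source):

    public class ComponentResources {

        public interface Messages { /* a catalog of localized messages */ }

        private Messages messages;

        // Lazy initialization guarded by the instance's monitor: cheap when
        // uncontested, but every call pays for the lock, even long after the
        // field has been initialized.
        public synchronized Messages getMessages() {
            if (messages == null) {
                messages = obtainMessages();
            }
            return messages;
        }

        private Messages obtainMessages() {
            // Relatively time consuming: e.g., locating and parsing
            // .properties files.
            return new Messages() { };
        }
    }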

In this example, getting the messages can be relatively time-consuming and expensive, and is often not necessary at all. That is, in most instances of the class, the getMessages() method is never invoked. There were a bunch of similar examples of optional things that are often not needed ... but are heavily used in the cases where they are needed.

It turns out that "uncontested" really means uncontested: you had better be sure that no two threads are ever hitting synchronized methods of the same instance at the same time. I chatted with Brian at the Hacker Bed & Breakfast about this, and he explained that you can quickly go from "extremely cheap" to "asymptotically expensive" when there's any potential for contention. The synchronized keyword is very limited in one area: when exiting a synchronized block, all threads that are waiting for that lock must be unblocked, but only one of those threads gets to take the lock; all the others see that the lock is taken and go back to the blocked state. That's not just a lot of wasted processing cycles: often the context switch to unblock a thread also involves paging memory off the disk, and that's very, very expensive.

Enter ReentrantReadWriteLock: this is an alternative that allows any number of readers to share a lock, but only a single writer. When a thread attempts to acquire the write lock, the thread blocks until all reader threads have released the read lock. The cost of managing the ReentrantReadWriteLock's state is somewhat higher than synchronized, but it has the huge advantage of letting multiple reader threads operate simultaneously. That means much, much higher throughput.

In practice, this means you must acquire the shared read lock to look at a field, and acquire the write lock in order to change the field.

ReentrantReadWriteLock is smart about only waking the right thread or threads when either the read lock or the write lock is released. You don't see the same thrash you would with synchronized: if a thread is waiting for the write lock, and another thread releases it, ReentrantReadWriteLock will (likely) just unblock the one waiting thread.

Using synchronized is easy; with an explicit ReentrantReadWriteLock there's a lot more code to manage:
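
Here's the same lazy initialization reworked around a ReentrantReadWriteLock (again a sketch, with the same invented names):

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class ComponentResources {

        public interface Messages { }

        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        private Messages messages;

        public Messages getMessages() {
            try {
                lock.readLock().lock();

                if (messages == null) {
                    obtainMessagesViaWriteLock();
                }

                return messages;
            } finally {
                lock.readLock().unlock();
            }
        }

        private void obtainMessagesViaWriteLock() {
            // A read lock can't be upgraded to a write lock, so release it
            // before taking the write lock ...
            lock.readLock().unlock();
            lock.writeLock().lock();

            try {
                // ... which leaves a window where another thread may have
                // beaten us to it; hence the double check.
                if (messages == null) {
                    messages = obtainMessages();
                }

                // Downgrade: re-acquire the read lock *before* releasing the
                // write lock, so the caller's finally block stays balanced.
                lock.readLock().lock();
            } finally {
                lock.writeLock().unlock();
            }
        }

        private Messages obtainMessages() {
            return new Messages() { };
        }
    }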

I like to avoid nested try ... finally blocks, so I broke it out into separate methods.

Notice the "lock dance": it is not possible to acquire the write lock if any thread, even the current thread, has the read lock. This opens up a tiny window where some other thread might pop in, grab the write lock and initialize the messages field. That's why it is desirable to double check, once the write lock has been acquired, that the work has not already been done.

Also notice that things aren't quite symmetrical: with ReentrantReadWriteLock it is allowable for the current thread to acquire the read lock before releasing the write lock. This helps to minimize context switches when the write lock is released, though it isn't expressly necessary.

Is the conversion effort worth it? Well, so far, simply by converting synchronized to ReentrantReadWriteLock, adding a couple of additional caches (also using ReentrantReadWriteLock), plus some work optimizing garbage collection by expanding the eden space, we've seen some significant improvements; from 450 req/sec to 2000 req/sec ... and there's still a few minor hotspots to address. I think that's been worth a few hours of work!

Tuesday, June 05, 2012

Things I Learned at Hacker Bed & Breakfast

  • You can spend any amount of money on Scotch, and it keeps getting better
  • A Java instance field that is assigned exactly once via lazy initialization does not have to be synchronized or volatile (as long as you can accept race conditions across threads to assign to the field); this is from Rich Hickey (see the sketch after this list)
  • Scotch + Random Internet Images + Speakers == Fun; giving off the cuff talks on silly subjects, with random internet images as your "slides"
  • No one's that interested in Java anymore (at least, not at Hacker B&B)
  • Only ride a zip line backwards if you are prepared to accept the consequences
  • The linkage between honey and fried chicken has not yet penetrated into Research Triangle, despite my best efforts
  • The cost of using the synchronized keyword can go asymptotic with enough threads and contention, making ReentrantReadWriteLock much cheaper in comparison
  • Stu Halloway's house appears to have been designed by the same team that designed the Tardis, and is also bigger on the inside than the outside
  • It is much more efficient to just hand Brian Goetz the $50 buy in, rather than go through the motions of playing poker with him
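
The racy lazy initialization from the second bullet looks like the following sketch; java.lang.String caches its hash code the same way. It's safe here because the field is an int (written atomically) and the value is computed purely from immutable state, so every thread that races computes the same answer:

    public final class Name {

        private final String value;

        // Lazily computed; 0 means "not yet computed". No synchronized, no
        // volatile: threads may race to assign it, but they all assign the
        // same value, so the race is benign.
        private int hash;

        public Name(String value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            int h = hash;
            if (h == 0) {
                h = value.hashCode();
                hash = h;
            }
            return h;
        }
    }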