One idea I had was to rework how interceptors are built, so that all of a service's interceptors would be combined into a single interceptor class. That didn't seem like the right approach: it would be a lot of work, and I think it will be rare for a service to have more than two interceptors. Hard to implement, hard to test, and no assurance of a payoff.
Next, I considered having each interceptor subclass from the previous one; this would allow calls to use `super.foo()` rather than `_inner.foo()`, which would be more efficient (a super-class invocation rather than an interface invocation).
Then I realized that I always have the actual type and instance of each interceptor (or the core implementation) ... which means I can build interceptors in terms of concrete classes rather than interfaces.
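To make the starting point concrete, here is roughly the shape of an interface-based pass-through interceptor. This is a hand-written sketch, not HiveMind's generated code; the `Adder` interface and `NullInterceptor` name come from the test harness described below, while the method and field details are illustrative:

```java
// The service interface (from the test harness).
public interface Adder
{
    int add(int a, int b);
}

// A pass-through interceptor, as originally built: the next-innermost
// object is held as the *interface* type, so each call down the chain
// is an interface invocation (invokeinterface).
public class NullInterceptor implements Adder
{
    private final Adder _inner;

    public NullInterceptor(Adder inner)
    {
        _inner = inner;
    }

    public int add(int a, int b)
    {
        return _inner.add(a, b);
    }
}
```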
To see if there was any benefit, I needed to test. I extended my test harness to add two new Adder services; the first is like the Singleton service, but has one NullInterceptor (an interceptor that simply calls through to the next object). The second has two interceptors. The first run (JDK 1.4, Hotspot server) showed the cost of those interceptors:
Run | Bean | Interface | Singleton | Deferred | Threaded | One Interceptor | Two Interceptors |
---|---|---|---|---|---|---|---|
Run #1 | 211 | 390 | 2183 | 2824 | 3185 | 2393 | 9174 |
Run #2 | 250 | 2343 | 2324 | 2864 | 3014 | 2434 | 9203 |
Run #3 | 240 | 2354 | 2323 | 2844 | 3054 | 2394 | 9253 |
Run #4 | 241 | 2333 | 2353 | 2824 | 3045 | 2403 | 9183 |
Run #5 | 231 | 2353 | 2333 | 2825 | 3064 | 2383 | 9194 |
Compare the "Singleton" column to "One Interceptor" and "Two Interceptors". I don't exactly have a theory for why the difference between one and two interceptors is so large. Also, the oddly low Interface time in Run #1 is not a mistake; we've seen it before ... that appears to be Hotspot making an early optimization when there is only a single implementation of the interface; later, it must go back and rewrite things when new implementations show up and it can no longer be certain that interface test.Adder is always class test.impl.AdderImpl.
I then reworked the interceptor code to use the actual class to invoke the next-innermost object (interceptor or core implementation); a sketch follows the results below. This is possible because of the order of construction: the core implementation is constructed first, then the lowest-order interceptor is wrapped around it, then the next-lowest-order interceptor, and so forth. This time there was a dramatic change:
Run | Bean | Interface | Singleton | Deferred | Threaded | One Interceptor | Two Interceptors |
---|---|---|---|---|---|---|---|
Run #1 | 210 | 361 | 1993 | 2593 | 2925 | 2203 | 2233 |
Run #2 | 210 | 2153 | 2144 | 2613 | 2804 | 2194 | 2213 |
Run #3 | 220 | 2143 | 2153 | 2594 | 2784 | 2203 | 2203 |
Run #4 | 221 | 2143 | 2153 | 2583 | 2804 | 2194 | 2193 |
Run #5 | 220 | 2143 | 2153 | 2574 | 2814 | 2193 | 2213 |
And so we see the desired outcome; adding interceptors is now a small, incremental cost.
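Hand-written, the reworked interceptor looks roughly like this. Again, a sketch rather than the actual generated code: HiveMind builds the class at runtime and substitutes the real concrete type of whatever object it wraps (here, the core implementation class AdderImpl):

```java
// The reworked pass-through interceptor: the next-innermost object is
// held by its *concrete* class, not the Adder interface.
public class NullInterceptor implements Adder
{
    private final AdderImpl _inner;

    public NullInterceptor(AdderImpl inner)
    {
        _inner = inner;
    }

    public int add(int a, int b)
    {
        // An ordinary virtual invocation (invokevirtual), which Hotspot
        // optimizes more readily than an interface invocation.
        return _inner.add(a, b);
    }
}
```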
Why is this important? We often hear about the 80/20 rule: 80% of the performance problems are in 20% of the code. My concern is that all the extra layers of interface invocations would become a visible factor, cumulatively ... but because those calls are so widely distributed, they would be impossible to track down. Sort of like grit in the machinery ... no single, large point of failure, but a kind of overall friction in the works.
Given just how many iterations of my code loop I need to get measurable results (50,000,000), that's probably not a large concern ... still it's nice to claim I've optimized HiveMind for performance.
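For scale, the timing loop is roughly the following shape; this is a sketch under assumptions, since the real harness's structure, arguments, and reporting differ:

```java
public class AdderBenchmark
{
    // Time 50,000,000 calls through the service (and any interceptors).
    public static long time(Adder adder)
    {
        long start = System.currentTimeMillis();

        for (int i = 0; i < 50000000; i++)
            adder.add(7, 11);

        return System.currentTimeMillis() - start;
    }
}
```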