Tuesday, June 08, 2010

Who Wants The Func? Gotta Have That Func!

I've been entranced by the concept of laziness since I first really considered it while teaching myself a bit of Haskell. Laziness is the idea that no computation takes place until it is actually needed ... an idea that is common in the functional programming world and one that works best with immutable data.

Why immutable? This has been covered extensively elsewhere, but the gist is that when you have any kind of mutable data (any field that can ever change its value), you add time as an invisible input to your expressions. Literally, the time that any single expression is evaluated relative to other changes to mutable state will affect the outcome of the expression, often in non-predictable ways. In the mathematical world, a function will always return the same value for the same inputs ... in the fuzzy, dirty world of Object Orientation, a method invocation may return different values at different times based on mutable state. Not necessarily mutable state in the object being invoked, but in some other object, somewhere, that the invoked object depends on.

This is why parallel programming in the OO world seems so hard. It requires locks on mutable data, which brings its own problems, such as deadlocks. It can feel like a tottering house of cards.

But remove mutability from this picture and an entirely different world emerges. Functions do behave as functions; same inputs: same result. Side effects disappear, because there's no mutable state. Evaluation of expressions is no longer linked to time: it can be evaluated in parallel threads, or can be deferred until absolutely needed.

That last bit is the laziness. Laziness is a way to bootstrap your code up to a simpler, clearer expression of your algorithms ... once you embrace laziness, you can see that a good amount of the code you write (using mutable data especially) is a case of premature optimization.

Back to Tapestry; as far back as Tapestry 4.0 (where HiveMind and the use of Inversion of Control and Dependency Injection where introduced), Tapestry's internal code has had many functional characteristics. The base unit of work in the Tapestry IoC container is an interface, not a function ... but often those interfaces have a single method. That makes the interfaces look a lot like a function, ready for the kind of composition possible in a functional programming language. Sure, it's a bit clumsy compared to a real functional programming language ... but the power is still there.

Tapestry 5 uses these features to handle a lot of Aspect Oriented Programming concepts; for instance, services are lazily instantiated, and they can be decorated and advised to provide cross-cutting concerns. In fact, Tapestry uses functional composition extensively for all kinds of meta-programming.

Meanwhile outside the realm of Tapestry, my exposure to Clojure has really sold me on the functional approach, and I take to immutable data structures like a warm, comforting blanket. I miss all that when I'm working with ordinary Lists and Sets from Collections API.

Given that Tapestry does a lot of complex things, I started work on a simple functional library. What I've created is not nearly as complex as Functional Java; I think it does less, but does it more cleanly. It's more focused.

The idea is that you'll create a Flow from some source (usually, a Collection). You can then map, filter, reduce, append, concatenate, and iterate the values inside the Flow. Further, the Flows are lazy (as with Haskell and Clojure); all evaluation is deferred until absolutely necessary, and it's thread safe. You can also have infinite flows.

It all starts in a static class F (for Functional) that has the initial factories for Flows. This example uses the F.range() method to create a Flow for a range of integers:

System.out.println(F.range(1, 100).filter(F.gt(10)).map(
  new Mapper() {
   public Integer map(Integer value) {
    return value * 2;
   }
  }).take(3).toList());

When executed, this code prints the following: [22, 24, 26]. That is, it dropped the values less than or equal to 10; it then multiplied each remaining value by 2 and converted the first three to a list.

  • F.range() creates a lazy Flow of integers in the range (from 1 to 99; the upper range is non-inclusive).
  • filter() is a Flow method that keeps only some values, based on a Predicate
  • F.gt() is a static factory method, it creates a Predicate that compares a Number value from the Flow against the provided value
  • map() is a Flow method that is applied to each value in the Flow
  • take() takes a limited number of values from the front of the Flow
  • toList() converts the Flow into a non-modifiable List

Here we mapped from Integer to Integer, but it would have been possible to map to a different type. At each stage, a new (immutable) Flow object is created.

What about laziness? Well if we modify the code a bit:

System.out.println(F.range(1, 100).filter(F.gt(10)).map(
  new Mapper() {
   public Integer map(Integer value) {
    System.out.println("** mapping " + value);
    return value * 2;
   }
  }).take(3).toList());
The new output is:
** mapping 11
** mapping 12
** mapping 13
[22, 24, 26]

... in other words, although we write code that makes it appear that the entire Flow is transformed by the map() call, the reality is that individual values from the original flow are mapped, just once, as needed. The code we write focuses on the flow of transformations, from the input, to the final result: "start with the range, retain the values greather than 10, multiply each by two, keep just the first three".

Does this make a difference? Not with trivial cases like this example. The functional code could be rewritten in standard Java as:

List<Integer> result = new ArrayList<Integer>();
for (int i = 1; i < 100; i++)
{
  if (i >= 10)
  {
    result.add(i * 2);
  }
}

result = result.subList(0, 3);

System.out.println(result);

Yes, this code is shorter, but it does more work (computing many doubles values that are not needed). We could do some extra work to keep a count of result values and end the loop earlier, but that increases the cyclomatic complexity even further. The extra work is not a big deal here, but if the transformations were more expensive (say, re-drawing images in different sizes, or reading data from a database) the work not done unnecessarily would become quite significant.

And is the traditional Java code really shorter? What if we create a reusable factory function:

public static Mapper<Integer,Integer> multiplyBy(final int multiplicand)
{
  return new Mapper<Integer, Integer>() 
  {
    public Integer map(Integer value)
    {
      return value * multiplicand;
    }
  };
}

Then our original Flow expressions becomes:

System.out.println(F.range(1, 100).filter(F.gt(10)).map(multiplyBy(2)).take(3).toList());

... meaning that, once we have a good collection of these Mapper and Predicate factory methods, we can have the efficient, lazy code and it will be more concise and readable.

Anyway, tapestry-func is a work in progress, but it's very promising, and already being used in both Tapestry 5.2, and in some of my clients' code.

9 comments:

  1. Have you look at op4j (http://www.op4j.org/)? It looks very similar with your approach with a lot of prepared mappers/functions

    ReplyDelete
  2. Op4j looks quite nice ... I'm not sure it's lazy, and I'm trying to reduce the number of outside dependencies for Tapestry. Otherwise, I'd consider it.

    ReplyDelete
  3. Hi Howard, you might want to fix this: "Side effects disappear, because there's no immutable state" :-)

    ReplyDelete
  4. Forgive me for asking, but doesn't this duplicate some of the functionality in Google Collections (especially the Iterables and Functions classes)?

    ReplyDelete
  5. @Robert:

    Yes.

    See comment r.e. Op4j.

    ReplyDelete
  6. what you have is precisely what Google Collections does (and i guess op4j). Yet you want to ignore it....

    you have to agree that it's a little bit un-lazy. but as long as you're happy to keep enhancing tapestry, we can't complain too much. :) :)

    cheers

    ReplyDelete
  7. Really interesting, tapestry is getting more and more 'functionnal'

    Just for information (if we don't think about lesser dependencies) , have you already tried to make a bridge between Tapestry and Clojure to handle this kind of 'funtionnal' things.

    ReplyDelete
  8. @Christophe

    Back, way, back, before I coded OO I worked in C and PL/1. This predates Java! Even so, I did a lot with void pointers for data encapsulation, and various kinds of look-up-tables to handle what we would call inheritance now. The point was, I was using a limited language in a clumsy way to unknowingly emulate a higher-level language.

    I've been gradually doing the same thing in Java since about 2003, 2004 and it's only accelerated as I've learned more about FP ... it's given me more of a target to blindly stumble towards.

    However, Tapestry + Clojure simply doesn't make sense to me ... Tapestry is all about managing state, and Clojure is almost entirely about avoiding state.

    What I've done with Cascade is to revisit Tapestry's core values within an overall Clojure approach. For example, templates are defined in terms of Clojure syntax (via macros) rather than being external XML files.

    ReplyDelete
  9. I have a question. Does the F.gt(10) method only work with Number or it is a generic method that would work on any Comparable and you just mention Number in the example because this is the concrete case. It seems to me that such a method can easily be implemented to return a generic predicate but there may be other considerations (like performance) that made you reduce the functionality to Number.

    Also I would like to point out that you are incorrectly referring to object orientation as the reason for mutability. There is nothing in the OO school of thought that says you should have mutable objects. OO teaches us how to separate and reuse code and concepts like objects, classes and polymorphism work pretty well in the functional world. The real source of mutability is the imperative approach. OO itself works equally well with imperative and functional programming.

    ReplyDelete

Please note that this is not a support forum for Tapestry. Requests for help will be deleted. Please subscribe to the Tapestry user mailing list if you are in need of support, or contact me directly for professional (for pay) support.

Spammers: Don't bother. I delete your comments and it's a waste of time for both of us. 垃圾邮件发送者:不要打扰。我删除您的评论和它的时间对我们双方的浪费