Friday, February 01, 2013

Crafting Code in Clojure

The other day, I was working on a little bit of code in Clojure, just touching up some exception reporting, when I was suddenly struck by one of the fundamental reasons that Clojure is so enjoyable to code in. Clojure is craftable: that is, in Clojure you have the option to craft your code to make it more concise, easier to read, and easier to maintain. That is not the case for all, or perhaps even most, programming languages.

In my case, I was constructing an error message where I needed to convert the keys of two maps into a comma-separated string (I don't like to say "you guessed wrong" without saying "here's what you could have said").

What I want my code to do is easily expressed as an informal recipe:

  • Extract all the keys from both maps
  • Remove any duplicates
  • Convert the keys to strings
  • Sort the strings into ascending order
  • Build and return one big string, by concatenating all the key strings, using ", " as a separator
  • Return "<none>" if both maps are empty

If I was writing this in Java, it would look something like this:

There are enough loops and conditionals in this code (along with tip-toeing around Java Generics) that it's easier to look at its test specification (written in Spock) to see what it is supposed to do:

The first pass at a Clojure version is already simpler than the Java version ...
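A minimal sketch of what such a first pass might look like. The function name keys-to-string is an assumption made for illustration; the local symbols (all-keys, key-names, sorted-names) and the s alias for clojure.string follow the names used in the discussion:

```clojure
(require '[clojure.string :as s])

;; keys-to-string is a hypothetical name chosen for this sketch.
(defn keys-to-string
  [map1 map2]
  ;; Step 1 and 2: gather the keys of both maps; the set drops duplicates.
  (let [all-keys (set (concat (keys map1) (keys map2)))]
    (if (empty? all-keys)
      "<none>"
      ;; Steps 3 to 5: stringify, sort, and join with ", ".
      (let [key-names (map str all-keys)
            sorted-names (sort key-names)]
        (s/join ", " sorted-names)))))

(keys-to-string {"fred" 1} {"barney" 2 "fred" 3}) ; => "barney, fred"
(keys-to-string {} {})                            ; => "<none>"
```

Note that str applied to a keyword keeps the leading colon, so keyword keys would show up as ":fred" rather than "fred".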

I couldn't resist using the clojure.string/join function, rather than building the string directly (which would be slightly tedious in Clojure). In many ways, this is a lot like the Java version; we're using let to create local symbols for each step in the process in just the same way that the Java version defines local variables for each step.

However, there's room for improvement here. Let's start to craft.

For example, let's assume that both maps being empty is rare, or at least, that the cost of sorting an empty list is low (it is!). Our code becomes much more readable if we merge it into one big let:
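A sketch of the merged form, under the same assumptions as before (keys-to-string is a hypothetical function name; s is an alias for clojure.string). The empty check moves to the end and tests the sorted sequence instead of the set:

```clojure
(require '[clojure.string :as s])

;; keys-to-string is a hypothetical name chosen for this sketch.
(defn keys-to-string
  [map1 map2]
  ;; One let: each binding feeds the next, even when there are no keys.
  (let [all-keys (set (concat (keys map1) (keys map2)))
        key-names (map str all-keys)
        sorted-names (sort key-names)]
    (if (empty? sorted-names)
      "<none>"
      (s/join ", " sorted-names))))

(keys-to-string {"b" 1} {"a" 2}) ; => "a, b"
(keys-to-string {} {})           ; => "<none>"
```

The only cost is that an empty collection is mapped and sorted before the check, which is negligible.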

Now we're getting somewhere. I think this version makes it much clearer what is going on than the prior Clojure version, or the Java version.

However, if you've written enough code, you know one of the basic rules of all programming: names are hard. Anything that frees you from having to come up with names is generally a Good Thing. In Java, we have endless names: not just for methods and variables, but for classes and interfaces ... even packages. Long years of coding Java have made me dread naming things, because names never quite encompass what a thing does, and often become outdated as code evolves.

So, what names can we get rid of, and how? Well, if we look at the structure of our code, we can see that each step creates a value that is passed to the next expression as the final parameter. So all-keys is passed as the last parameter of the (map) expression, resulting in key-names, and then key-names is passed as the last parameter of the (sort) expression. In fact, ignoring the empty check for a moment, the sorted-names value is passed to the (s/join) expression as the last parameter as well.

This is a very important concept in Clojure; you may have heard people describe coding in Clojure in terms of a "flow" of data through a series of expressions. Well, you've just seen a very small example of this.

In fact, it is no simple coincidence that the last parameter is so important; this represents a careful and reasoned alignment of the parameters of many different functions in clojure.core and elsewhere, to ensure that flow can be passed as that final parameter, because it becomes central to the ability to combine functions and expressions together with minimal fuss.

We can use the ->> macro (pronounced "thread last") to rebuild our flow without having to come up with names for each step:
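A sketch of how the flow might look with ->>, again assuming the hypothetical function name keys-to-string. The all-keys and key-names symbols disappear; only sorted-names remains, because it is still needed for the empty check:

```clojure
(require '[clojure.string :as s])

;; keys-to-string is a hypothetical name chosen for this sketch.
(defn keys-to-string
  [map1 map2]
  ;; ->> inserts each result as the LAST argument of the next form,
  ;; so (map str) becomes (map str <the set>) and sort becomes (sort <the strings>).
  (let [sorted-names (->> (set (concat (keys map1) (keys map2)))
                          (map str)
                          sort)]
    (if (empty? sorted-names)
      "<none>"
      (s/join ", " sorted-names))))

(keys-to-string {"b" 1} {"a" 2 "b" 3}) ; => "a, b"
```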

The ->> macro juggles our expressions into an appropriate order; without it we'd have to deeply nest our expressions in an unreadable way: (sort (map str (set (concat (keys map1) (keys map2))))). Even with a short flow of expressions, that's hard to parse and interpret, so ->> is an invaluable and frequently used tool in the Clojure toolbox.

We can continue to craft; the first expression (that builds the set from the keys), can itself be broken apart into a few smaller steps. This is really to get us ready to do something a bit more dramatic:
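One way this decomposition could be written, as a sketch only: the exact steps are an assumption, and keys-to-string remains a hypothetical name. The flow starts from the keys of the first map, concatenates the keys of the second, and only then builds the set:

```clojure
(require '[clojure.string :as s])

;; keys-to-string is a hypothetical name chosen for this sketch.
(defn keys-to-string
  [map1 map2]
  (let [sorted-names (->> (keys map1)
                          ;; Becomes (concat (keys map2) (keys map1)); the order
                          ;; does not matter because the result is sorted later.
                          (concat (keys map2))
                          set          ; remove duplicate keys
                          (map str)    ; keys to strings
                          sort)]
    (if (empty? sorted-names)
      "<none>"
      (s/join ", " sorted-names))))

(keys-to-string {"b" 1 "c" 2} {"a" 3 "b" 4}) ; => "a, b, c"
```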

This is getting ever closer to our original recipe; you can more clearly see the extraction of keys from the maps before building the set (which is only used to ensure key uniqueness), before continuing on to convert keys from objects to strings, sort them, and combine the final result.

In fact, we're going to go beyond our original brief, and support any number of input maps, not just two:
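A sketch of the variadic form, assuming the hypothetical function name keys-to-string and a rest parameter named maps (also an assumption):

```clojure
(require '[clojure.string :as s])

;; keys-to-string and the parameter name maps are hypothetical.
(defn keys-to-string
  [& maps]
  (let [sorted-names (->> maps
                          (mapcat keys) ; all keys of all maps, in one sequence
                          set
                          (map str)
                          sort)]
    (if (empty? sorted-names)
      "<none>"
      (s/join ", " sorted-names))))

(keys-to-string {"c" 1} {"a" 2} {"b" 3 "a" 4}) ; => "a, b, c"
(keys-to-string)                               ; => "<none>"
```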

The mapcat function is like map, but expects that each invocation will create a collection; mapcat concatenates all those collections together ... just what we want to assemble a collection of all the keys of all the input maps.
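To make the difference concrete, here is a small comparison of map and mapcat. The sample-maps data is made up for illustration:

```clojure
;; sample-maps is made-up example data.
(def sample-maps [{"a" 1} {"b" 2 "c" 3} {}])

;; map yields one key sequence per input map (nil for the empty map).
(def per-map-keys (map keys sample-maps))

;; mapcat yields a single flat sequence of every key.
(def flat-keys (mapcat keys sample-maps))

;; mapcat behaves like map followed by (apply concat ...).
(= flat-keys (apply concat per-map-keys)) ; => true
```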

At this point, we don't have much more to go ... but can we get rid of the sorted-names symbol? In fact, we can: what if part of our flow replaced the empty list with a list containing just the string "<none>"? It would look like this:
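A sketch of that idea. The helper name replace-empty matches the name mentioned in the comments, but its signature here (default first, collection last) is an assumption, chosen so that it follows the last-parameter convention; keys-to-string is likewise a hypothetical name:

```clojure
(require '[clojure.string :as s])

;; Assumed signature: returns default when coll is empty, otherwise coll.
(defn replace-empty
  [default coll]
  (if (empty? coll) default coll))

;; keys-to-string is a hypothetical name chosen for this sketch.
(defn keys-to-string
  [& maps]
  (->> maps
       (mapcat keys)
       set
       (map str)
       sort
       (replace-empty ["<none>"]) ; an empty list becomes a one-element list
       (s/join ", ")))

(keys-to-string {"b" 1} {"a" 2}) ; => "a, b"
(keys-to-string {} {})           ; => "<none>"
```

With the empty case folded into the flow, the let and the final named symbol are no longer needed.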

... and that's about as far as I care to take it; a clean flow starting with the maps, and going through a series of expressions to transform those input maps into a final result. But what's really important here is just how fast and easy it is to start with an idea in Clojure and refine it from something clumsy (such as the initial too-much-like-Java version) into something elegant and surgically precise, such as the final version.

That's simply not something you can do in less expressive languages such as Java. For example, Tapestry certainly does quite a number of wonderful things, and supports some very concise and elegant code (especially in green code) ... but that is the result of organizing large amounts of code in service of specific goals. We're talking tons of interfaces, a complete Inversion-Of-Control container, and runtime bytecode manipulation to support that level of conciseness. That's the hallmark of a quite consequential framework.

That isn't crafting code; that's a big engineering effort. It isn't local and invisible, it tends to be global and intrusive.

In Java, your only approach to simplifying code in one place is to build up a lot of complexity somewhere else.

That is simply not the case in Clojure; by adopting, leveraging, and extending the wonderful patterns already present in the language and its carefully designed standard library, you can reach a high level of readability. You are no longer coding to make the compiler happy; you are in control, because the Clojure language gives you the tools you need to be in control. And that can be intoxicating.

The source code for this blog post is available on GitHub.

12 comments:

Unknown said...

you could use not-empty to get rid of the replace-empty altogether as well

https://gist.github.com/4694924

Unknown said...

@Dmiitri,

That's a nice solution as well (probably better than mine). That being said, I like how my article stands because it does demonstrate how easy it is to add your own logic into a flow, largely by following the last parameter convention.

This scenario does happen a bit: a piece you need is already present in core, but easy to miss. Everyone who's done http://www.4clojure.com/ exercises has seen this when their carefully crafted, super-concise solution ends up twice as large as Chouser's.

Mithaldu said...

When you say "craft" you mean another word that's been around a long time: Golf. What you did is golf your code. Cheers on discovering the usefulness and fun of this old pastime. :)

d.a. said...

Once upon a time, I would have been impressed by that - but the same thing in something like Python looks like this:

def DictKeysToString(x, y):
    keys = sorted(set(x).union(y))
    return ', '.join(keys) or 'none'

# e.g.
x = {'a': 1}
y = {'a': 2, 'b': 3}
DictKeysToString(x, y)
# returns 'a, b'

Extending this to take an iterable of dicts is pretty easy as well.

I really miss that clean, obvious terseness in other languages I use.

mrjbq7 said...

I implemented a version in Factor, to show how similar and a little different it is.

Unknown said...

Hey Howard,

This was a great read. Thanks for posting it!

Once I read the brief, I fired up IRB and implemented this with Ruby as I was curious to see how my result would compare to Clojure's. This isn't necessarily idiomatic, but was fun nonetheless: https://gist.github.com/4696154

The Clojure solution is very elegant. Definitely going to spend some more time with Clojure over the next year or so.

Josh said...

If you use

List sortableKeys = new ArrayList(allKeys);

and

return StringUtils.join(sortableKeys, ",").toString();

you can halve the size of the Java code. :|

Vasily said...

F# https://gist.github.com/4697027

Unknown said...

Wow! This is so Monadic, I really missed that from Haskell that in fact you opened my eyes with this blog post. Thank you for taking the time to share something so important as the function sequencer (->> a.k.a. >>=) n_n

Elf Sternberg said...

@Mithaldu: By "long time" you mean "about thirteen years," since the term "code golf" entered the lexicon in 1999.

I prefer the idea of craftsmanship. It's not about the fewest keystrokes: it's about an expression that's both concise and clear. I think Howard's done that here.

Michael Snell said...

That Java code is so poor it's difficult not to see it as a straw man.

Nobody should be writing String joining code inline - at the very least, this would be in a util class (if for some reason you don't want to use one of the many libs that provide this).

And using TreeSet instead of HashSet would give you the sorting for free without the need for a temporary ArrayList.