Monday, July 02, 2012

You Cannot Correctly Represent Change Without Immutability

The title of this blog post is a quote by Rich Hickey, talking about the Datomic database. It's a beautiful statement, at once illuminating and paradoxical. It drives at the heart of the design of both Clojure and Datomic, and embraces the difference between identity and state.

What is change? That seems like an obvious question, but my first attempt at defining it was "some change to a quantifiable set of qualities about some object." Whoops, I used change recursively there ... that's not going to help.

In the real world, things change in ways we can observe; the leaf falls from the tree, the water in the pot boils, the minute hand moves ever forward.

How do we recognize that things have changed? We can, in our memories, remember a prior state. We remember when the leaf was green and attached to a branch; we remember when the water came out of the tap, and we remember looking at the clock a few minutes ago. We can hold both states in our mind at the same time, and compare them.

How do we represent change in traditional, object-oriented technologies? Well, we have fields (or columns) and we change the state in place:

  • leaf.setColor(BROWN).detachFromTree()
  • UPDATE LEAVES SET COLOR = 'BROWN' WHERE ID = ?ID
  • water.setTemperature(212)
  • or we see time advancing via System.currentTimeMillis()

Here's the challenge: given an object, how do you ask it about its prior state? Can you ask leaf.getTreeDetachedFrom()? Generally, you can't unless you've gone to some herculean effort: the new state overwrites the old state in place.
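
To make the loss concrete, here is a minimal sketch in Python (chosen only for brevity). The MutableLeaf class and its method names are hypothetical, made up for illustration; they mirror the setter calls in the list above. After the in-place update, nothing in the object remembers the green, attached leaf.

```python
class MutableLeaf:
    """A hypothetical leaf whose state is overwritten in place."""

    def __init__(self, color, attached_to):
        self.color = color
        self.attached_to = attached_to

    def set_color(self, color):
        self.color = color        # the old color is gone after this line
        return self

    def detach_from_tree(self):
        self.attached_to = None   # the tree it hung from is forgotten
        return self


leaf = MutableLeaf(color="green", attached_to="maple-tree")
leaf.set_color("brown").detach_from_tree()

# Only the current state is left; there is nothing to ask about the past.
print(leaf.color, leaf.attached_to)
```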

When Rich talks about conflating state with identity, this is what he means. When identity and state are conflated, then after the change in state the leaf will now-have-always-been fallen from the tree, the water will now-have-always-been boiled, and the clock will now-eternally be at 9:49 AM.

What Clojure does in memory, and Datomic does in the database, is split identity and state. We end up with leaf1 as {:id "a317a439-50bb-4d37-838a-c8eef289e22f" :color :green :attached-to maple-tree} and leaf2 as {:id "a317a439-50bb-4d37-838a-c8eef289e22f" :color :brown :on-ground true}. The id is the same, but the other attributes can vary.

With immutability, changes in state are really new objects; a new version, or "quantifiable set of qualities", that does not affect the original version. It is possible to compare two different iterations of the same object to see the "deltas". In Datomic, you even have more meta-data about when such state changes occur, what else changed within the same transaction, and who is the responsible party for that transaction.
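
As a rough illustration of comparing two iterations, here is a sketch in Python rather than Clojure. The two leaf values are the ones from the text; the deltas helper is a hypothetical name of my own, and read-only mappings stand in for Clojure's immutable maps. This says nothing about how Datomic itself computes or stores differences.

```python
from types import MappingProxyType

LEAF_ID = "a317a439-50bb-4d37-838a-c8eef289e22f"

# Two read-only values that share one identity (the id) but differ in state.
leaf1 = MappingProxyType({"id": LEAF_ID, "color": "green", "attached-to": "maple-tree"})
leaf2 = MappingProxyType({"id": LEAF_ID, "color": "brown", "on-ground": True})


def deltas(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs.

    A key that is absent on one side is reported as None on that side.
    """
    changed = {}
    for key in old.keys() | new.keys():
        before, after = old.get(key), new.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed


print(deltas(leaf1, leaf2))
```

Because neither value is modified, leaf1 is still fully available after leaf2 exists, which is what makes the comparison possible at all.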

The essence here is not to think of an object as a set of slots you can put new data into. Instead, think of it as a time-line of different configurations of the object. The fact that late in the time-line, the leaf has fallen from the tree does not affect the fact that earlier on the time-line, the leaf was a bud on a branch. The identity of the leaf transcends all those different states.
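
The time-line idea can be sketched as a small Python class. Timeline, record, and as_of are hypothetical names chosen for this example, and the integer time values are arbitrary; this is a toy model of the concept, not a description of Datomic's storage or API.

```python
import bisect


class Timeline:
    """One identity with an append-only sequence of states over time."""

    def __init__(self, identity):
        self.identity = identity
        self._times = []    # strictly increasing time values
        self._states = []   # the state recorded at the matching time

    def record(self, t, state):
        """Append a new state at time t; earlier states are never touched."""
        if self._times and t <= self._times[-1]:
            raise ValueError("states must be recorded in increasing time order")
        self._times.append(t)
        self._states.append(dict(state))  # private copy

    def as_of(self, t):
        """Return the state in effect at time t, or None before the first one."""
        i = bisect.bisect_right(self._times, t)
        return dict(self._states[i - 1]) if i else None


leaf = Timeline("a317a439-50bb-4d37-838a-c8eef289e22f")
leaf.record(1, {"stage": "bud", "attached-to": "maple-tree"})
leaf.record(2, {"color": "green", "attached-to": "maple-tree"})
leaf.record(3, {"color": "brown", "on-ground": True})

print(leaf.as_of(1))  # the bud is still there, even though the leaf later fell
print(leaf.as_of(3))
```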

In the past, I've built systems that required some of the features that Datomic provides; for example, being able to reconstruct the state of the entire database at some prior time, and strong auditing of what changes occurred to what entities at a specific time (or transaction). Rich knows that others have hit this class of problem; part of his selling point is to ask "and who really understands that query" (the one that reconstructs prior state). He knows people have done it, but he also knows no one is very happy about its performance, correctness, or maintainability ... precisely because traditional databases don't understand mutability: they live in that eternal-now, and drag your application into the same world view.

That's why I'm excited by Datomic; it embraces this key idea: separate identity from state by leveraging immutability. From the ensuing design, much goodness is an automatic by-product. Suddenly, much of what we take as dogma when developing database-driven applications starts to look like kludges on top of an unstable central idea: mutable state.

For example: read transactions are a way to gain a stable view of interrelated data even as the data is being changed (in place); with Datomic, you always have a stable view of all data, because you operate on an immutable view of the entire database at some instant in time. Other transactions may add, change, or replace Datoms in the database, but any code that is reading from the database will be completely unaware of those changes, even as it lazily navigates around the entire database.
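
Here is a toy Python sketch of that stable view. The Connection class and its db and transact methods are hypothetical stand-ins written for this example, not Datomic's actual API, and the whole "database" is just one read-only mapping. The point is only that a reader holding a snapshot never sees later writes.

```python
from types import MappingProxyType


class Connection:
    """Hypothetical stand-in for a database connection (an assumption)."""

    def __init__(self):
        self._value = MappingProxyType({})

    def db(self):
        """Return the current database value; it will never change."""
        return self._value

    def transact(self, changes):
        """Build a new database value; the previous value stays untouched."""
        merged = dict(self._value)
        merged.update(changes)
        self._value = MappingProxyType(merged)
        return self._value


conn = Connection()
conn.transact({"leaf/color": "green"})

snapshot = conn.db()                    # a reader takes its view
conn.transact({"leaf/color": "brown"})  # a writer commits afterwards

print(snapshot["leaf/color"])   # the reader's view is unchanged
print(conn.db()["leaf/color"])  # a fresh view sees the new state
```

No read lock is needed in this sketch because readers and writers never share a mutable structure; the writer only swaps which value counts as current.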

8 comments:

  1. You forgot about retrodiction. Our memories in light of new events.

    Just as objects with slots holding "current" state are a simplification, so are trackable state changes without retrodiction.

  2. "In the real world, ..."

    Well, that is the 1 million fiat money question, isn't it? The "real world" computational engine is 'conscious' and pretty darn complex. Are we making a systemic error in aspiring to the holy grail of "modeling according to the standing model of the real world"?

    Clearly, if we intend to pursue the questionable grail goal, we need far more rigor in our analytical efforts (i.e. along the lines of the effectively philosophical musings of RH).

  3. Sorry, I should have said that our memories change in the light of new events. We do it all the time.

  4. Seem to have gone off track here; the point is not to model memory the way humans do, but to model change the way the universe does (which we can only appreciate via our limited senses and imperfect memory).

    Datomic is about modeling complex, interrelated data so that nothing is lost, including changes to existing data. What I hinted at towards the end was that Datomic subsumes some of the other technologies used to "tame" traditional databases; you will likely have very little use for, say, Memcached in a Datomic database.

    Datomic is NOT about creating an artificial consciousness!

  5. Grrr... i entered my comment but then it deleted it after I signed in...

    anyway, it was about how reading this post made me think of version-control systems like git.. there are some conceptual similarities there, though I'm not sure whether there's anything to be made of that.. thoughts?

  6. James -- you are right on. The first comparison Stu and Rich make for Datomic is to Git; unfortunately, people who understand Git are still a bit in the minority.

    Both are systems dedicated to tracking changes over time, without losing any prior state. Git, like Datomic, also thinks in terms of moving the data to the "client" rather than executing queries on the "server" (but the metaphor breaks down after that, as Git considers all repositories to be equal and complete; Datomic can page relevant data from the central database as needed).

  7. It's not really fair to say this is too hard with "traditional databases"; Richard Snodgrass's book Developing Time-Oriented Database Applications in SQL discussed doing this sort of thing in portable SQL in 1999. I highly recommend the book as a great way to understand the different kinds of time-orientation in databases.

    I'm not saying that Datomic isn't cool, or even better... but this sort of thing has been described in code for a while :)

  8. True; I built a system in SQL (Oracle) with some of these features back in the 90's. It also modelled transactions as a database entity; on the other hand, it had a history table for every main table, stored procedures to keep everything straight in the database, and complex queries to resolve historic state. Rich is always pointing out "who's written that query before ... and who wants to never do that again?" (or something to that effect).

