Friday, March 09, 2012

Node and Callbacks

One of the fears people have with Node is the callback model. Node operates as a single thread: you must never do any work, especially any I/O, that blocks, because with only a single thread of execution, any block will block the entire process.

Instead, everything is organized around callbacks: you ask an API to do some work, and it invokes a callback function you provide when the work completes, at some later time. There are some significant tradeoffs here ... on the one hand, the traditional Java Servlet API approach involves multiple threads and mutable state in those threads, and often those threads are in a blocked state while I/O (typically, communicating with a database) is in progress. However, multiple threads and mutable data mean locks, deadlocks, and all the other unwanted complexity that comes with them.

By contrast, Node is a single thread, and as long as you play by the rules, all the complexity of dealing with mutable data goes away. You don't, for example, save data to your database, wait for it to complete, then return a status message over the wire: you save data to your database, passing a callback. Some time later, when the data is actually saved, your callback is invoked, at which point you can return your status message. It's certainly a trade-off: some of the local code is more complicated and a bit harder to grasp, but the overall architecture can be lightning fast, stable, and scalable ... as long as everyone plays by the rules.
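
As a minimal sketch of that save-then-respond shape: `saveRecord` and `handleSave` below are hypothetical names, and `saveRecord` only simulates a database driver by deferring with `setImmediate`, so treat this as an illustration rather than any real driver's API.

```javascript
// Hypothetical stand-in for a database driver: it "saves" the record
// on a later turn of the event loop, then invokes the callback.
const saved = [];

function saveRecord(record, callback) {
  setImmediate(() => {
    saved.push(record);
    callback(null, saved.length); // Node convention: error first, then result
  });
}

// The caller never waits: it hands over a callback and returns at once.
function handleSave(record, respond) {
  saveRecord(record, (err, count) => {
    if (err) return respond({ status: "error", message: err.message });
    respond({ status: "ok", count: count });
  });
}

const events = [];
handleSave({ title: "First quiz" }, (message) => {
  events.push("responded");
  console.log("status message:", JSON.stringify(message));
});
events.push("handleSave returned");
console.log("handleSave returned before the save completed");
```

The status message is only produced inside the callback; the line after the `handleSave` call runs first, which is exactly the non-blocking behavior described above.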

Still, the callback approach makes people nervous, because deeply nested callbacks can be hard to follow. I've seen this when teaching Ajax as part of my Tapestry Workshop.

I'm just getting started with Node, but I'm building an application that is very client-centered; the Node server mostly exposes a stateless, RESTful API. In that model, the Node server doesn't do anything too complicated that requires nested callbacks, and that's nice. You basically figure out a single operation based on the URL and query parameters, execute some logic, and have the callback send a response.
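
To make that request shape concrete, here is a small sketch that leaves out the web framework entirely. The names `handle`, `findQuizzes`, and the in-memory `quizzes` array are assumptions made up for illustration; a real server would take the method and URL from the incoming request and query a database.

```javascript
// Hypothetical in-memory data standing in for a database collection.
const quizzes = [
  { id: "1", title: "Rivers" },
  { id: "2", title: "Mountains" },
];

// Hypothetical asynchronous lookup; a real one would query a database.
function findQuizzes(titleFilter, callback) {
  setImmediate(() => {
    const matches = quizzes.filter(
      (quiz) => !titleFilter || quiz.title.includes(titleFilter)
    );
    callback(null, matches);
  });
}

// Choose one operation from the path and query parameters, run it, and
// let its callback send the response. No state is kept between requests.
function handle(method, rawUrl, respond) {
  const url = new URL(rawUrl, "http://localhost");
  if (method === "GET" && url.pathname === "/api/quizzes") {
    return findQuizzes(url.searchParams.get("title"), (err, docs) => {
      if (err) return respond(500, { error: err.message });
      respond(200, docs);
    });
  }
  respond(404, { error: "not found" });
}

handle("GET", "/api/quizzes?title=River", (status, body) => {
  console.log(status, JSON.stringify(body));
});
```

There is only one level of callback here: the operation's callback is the thing that sends the response.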

There are still a few places where you might need an extra level of callbacks. For example, I have a (temporary) API for creating a bunch of test data, at the URL /api/create-test-data. I want to create 100 new Quiz objects in the database, then once they are all created, return a list of all the Quiz objects in the database. Here's the code:

```javascript
var Quiz, schema, sendJSON;

schema = require("../schema");

Quiz = schema.Quiz;

sendJSON = function(res, json) {
  res.contentType("text/json");
  return res.send(JSON.stringify(json));
};

module.exports = function(app) {
  app.get("/api/quizzes", function(req, res) {
    return Quiz.find({}, function(err, docs) {
      if (err) throw err;
      return sendJSON(res, docs);
    });
  });
  app["delete"]("/api/quizzes/:id", function(req, res) {
    console.log("Deleting quiz " + req.params.id);
    return Quiz.remove({
      _id: req.params.id
    }, function(err) {
      if (err) throw err;
      return sendJSON(res, {
        result: "ok"
      });
    });
  });
  return app.get("/api/create-test-data", function(req, res) {
    var i, keepCount, remaining, _results;
    remaining = 100;
    keepCount = function(err) {
      if (err) throw err;
      remaining--;
      if (remaining === 0) {
        return Quiz.find({}, function(err, docs) {
          if (err) throw err;
          return sendJSON(res, docs);
        });
      }
    };
    _results = [];
    for (i = 1; 1 <= remaining ? i <= remaining : i >= remaining; 1 <= remaining ? i++ : i--) {
      _results.push(new Quiz({
        title: "Test Quiz \# " + i
      }).save(keepCount));
    }
    return _results;
  });
};
```

It should be pretty easy to pick out the logic for creating test data at the end. This is normal Node JavaScript, but if it looks a little odd, that's because it's actually compiled CoffeeScript. For me, the first rule of coding Node is: always code in CoffeeScript! In its original form, the nesting of the callbacks is a bit more palatable:

```coffeescript
# Exports a single function that is passed the application object, to configure
# its routes

schema = require "../schema"
Quiz = schema.Quiz

sendJSON = (res, json) ->
  res.contentType "text/json"
  # TODO: It would be cool to prettify this in development mode
  res.send JSON.stringify(json)

module.exports = (app) ->
  app.get "/api/quizzes",
    (req, res) ->
      Quiz.find {}, (err, docs) ->
        throw err if err
        sendJSON res, docs

  app.delete "/api/quizzes/:id",
    (req, res) ->
      console.log "Deleting quiz #{req.params.id}"
      # very dangerous! Need to add some permissions checking
      Quiz.remove { _id: req.params.id }, (err) ->
        throw err if err
        sendJSON res, { result: "ok" }

  app.get "/api/create-test-data",
    (req, res) ->
      remaining = 100

      keepCount = (err) ->
        throw err if err
        remaining--
        if (remaining == 0)
          Quiz.find {}, (err, docs) ->
            throw err if err
            sendJSON res, docs

      for i in [1..remaining]
        new Quiz(title: "Test Quiz \# #{i}").save keepCount
```

What you have there is a count, remaining, and a single callback that is invoked for each Quiz object that is saved. When that count hits zero (we only expect each callback to be invoked once), it is safe to query the database and, in the callback from that query, send a final response. Notice the slightly odd structure, where we tend to define the final step (doing the final query and sending the response) first, then layer on top of that the code that does the work of adding Quiz objects, with the callback that figures out when all the objects have been created.
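
Stripped of the web framework and the database, the counting pattern can be sketched as below. The names `saveAll` and `fakeSave` are hypothetical, and unlike the code above this sketch hands an error to the final callback instead of throwing it; it also assumes `count` is at least 1.

```javascript
// Runs `count` asynchronous saves and calls `done` once, after the last
// one reports back. `save` is a hypothetical function(item, callback).
// Assumes count >= 1; with a count of zero, `done` would never be called.
function saveAll(count, save, done) {
  let remaining = count;
  let failed = false;

  // Shared callback: every save reports here, in whatever order they finish.
  function keepCount(err) {
    if (failed) return; // an earlier save already reported an error
    if (err) {
      failed = true;
      return done(err);
    }
    remaining--;
    if (remaining === 0) done(null);
  }

  for (let i = 1; i <= count; i++) {
    save({ title: "Test Quiz # " + i }, keepCount);
  }
}

// Demo with a fake save that completes on a later turn of the event loop.
const stored = [];
function fakeSave(item, callback) {
  setImmediate(() => {
    stored.push(item);
    callback(null);
  });
}

saveAll(100, fakeSave, (err) => {
  if (err) throw err;
  console.log("all saved:", stored.length);
});
```

The final step is still written first, as the `done` callback, and the loop that actually starts the work comes last.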

The CoffeeScript makes this a bit easier to follow, but between the ordering of the code and the three levels of callbacks, it is far from perfect, so I thought I'd come up with a simple solution for managing things more sensibly. Note that I'm 100% certain that this issue has been tackled by any number of developers previously ... I'm using getting comfortable with Node and CoffeeScript as an excuse to embrace some Not Invented Here syndrome. Here's my first pass:

```coffeescript
event = require "events"
_ = require "underscore"

# Helps to organize callbacks. At this time, it breaks normal
# conventions and makes no attempt to catch errors or fire an 'error'
# event.
class Flow extends event.EventEmitter

  constructor: ->
    @count = 0
    # Array of zero-arg functions that invoke join callbacks
    @joins = []

  invokeJoins: ->
    # The join callbacks may add further callbacks or further join
    # callbacks, but that only affects future completions.
    joins = @joins
    @joins = []
    join.call(null) for join in joins
    @emit 'join', this

  checkForJoin: ->
    @invokeJoins() if --@count == 0

  # Adds a callback and returns a function that will invoke the
  # callback. Adding a callback increases the count. The count is
  # decreased after the callback is invoked. Callbacks are invoked
  # with this set to null. Join callbacks are invoked when the count
  # reaches zero. Callbacks should be added before join callbacks are
  # added.
  add: (callback) ->
    # One more callback until we can invoke join callbacks
    @count++
    (args...) =>
      # apply takes the arguments as a single array, so no splat here
      callback.apply null, args
      @checkForJoin()

  # Adds a join callback, which will be invoked after all previously
  # added callbacks have been invoked. Join callbacks are invoked with
  # this set to null and no arguments. Emits a 'join' event, passing
  # this Flow, after invoking any explicitly added join callbacks.
  # Invokes the callback immediately if there are no outstanding
  # callbacks.
  join: (callback) ->
    @joins.push callback
    @invokeJoins() if @count == 0

# TODO:
# sub flows (for executing related tasks in parallel)

module.exports = Flow
```

The Flow object is a kind of factory for callback wrappers; you pass it a callback and it returns a new callback that you can pass to the other APIs. Once every callback that has been added has been invoked, the join callbacks are invoked. In other words, the callbacks are invoked in parallel (well, at least, in no particular order), and the join callback is invoked only after all the other callbacks have been invoked.

In practice, this simplifies the code quite a bit:

```coffeescript
app.get "/api/create-test-data",
  (req, res) ->
    flow = new Flow

    for i in [1..100]
      quiz = new Quiz
        title: "Test Quiz \# #{i}"
        location: "Undisclosed"
      quiz.save flow.add (err) ->
        throw err if err

    flow.join ->
      Quiz.find {}, (err, docs) ->
        throw err if err
        sendJSON res, docs
```

So instead of quiz.save (err) -> ... it becomes quiz.save flow.add (err) -> ..., or in straight JavaScript: quiz.save(flow.add(function(err) { ... })).
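
For anyone who would rather read the whole thing in straight JavaScript, here is a rough port of the idea. It is a sketch, not the compiled output of flow.coffee: it drops the EventEmitter base class and the 'join' event to stay short, and `fakeSave` is a hypothetical stand-in for `quiz.save`.

```javascript
class Flow {
  constructor() {
    this.count = 0;  // callbacks added but not yet invoked
    this.joins = []; // join callbacks waiting for the count to reach zero
  }

  invokeJoins() {
    // Swap the list first, so joins added while these run wait for a
    // later completion.
    const joins = this.joins;
    this.joins = [];
    joins.forEach((join) => join.call(null));
  }

  // Wraps a callback; the wrapper invokes it, then decrements the count.
  add(callback) {
    this.count++;
    return (...args) => {
      callback.apply(null, args);
      if (--this.count === 0) this.invokeJoins();
    };
  }

  // Registers a callback to run once every added callback has been invoked.
  join(callback) {
    this.joins.push(callback);
    if (this.count === 0) this.invokeJoins();
  }
}

// Hypothetical stand-in for quiz.save: completes on a later turn of the
// event loop and passes the title back to the callback.
function fakeSave(title, callback) {
  setImmediate(() => callback(null, title));
}

const flow = new Flow();
const titles = [];
for (let i = 1; i <= 3; i++) {
  fakeSave("Test Quiz # " + i, flow.add((err, title) => {
    if (err) throw err;
    titles.push(title);
  }));
}
flow.join(() => console.log("all saved:", titles.join(", ")));
```

The calling code reads top to bottom in the order things happen: start the saves, then say what to do once they have all finished.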

So things are fun; I'm actually enjoying Node and CoffeeScript at least as much as I enjoy Clojure, which is nice, because it's been years (if ever) since I've enjoyed the actual coding in Java (though I've liked the results of my coding, of course).