... at least for me. I'm working on some more real-world code in Clojure, and I'm struggling with turning a stream of tokens (representing XML content parsed from a stream) into a DOM-like hierarchy. The tokens represent open tags, attributes, close tags, and text content.
I've written dozens of recursive decent compilers in the past, in Pascal, C, PL/1 and Java. The problem is, that style of compiler works by consuming a stream (of characters or tokens) ... a destructive operation.
Logically, I want code such as:
List parseTemplate(tokens) { result = [] while (tokens not empty) { token = pop(tokens) if (token.type == text) { add text token to result; continue; } if (token.type == element) { e = parse_element (token, tokens) add e to result; continue; } }
Now, in Clojure I can write a (parse-element) function that takes a bunch of tokens, and it can work its way through the tokens using Clojure's (loop) and (recur) forms ... but to the calling function, tokens will not have changed.
One approach might be to pass along some pretty complex state; literally a state machine (inside a map) so that each function can keep push and popping values into and out of the state before invoking each. This would involve recursing on each token. Anyway, I have a rough idea of what that would look like and it seems complicated.
Instead, I've been pounding my head, looking at Clojure's support for monads. Monad are a way to combine certain types of functions together in a way that appears as if there were mutable state. You provide some base types of monadic functions and, within the context of a monad definition, it streamlines the way those basic monadic functions can be strung together to form more complex monadic functions.
Still trying to wrap my head around the concept however. It gets really confusing because the real power of monads appears when the monadic values, the things returned from a monadic function, are themselves functions. Yep, that's where I get lost too.
Update: passed the initial Monad hurdle; I'm finally beginning to understand them, perhaps even enough to explain them to other people.
Do you know about this?: http://kotka.de/projects/clojure/parser.html
ReplyDeleteI haven't used it (indeed, I don't use Clojure) but it looks like the obvious way to parse in Clojure.
Also it looks like the solution you feel most comfortable with, is similar (exactly?) how the clojure.xml parser works.
ReplyDeleteActually, clojure.xml/parse runs a SAXParser instance and it's full of vars, (set!), (binding), etc.
ReplyDeleteI actually used that as the basis for a more namespace-aware parser that emits the XML token stream.
I could probably accomplish more of what I want in my parser, but I really want to isolate the non-functional, SAX-based portion and keep the rest pure Clojure, and I want the eventual token-to-tree parser to be truly functional. If I can wrap my head around the state and maybe monads, I think I'll have something.
i don't know if this applies to Clojure but is a very easy reading about Monadic Parsing (asumes no Monads knowledge )
ReplyDeletehttp://cs.nott.ac.uk/~gmh/monparsing.pdf
I think you're trying to use parser combinators, of which Kotka is an implementation for Clojure.
ReplyDeleteIf you're on the JVM Scala provides good support, though not very well documented, for parser combinators.
This is a very good introduction:
http://www.codecommit.com/blog/scala/formal-language-processing-in-scala