Angly parsers specified nicely

I think it’s almost fair to Douglas Hofstadter to say that I lept over a Strange Loop this week. Or at least I evaded one.

Specifically, as I mentioned in my last post, I wanted a nicer language for specifying DFA languages. Uh oh! Define the language in terms of itself?  No: we just use the old ugly way of specifying DFAs one last time, in order to produce a DFA which can take DFA definitions and produce more DFAs.

Again, here’s the language to parse:

typedef graph<node<A, edge<t, B>, edge<u, C> >,
              node<B, edge<v, D> >,
              node<C, edge<w, E> >,
              node<D, edge<x, F> >,
              node<E, edge<y, F>, edge<z, G> > > adjacencies; 


Now, here’s the language definition:

typedef dfa<state<graph_nodes, mpl::true_, mpl::vector<>,
                  transition<graph_node, graph_nodes,
                             tmparser::match_token<node<> >, node_name,
                             mpl::push_back<mpl::_1, mpl::_2> > >,
                  transition<node_name_trans, node_edges,
                             mpl::always<mpl::true_>, void,
                             mpl::pair<mpl::_2, mpl::vector<> > > >,
            state<node_edges, mpl::true_,
                  transition<node_edge, node_edges,
                             tmparser::match_token<edge<> >, edge_name,
                                                      mpl::_2> > > >,
                  transition<edge_name_trans, edge_target,
                             mpl::always<mpl::true_>, void, mpl::_2> >,
                  transition<edge_target_trans, edge_done,
                             mpl::always<mpl::true_>, void,
                             mpl::pair<mpl::_1, mpl::_2> > >,
            state<edge_done, mpl::true_>
            > graph_dfa_desc;


Not too bad, right?  I mean, we’d rather be specifying regular expressions than creating the DFA by hand, but it’s hard to see how a parser for the graph metadata language could be much terser.

Let’s look at the start state, to see how the DFA works:

            state<graph_nodes, mpl::true_, mpl::vector<>,
                  transition<graph_node, graph_nodes,
                             tmparser::match_token<node<> >, node_name,
                             mpl::push_back<mpl::_1, mpl::_2> > >,


The start state graph_nodes is looping on node<> tokens. The optional second and third parameters specify that it is a finish state (the parser will #error out if it reaches the end of input and the current state is not a finish state), and the starting data: since we’re building data for mpl_graph::adjacency_list, this is an empty sequence of state vertices.

After the node parameters comes an optional sequence of transition<> tokens.  This state has one transition graph_node which loops back to the graph_nodes state. This transition will match any template expression wrapped in node<> and it will recurse into node_name (putting a new parser context frame on the explicit stack).  The transition specifies a semantic action to be executed after recursion is complete, which pushes the new state onto the vertex sequence.

See if you can figure out how the rest of the states and transitions work!

I’ll clean up the code and post it to the metagraph sandbox soon (under “angly”).

P.S. The same parser could be used to parse variadic runtime arguments as a regular expression (well almost) of argument types. That’s not where I’m going with this right now, but I’d be happy to show an example if there’s interest.

P.P.S. How slow is it? This example, creating the DFA parser, then parsing the graph DFA description, then parsing an example graph and running depth first search on it, takes 3 seconds using gcc 4.5 on my laptop. That works for me.

P.P.P.S. One more strange loop to cross: the structure of a composed type pattern is logically a composed type pattern!

Leave a Comment