Now Parsing Angle Brackets

As always, C++Now / BoostCon back in May was a blast and I learned a lot. About a dozen people came to my talk; what was really incredible is that I think everyone knew exactly what I was talking about.

Now, I didn’t manage to give the weirdest talk at C++Now this year – that honor goes to Ábel Sinkovics and Zoltan Porkolab’s Metaparse (video). They demonstrated a parser and interpreter of a subset of Haskell that runs inside the C++ compiler. Yow! As a lot of people have pointed out, the pure functional Haskell programming language could provide a nicer way to do metaprogramming in C++, but this is the closest I know of anyone actually implementing it.

In my Angly talk, I ended up presenting almost the entire library – it’s just a few hundred lines – and along the way compared my approach for “reparsing” angle bracket expressions with the methods used in Metaparse for compile-time strings, and Eric Niebler’s Proto for reparsing expressions.

Here is the video. And here are the slides.

I will have to try harder to produce the weirdest talk next year. (Will it be Metagraph itself?) But this was a real breakthrough for the depth of conversation in the room on that day, and the thoughts that followed.

The most important thing I learned is that the method I used (a deterministic pushdown automaton) is much more complicated and not as powerful as the non-deterministic recursive descent parsers used by Proto and Metaparse… so I might move in that direction. I’m not sure if that’s as important as actually moving ahead with the Metagraph. Stay tuned!

This year I’ll be presenting a talk about compile-time “reparsing” at C++ Now! (formerly BoostCon).

IMHO last year’s talk about Graph Metaprogramming was a hit, and the weirdest talk at the conference.

This year I’ll try to outdo myself by showing even more template metaprogramming, and by comparing techniques between my Angly, Ábel Sinkovics’ Metaparse, and Eric Niebler’s Proto.

Why do I call it ‘reparsing’? Well, partly because I’m very tired of the meta- prefix, having read *Gödel Escher Bach* half a lifetime ago, and want to reserve the prefix for processes applied to themselves rather than just anything to do with metaprogramming. But also because with each of the tools, the C++ compiler’s parser has already run, and then you want to *re*interpret the results through a metaprogram: Angly for angle-bracket expressions, Metaparse for compile-time strings, Proto for runtime expressions.

Here’s a link:

http://cppnow.org/session/compile-time-repasing/

I committed the Angly parser to the Boost sandbox (under metagraph). Fare thee well, sweet child!

grodo-mcwoohu:example gordon$ svn commit ../../.. -m "angly parser; remove unneeded deps" Deleting boost/fusion Deleting boost/intrusive Adding boost/metagraph/angly Adding boost/metagraph/angly/detail Adding boost/metagraph/angly/detail/apply_seq.hpp Adding boost/metagraph/angly/dfa.hpp Adding boost/metagraph/angly/dfa_lang.hpp Sending boost/metagraph/fusion_graph/fusion_graph.hpp Sending boost/metagraph/mpl_graph/depth_first_search.hpp Sending boost/metagraph/mpl_graph/detail/incidence_list_graph.ipp Sending boost-build.jam Deleting libs/intrusive Adding libs/metagraph/example/angly_graph_dfa.cpp Adding libs/metagraph/example/angly_graph_parser.cpp Transmitting file data ......... Committed revision 74000.

I think it’s almost fair to Douglas Hofstadter to say that I lept over a Strange Loop this week. Or at least I evaded one.

Specifically, as I mentioned in my last post, I wanted a nicer language for specifying DFA languages. Uh oh! Define the language in terms of itself? No: we just use the old ugly way of specifying DFAs one last time, in order to produce a DFA which can take DFA definitions and produce more DFAs.

Again, here’s the language to parse:

```
typedef graph<node<A, edge<t, B>, edge<u, C> >,
node<B, edge<v, D> >,
node<C, edge<w, E> >,
node<D, edge<x, F> >,
node<E, edge<y, F>, edge<z, G> > > adjacencies;
```

Now, here’s the language definition:

typedef dfa<state<graph_nodes, mpl::true_, mpl::vector<>, transition<graph_node, graph_nodes, tmparser::match_token<node<> >, node_name, mpl::push_back<mpl::_1, mpl::_2> > >, state<node_name, transition<node_name_trans, node_edges, mpl::always<mpl::true_>, void, mpl::pair<mpl::_2, mpl::vector<> > > >, state<node_edges, mpl::true_, transition<node_edge, node_edges, tmparser::match_token<edge<> >, edge_name, mpl::pair<mpl::first<mpl::_1>, mpl::push_back<mpl::second<mpl::_1>, mpl::_2> > > >, state<edge_name, transition<edge_name_trans, edge_target, mpl::always<mpl::true_>, void, mpl::_2> >, state<edge_target, transition<edge_target_trans, edge_done, mpl::always<mpl::true_>, void, mpl::pair<mpl::_1, mpl::_2> > >, state<edge_done, mpl::true_> > graph_dfa_desc;

Not too bad, right? I mean, we’d rather be specifying regular expressions than creating the DFA by hand, but it’s hard to see how a parser for the graph metadata language could be much terser.

Let’s look at the start state, to see how the DFA works:

state<graph_nodes, mpl::true_, mpl::vector<>, transition<graph_node, graph_nodes, tmparser::match_token<node<> >, node_name, mpl::push_back<mpl::_1, mpl::_2> > >,

The start state graph_nodes is looping on node<> tokens. The optional second and third parameters specify that it is a finish state (the parser will #error out if it reaches the end of input and the current state is not a finish state), and the starting data: since we’re building data for mpl_graph::adjacency_list, this is an empty sequence of state vertices.

After the node parameters comes an optional sequence of transition<> tokens. This state has one transition graph_node which loops back to the graph_nodes state. This transition will match any template expression wrapped in node<> and it will recurse into node_name (putting a new parser context frame on the explicit stack). The transition specifies a semantic action to be executed after recursion is complete, which pushes the new state onto the vertex sequence.

See if you can figure out how the rest of the states and transitions work!

I’ll clean up the code and post it to the metagraph sandbox soon (under “angly”).

P.S. The same parser could be used to parse variadic runtime arguments as a regular expression (well almost) of argument types. That’s not where I’m going with this right now, but I’d be happy to show an example if there’s interest.

P.P.S. How slow is it? This example, creating the DFA parser, then parsing the graph DFA description, then parsing an example graph and running depth first search on it, takes 3 seconds using gcc 4.5 on my laptop. That works for me.

P.P.P.S. One more strange loop to cross: the structure of a composed type pattern is logically a composed type pattern!

I’ve implemented a parser for template expressions. This will enable EDSLs in the realm of C++ angle brackets.

For example, yesterday I created a new syntax for specifying graph metadata:

struct A {}; struct B {}; struct C {}; struct D {}; struct E {}; struct F {}; struct G {}; struct t {}; struct u {}; struct v {}; struct w {}; struct x {}; struct y {}; struct z {}; /* A - t > B - v > D - x > F - u > C - w > E - y > F - z > G */ typedef graph<node<A, edge<t, B>, edge<u, C> >, node<B, edge<v, D> >, node<C, edge<w, E> >, node<D, edge<x, F> >, node<E, edge<y, F>, edge<z, G> > > adjacencies;

typedef typename tmparser::run_dfa<graph_dfa, graph_nodes, adjacencies>::type graph_data; typedef typename mpl_graph::adjacency_list_graph<graph_data> parsed_graph;

This allows the most concise possible description of graph metadata. It allows the format to be validated. And since the keywords used in these expressions are not metafunctions but just tokens to be matched, they can have different meaning in different grammars and contexts.

The matching is done by a DFA with a stack, implemented on MPL.Graph. At each level of the expression, the DFA follows transitions matching items in the current list. A transition can recurse into the deeper level of the expression, in which case a new stack frame is created with new data and a new start state. The transition is labeled with semantic actions for before and after recursing.

Next up: a language for specifying these DFA’s nicely. Right now it’s kind of messy:

template<typename ...> struct graph {}; template<typename ...> struct node {}; template<typename ...> struct edge {}; //bogus template<typename ...> struct foo {}; // parser for metadata text of adjacency_list format // graph< node< node_tag, edge< edge_tag, target_node_tag >, ...> > // this an example of a graph language parser, not realistic // define types for states struct graph_nodes : tmparser::dfa_state<mpl::true_, mpl::vector<> > {}; struct node_name : tmparser::dfa_state<> {}; struct node_edges : tmparser::dfa_state<mpl::true_> {}; struct edge_name : tmparser::dfa_state<> {}; struct edge_target : tmparser::dfa_state<> {}; struct edge_done : tmparser::dfa_state<mpl::true_> {}; // types for transitions struct graph_node : tmparser::dfa_transition<tmparser::match_token<node<> >, node_name, mpl::push_back<mpl::_1, mpl::_2> > {}; struct node_name_trans : tmparser::dfa_transition<mpl::always<mpl::true_>, void, mpl::pair<mpl::_2, mpl::vector<> > > {}; struct node_edge : tmparser::dfa_transition<tmparser::match_token<edge<> >, edge_name, mpl::pair<mpl::first<mpl::_1>, mpl::push_back<mpl::second<mpl::_1>, mpl::_2> > > {}; struct edge_name_trans : tmparser::dfa_transition<mpl::always<mpl::true_>, void, mpl::_2> {}; struct edge_target_trans : tmparser::dfa_transition<mpl::always<mpl::true_>, void, mpl::pair<mpl::_1, mpl::_2> > {}; typedef mpl_graph::adjacency_list_graph< mpl::vector<mpl::pair<graph_nodes, mpl::vector<mpl::pair<graph_node, graph_nodes> > >, mpl::pair<node_name, mpl::vector<mpl::pair<node_name_trans, node_edges> > >, mpl::pair<node_edges, mpl::vector<mpl::pair<node_edge, node_edges> > >, mpl::pair<edge_name, mpl::vector<mpl::pair<edge_name_trans, edge_target> > >, mpl::pair<edge_target, mpl::vector<mpl::pair<edge_target_trans, edge_done> > > > > graph_dfa;

I’m going to start posting some weird Metagraph ideas soon. (I’ve got a DFA parser for angly<type<expressions>> heh). But for now, a few more pictures.

Treehouse from below

The kitchen is almost complete.

View from inside the kitchen.

I am living in my treehouse this summer in order to Realize (I hope) the Metagraph. I’ve been thinking about this data structure for over ten years and I finally have figured out how to implement it, so I have come to solitude to write the code.

Some evening pictures:

I’ll post some daytime pictures soon.

Right now my friend Collin and I are building a kitchen hut so I don’t have to haul water all the way along the ridge, and don’t have to cook up here. This is a pretty small structure and I’m constantly knocking spoons and pans off the porch down 20 feet to the hill below. (Listen where it rolls on the leaves, then fetch later.)

Welcome to my new site devoted to news and information about Metagraphs, MPL.Graph, and Fusion.Graph.

I was thrilled to present MPL.Graph at BoostCon in Aspen last week. I had a decent audience of maybe 20 people. A lot of people got it; others laughed and shook their heads. I think I remembered to say most of the things I wanted to say and showed a lot of real code. (I discovered that Fusion.Graph currently fits in a single slide!)

Even better was talking with other library authors about half a dozen other applications for MPL.Graph. For me, MPL.Graph is the first step toward a particular data structure (the Metagraph), but there are many other places where graph metaprogramming can improve the structure and analysis of programs, as there are many programming language features which are inherently graphlike.

Take a look at some of the possibilities in the paper Introducing MPL.Graph.

Write me directly or through the Boost mailing lists or I’m occasionally grodo on IRC.