grmtools Expose more than one rule?

Question / Feature Request: Is there any way to parse a specific rule as the starting parser? For example, if I have:

%start Expr
%%
Expr -> ...;

Int -> ...;
%%

I also want to be able to parse a string as Int, not just Expr.

(I'm trying to port my parser from LALRPOP to lrpar, mainly because of lrpar's operator precedence feature. LALRPOP exposes a parser for any rule prefixed with the keyword pub.)

May 30 '20 06:05 utkarshkukreti

I must admit, this isn't a feature that I'd thought about. At first glance, it seems an awkward fit with LR parsing, in the sense that, at least conceptually, you need one statetable per start rule. In practice, though, I think you might be able to share a single statetable and "emulate" the accept state for an arbitrary start rule, although I might be wrong about that. LALRPOP uses the lane table algorithm, which might or might not make this stuff easier -- I haven't familiarised myself with the algorithm. @nikomatsakis might have a thought or two on this.

So, at the moment, unfortunately, your only option is to duplicate the Yacc grammar for each start rule. However, what I can fairly easily do is remove some of the assumptions the grmtools libraries have about start states. That won't get us all of the way to the feature you're asking for, but it will make it easier for someone else to implement it -- and I hope they do, because this is the sort of feature that I think grmtools should be flexible enough to accommodate!
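
To make the duplication workaround concrete, here is a minimal sketch that derives one grammar copy per start rule rather than maintaining the copies by hand. The helper `with_start_rule` is hypothetical and is not part of grmtools; it only rewrites the `%start` declaration in the header, and it assumes that declaration sits on a line of its own before the first `%%`.

```rust
/// Return a copy of `grammar` whose `%start` declaration names `rule`.
/// Hypothetical helper for illustration: grmtools does not provide it.
fn with_start_rule(grammar: &str, rule: &str) -> String {
    let mut out = String::with_capacity(grammar.len());
    // True until the first `%%` separator, i.e. while reading declarations.
    let mut in_header = true;
    for line in grammar.lines() {
        let first_word = line.split_whitespace().next();
        if first_word == Some("%%") {
            in_header = false;
        }
        if in_header && first_word == Some("%start") {
            // Replace the whole declaration with the requested start rule.
            out.push_str("%start ");
            out.push_str(rule);
        } else {
            // Every other line is copied through unchanged.
            out.push_str(line);
        }
        out.push('\n');
    }
    out
}

fn main() {
    // The grammar from the question, with rule bodies elided as in the original.
    let grammar = "%start Expr\n%%\nExpr -> ...;\n\nInt -> ...;\n%%\n";
    for rule in ["Expr", "Int"].iter() {
        let copy = with_start_rule(grammar, rule);
        println!("--- grammar with start rule {} ---\n{}", rule, copy);
    }
}
```

Each generated copy would then still need to be built as its own parser, so this only removes the manual copying, not the cost of one statetable per start rule.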

May 30 '20 07:05 ltratt

I don't think Lane Table helps in particular, but I've not given it a lot of thought. I think LALRPOP permits you to tag multiple rules as pub, but IIRC it just generates a separate parser for each one; there is no shared code or state. (I could be misremembering.)

Jun 02 '20 13:06 nikomatsakis

Sorry @nikomatsakis for not saying thanks for your comment (I'm only nearly a year late)!

Coming back to this one with the benefit of hindsight, I think that I'd be fine with generating 1 parser per start rule at first. It's probably suboptimal, but I'm fine with getting the functionality in and working out how to make it more efficient in the future.

AFAICS neither Yacc nor Bison supports this, which gives us both the freedom to do what we want and the difficulty of sifting amongst the design choices. My first thought is that it would be reasonable to allow the %start declaration to name more than one start rule (e.g. %start R1 R2).

The slightly tricky thing to think about is what the resulting parsers are called. At the moment, if you have a lexer g.l and a grammar g.y, you end up with modules named g_l and g_y with functions lexerdef and parser respectively. I think if a user specifies a single start rule, we should maintain that behaviour. With more than one start rule, I can see at least two possibilities:

1. We generate two parser modules g_r1_y and g_r2_y both with parser functions.
2. We generate one parser module with r1_parser and r2_parser functions.

I slightly prefer the second option, but could be persuaded otherwise.
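
To compare the two naming schemes from the caller's side, here is a rough sketch of the module layouts for a grammar g.y with %start R1 R2. Only the module and function names come from the proposal above; the signatures and stub bodies are placeholders invented for illustration, and none of this is code that grmtools generates.

```rust
// Option 1: one module per start rule, each with a function named `parser`.
mod g_r1_y {
    /// Placeholder body: reports which start rule this parser would use.
    pub fn parser(_input: &str) -> &'static str {
        "R1"
    }
}

mod g_r2_y {
    /// Placeholder body: reports which start rule this parser would use.
    pub fn parser(_input: &str) -> &'static str {
        "R2"
    }
}

// Option 2: a single module with one function per start rule.
mod g_y {
    /// Placeholder body: reports which start rule this parser would use.
    pub fn r1_parser(_input: &str) -> &'static str {
        "R1"
    }

    /// Placeholder body: reports which start rule this parser would use.
    pub fn r2_parser(_input: &str) -> &'static str {
        "R2"
    }
}

fn main() {
    let input = "1 + 2";
    // Option 1 selects the start rule through the module path.
    println!("option 1: {} {}", g_r1_y::parser(input), g_r2_y::parser(input));
    // Option 2 selects the start rule through the function name.
    println!("option 2: {} {}", g_y::r1_parser(input), g_y::r2_parser(input));
}
```

In this sketch the second option keeps the existing g_y module name, so code that already refers to that module only has to change the function it calls.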

Mar 29 '21 14:03 ltratt