dune icon indicating copy to clipboard operation
dune copied to clipboard

Support for menhir's `.messages` files

Open Gbury opened this issue 5 years ago • 15 comments

Desired Behavior

Have builtin rules for .messages files produced by menhir (cf menhir's manual ). This would include:

  • generating a fresh .messages file for a new grammar / menhir parser
  • updating an existing .messages file when the grammar / menhir parser changes
  • compiling the .messages file into an .ml file to make it available in the rest of the project

Note that given that .messages files contain hand-written messages, they should be part of the source tree, hence the first two rules above (generating fresh .messages files and updating them) would need to promote the generated file (potentially overwriting the current one in the case of an update). However, the .ml file generated from the .messages file needn't be promoted.

Apologies if it already exists, but I couldn't find any reference to it neither in the docs (both for stable and latest), nor in the code (after a search for messages, nothing seemed related to menhir).

Example

I already have implemented manual rules for this in one of my projects, see https://github.com/Gbury/dolmen/blob/master/src/languages/smtlib2/v2.6/dune . These rules work however they are somewhat limited by the almost circular dependency of a .messages file on itself because of the updating process, so the example linked requires some user manual intervention to move a new or updated generated .messages file.

Gbury avatar Mar 20 '20 13:03 Gbury

No, this is not yet supported out-of-the-box. This would be a nice addition and not very hard I think, a nice task for someone wanting to get into dune's codebase.

For reference, here are the relevant rules for the .messages file of menhir itself:

https://gitlab.inria.fr/fpottier/menhir/-/blob/master/src/stage2/dune#L86-130

cc @fpottier

nojb avatar Mar 20 '20 13:03 nojb

I agree with Nicolas. @Gbury if you’re interested in contributing, we could guide you through it

rgrinberg avatar Mar 20 '20 15:03 rgrinberg

I'll try and do that this weekend then (one upside of the current situation: I'll have way more free time than usual, ^^). The part that might be more complex is to have rules that differ depending on whether a .messages file already exists in the source_tree.

Gbury avatar Mar 20 '20 15:03 Gbury

What is the expected workflow for such files? IIUC, we want the system to generate the file and keep it up to date, and at the same time we want the user to modify it. How is that supposed to work?

ghost avatar Mar 24 '20 16:03 ghost

Well that's kind of the problem, actually. What we want I think, is the following:

  • if the messages file does not exists, create it using menhir, and promote it in the source tree
  • the user might modify the message file in the source tree
  • when building, use menhir to generate an 'updated' version of the message file in the source tree; that updated version can be used in the rest of the build (this ensures that the message file used in the rest of the build is consistent with the parser). Additionally, we want to at one point to promote the updated message file in the source tree, in order for it to replace the message file in the source tree. However, this introduces a circular dependency in theory, but we know that one invocation of menhir is enough (and in theory also enough to reach a fixpoint if I'm correct). Currently, in the rules I wrote, the new and updated message files have different names and the user has to rename them manually (thus cutting the circular dependency), but ideally, this is not necessary.

Gbury avatar Mar 24 '20 16:03 Gbury

But if we use the newly generated updated version for the build, does that mean that the version carefully customised by the user will be ignored?

ghost avatar Mar 25 '20 11:03 ghost

But if we use the newly generated updated version for the build, does that mean that the version carefully customised by the user will be ignored?

No, as the menhir manual specifies:

The command --update-errors filename is used to update the auto-generated comments in the .messages file filename. It is typically used after a change in the grammar (or in the command line options that affect the construction of the automaton). A new .messages file is produced on the standard output channel. It is identical to filename, except the auto-generated comments, identified by ##, have been removed and re-generated.

However, after re-reading the doc, it's not clear whether the --update-errors flag will 1) add the potentially new errors due to the grammar change, and 2) remove the now unused errors for the new grammar, I guess we'd need to ask @fpottier .

Gbury avatar Mar 25 '20 11:03 Gbury

Ok, so I indeed misunderstood what the --update-errors option does. As explained in this issue, the update-error mechanism only updates the comments so it can be used to update the .messages file in the source tree safely.

However, in order to fully upgrade a .messages file after changing a grammar, the correct thing to do seems to be:

  • generate the new list of errors with the --list-errors flag
  • compare the new and old/current .messages file using the --compare-errors option
  • from the diff, the user has to manually add the new errors messages to the current .messages file.

From that I think we could have the following in dune. The menhir stanza can have an optional field specifying an ocaml module name for the .messages file. Let's say that name is Foobar. We would then have the following builtin rules in dune:

  • a rule to compile foobar.messages into foobar.ml using --compile-errors
  • a rule to generate a new fresh foobar.messages.new file using --list-errors (the user would have to manually move it to foobar.messages to use it)
  • a rule to generate a diff when changing a parser, using --compare-errors between foobar.messages.new and foobar.messages. The suer would have to manually copy this diff into foobar.messages and edit the error messages.
  • a rule to auto-update the comments generated by menhir inside foobar.messages using --update-errors (this is problematic because it introduces a cyclic dependency and I don't know if that can be done in dune currently)

Gbury avatar Mar 25 '20 12:03 Gbury

I feel like the diff/promotion workflow of dune is very well suited for such a case.

Does the format of the messages file makes it easy to distinguish between the parts that are under the control of the user and the parts that are under the control of menhir?

ghost avatar Mar 25 '20 14:03 ghost

Does the format of the messages file makes it easy to distinguish between the parts that are under the control of the user and the parts that are under the control of menhir?

I'd say yes. You can see an example of freshly generated .messages file here . The <symbol> : TOKEN TOKEN TOKEN .... line is generated by menhir to identify the parser state, the ##-commented lines are here to explain more about the error state, and finally the user can insert his/her message where indicated.

Gbury avatar Mar 25 '20 16:03 Gbury

Ok, then in this case I propose to setup things as follow: ask menhir generates a messages.corrected that updates the existing messages file and use (diff ...) to diff the two. When the file needs updating, dune will print a diff and the user will be free to promote the correction using dune promote or the dune-promote emacs command.

ghost avatar Mar 26 '20 11:03 ghost

Update: I haven't had the time to work on dune for this, but I managed to find an adequate combination of stanzas to achieve a reasonable workflow. The dune file can be found at https://github.com/Gbury/dolmen/blob/52f6e3befb76492bac5e34c915f9dc0ec026a2e2/src/languages/dune.common and a short explanation of how the rules work is in this comment : https://discuss.ocaml.org/t/dune-wish-list-for-2023/11083/60?u=zozozo

One thing that could be done in dune to help with all this is to add a messages field to the menhir stanzas, which would generate stanzas similar to those in my dune files, so that new users do not have to find the correct magic menhir invocations. One tricky thing to take care of is to correctly handle additional flags that users have given to menhir (e.g. in my example, I have functorised parsers, which force me to have tokens defined in a separate file, and thus menhir must be provided with adequate --external-tokens cli options, and the stanzas generating the messages files must also depend on the tokens.mly file).

Gbury avatar Jan 10 '23 13:01 Gbury