ST6RI-846 Create EBNF / GBNF extractor tool for KerML and SysML specs
Create a BNF Extractor tool that can extract the BNF snippets from the KerML textual notation EBNF grammar and from the SysML textual notation EBNF and graphical notation GBNF grammars, and convert them to two target formats:
- Plain text file, suitable for machine processing by an EBNF or GBNF parser.
- Simple HTML file that supports easy browsing and navigation for human readers, with hyperlinks from each usage of a grammar element to its declaration.
The adapted tool should be added under tool-support.
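As an illustration of the HTML target format described above, the following is a minimal sketch of linking usages to declarations. It is not the tool's implementation: the function names `link_body` and `link_productions` are hypothetical, and it assumes a simplified one-line `Name = body` production format rather than the full KerML/SysML notation.

```python
import html
import re

IDENTIFIER = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)")


def link_body(body: str, declared: set) -> str:
    """Escape a production body and wrap declared names in hyperlinks."""
    parts = []
    # re.split with a capturing group keeps the identifiers in the result,
    # so every token is escaped exactly once and nothing is lost.
    for token in IDENTIFIER.split(body):
        text = html.escape(token)
        if token in declared:
            text = f'<a href="#{text}">{text}</a>'
        parts.append(text)
    return "".join(parts)


def link_productions(bnf_text: str) -> str:
    """Render simplified 'Name = body' productions as navigable HTML."""
    lines = [ln for ln in bnf_text.splitlines() if "=" in ln]
    # First pass: collect every declared production name.
    declared = {ln.split("=", 1)[0].strip() for ln in lines}
    declared.discard("")
    rows = []
    # Second pass: emit an anchor per declaration and links per usage.
    for ln in lines:
        name, body = (part.strip() for part in ln.split("=", 1))
        anchor = html.escape(name)
        rows.append(f'<a id="{anchor}">{anchor}</a> = {link_body(body, declared)}')
    return "<pre>\n" + "\n".join(rows) + "\n</pre>"


if __name__ == "__main__":
    print(link_productions("Package = 'package' Name Body\nBody = '{' Package* '}'"))
```

A known limitation of this sketch is that it does not distinguish quoted terminals from nonterminals, so a keyword that happens to match a production name would also be linked.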
A new folder bnf_grammar_tools has been created under tool-support, containing two new Python command-line tools: bnf_grammar_processor and bnf_file_parser. In addition to the standard Python modules, three other open source packages are used, all licensed under the MIT License:
- Package beautifulsoup4 is used to parse the HTML input files.
- Package lark, a parser toolkit written in Python, is used to parse and verify the extracted BNF source.
- Package pytest is used to run the Python unit tests.
The README.adoc in subdirectory bnf_grammar_tools contains a full description of how to install and use the tools. A workflow to extract, validate and correct BNF grammars has been added.
For testing, subdirectory tests/KerML_and_SysML_spec_sources stores the raw HTML files that were exported from the release 2025-04 versions of the KerML and SysML specs in View Editor. They have been used to test the correctness of the implementation. In addition, subdirectory tests/KerML_and_SysML_grammars contains a complete set of all input files and generated output and log files. These illustrate how the tools can be used and how errors in the respective BNF grammars are detected and reported, and they can be inspected for review. The BNF grammars before and after manual corrections can also be compared with a diff tool, to systematically compile mistakes to be raised as OMG RTF issues.
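The before/after comparison mentioned above can be done with any diff tool. As a minimal sketch using only the Python standard library, the hypothetical helper `grammar_diff` below produces a unified diff of two grammar texts; the default file label is a placeholder, not a file name from this PR.

```python
import difflib


def grammar_diff(extracted: str, corrected: str, name: str = "grammar.bnf") -> str:
    """Return a unified diff between an extracted and a corrected grammar."""
    diff = difflib.unified_diff(
        extracted.splitlines(),
        corrected.splitlines(),
        fromfile=f"{name} (extracted)",
        tofile=f"{name} (corrected)",
        lineterm="",  # input lines carry no newline, so headers should not either
    )
    return "\n".join(diff)


if __name__ == "__main__":
    before = "A = B C\nB = 'b'\n"
    after = "A = B C\nB = 'b'\nC = 'c'\n"
    print(grammar_diff(before, after))
```

Each added or changed production in the output is then a candidate entry for an issue report.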
Pushed a fix for an incorrect call to, and an incorrect body of, GrammarProcessor.report_checks, which caused report_checks to be called twice for textual-bnf and once for graphical-bnf syntaxes.
@seidewitz
In the meantime, I discovered and fixed a bug in Drawio Desktop (SMC variant) that caused incomplete export of PDF files and subsequent incorrect conversion to SVG files. This resulted in some broken SVG images in the generated SysML-graphical-bnf.html and SysML-graphical-bnf-corrected.html files (in folder tool-support/bnf_grammar_tools/tests/KerML_and_SysML_grammars).
At the appropriate time, I can push a fix to resolve this. It basically involves replacing all SVG files in the tool-support/bnf_grammar_tools/tests/KerML_and_SysML_grammars/images folder.
Actually, it seems that lxml-xml may be necessary in some form. I am now getting the following error:
File "/Users/seidewitz/Documents/Work/git/SysML-v2-Pilot-Implementation/tool-support/bnf_grammar_tools/bnf_grammar/bnf_grammar_processor.py", line 1427, in <module>
main()
File "/Users/seidewitz/Documents/Work/git/SysML-v2-Pilot-Implementation/tool-support/bnf_grammar_tools/bnf_grammar/bnf_grammar_processor.py", line 1418, in main
grammar_processor.extract_bnf_from_spec(input_dir, output_dir, file_name, syntax_kind, clause_id)
File "/Users/seidewitz/Documents/Work/git/SysML-v2-Pilot-Implementation/tool-support/bnf_grammar_tools/bnf_grammar/bnf_grammar_processor.py", line 616, in extract_bnf_from_spec
updated_line, img_index = self.rewrite_img_element(current_production, img_count, img_index, line)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/seidewitz/Documents/Work/git/SysML-v2-Pilot-Implementation/tool-support/bnf_grammar_tools/bnf_grammar/bnf_grammar_processor.py", line 840, in rewrite_img_element
svg_soup = BeautifulSoup(svg_file, "lxml-xml")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/seidewitz/miniconda3/lib/python3.11/site-packages/bs4/__init__.py", line 366, in __init__
raise FeatureNotFound(
bs4.exceptions.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml-xml. Do you need to install a parser library?
Aargh, yes I forgot. Need to install package lxml in the Python environment to perform look-up inside SVG files.
Once I install lxml, I can run the tool successfully. The README needs to be updated for this.
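To surface a missing package such as lxml before the tool reaches the SVG look-up, an up-front check is one option. The sketch below is an assumption about how that could look, not code from this PR: `missing_packages` and the `REQUIRED` mapping are hypothetical names, and the mapping pairs each import name with the name it is installed under.

```python
import importlib.util

# Import name -> installable package name (hypothetical list for illustration).
REQUIRED = {
    "bs4": "beautifulsoup4",
    "lark": "lark",
    "lxml": "lxml",  # backs the "lxml-xml" feature requested from BeautifulSoup
}


def missing_packages(required=None):
    """Return the installable names of required packages that cannot be found."""
    required = REQUIRED if required is None else required
    return [
        package
        for module, package in required.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, please install: " + ", ".join(missing))
    else:
        print("All required packages found.")
```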
OK, great. In the meantime, I verified running with Python 3.8, which fails because the code uses typing syntax that is too recent for that version. However, 3.9 succeeds, so Python 3.9 or higher is required. I will update that in the README as well. I also corrected some minor inconsistencies.
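A version guard at start-up is one way to turn the 3.8 failure into a clear message. This is a minimal sketch with hypothetical names (`MIN_PYTHON`, `python_is_supported`), not part of the pushed changes. Subscripting built-in collections such as `list[str]` is one typing feature that arrived in Python 3.9 (PEP 585); whether that is the construct involved here is an assumption.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum version reported to work in this thread


def python_is_supported(version=None, minimum=MIN_PYTHON) -> bool:
    """Return True when the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor, so patch levels do not matter.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        needed = ".".join(str(part) for part in MIN_PYTHON)
        sys.exit(f"Python {needed} or higher is required, found {found}.")
    print("Python version is supported.")
```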
@seidewitz Are you OK with me pushing the updated SVG files? This will improve the quality of the generated graphical BNF grammars in HTML format.
Yes, please do.
New set of SVG symbols pushed. This fixes a number of broken SVG symbols in files:
- tool-support/bnf_grammar_tools/tests/KerML_and_SysML_grammars/SysML-graphical-bnf.html
- tool-support/bnf_grammar_tools/tests/KerML_and_SysML_grammars/SysML-graphical-bnf-corrected.html
The above two commits correct a few minor mistakes in diagnostic reporting and documentation. I also removed a spurious debugging setting. There is no change in functionality.