New PEG-based Parser
Summary
This PR implements a new WebGAL parser based on a parsing expression grammar (PEG). Compared with the current non-standard, string-based parsing, a grammar-driven parser makes optimizations, more advanced grammar, and better error handling possible. The new parser aims to be fully compatible with the old one.
Features (If Implemented Correctly)
- 100% backward compatibility
- More intelligent parsing behaviors
- Error reporting and recovery for command strings with syntax errors
Backward compatibility
The new parser also generates a sentenceList with all fields available in the old parser. It may add some extra fields (e.g., recording errors and some utility command string information), but the result should work on the current engine.
New Behavior for Syntax Errors
Now, if the parser encounters a syntax error while parsing a command, it stops at the offending character and skips to the next line, but the part of the command parsed so far still takes effect.
For example, if a script contains
changeFigure:stand.png -left -next;
pixiInit:; // this line has syntax error. `pixiInit` should not have ':'
setAnimation:enter-from-left -target=fig-left -next;
it will then be parsed as
changeFigure:stand.png -left -next;
pixiInit
setAnimation:enter-from-left -target=fig-left -next;
which preserves the behavior of pixiInit.
Error Reporting and Recovery
The new parser supports error reporting and recovery. All errors in the script are recorded in an errors field, which can be shown to the user once language server protocol (LSP) support is built on top of it.
The aforementioned new behavior on syntax errors ensures error recovery.
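The recovery behavior described above can be illustrated with a small sketch. This is not the PEG parser from this PR: the `ParseError`, `ParsedLine`, and `ParseResult` shapes, the `parseWithRecovery` function, and the `NO_CONTENT_COMMANDS` rule are all hypothetical stand-ins, assuming that an error records a position and that the valid prefix of the command is kept.

```typescript
// Hypothetical shapes; the real parser's field names may differ.
interface ParseError {
  line: number; // 1-based line number in the script
  column: number; // 1-based column where parsing stopped
  message: string;
}

interface ParsedLine {
  command: string; // text accepted before the error, or the whole statement
}

interface ParseResult {
  sentences: ParsedLine[];
  errors: ParseError[];
}

// Toy rule standing in for the grammar: these commands must not be followed by ':'.
const NO_CONTENT_COMMANDS = new Set<string>(["pixiInit"]);

function parseWithRecovery(script: string): ParseResult {
  const sentences: ParsedLine[] = [];
  const errors: ParseError[] = [];

  script.split("\n").forEach((rawLine, index) => {
    // Everything after the first ';' is treated as a comment.
    const semi = rawLine.indexOf(";");
    const statement = (semi === -1 ? rawLine : rawLine.slice(0, semi)).trim();
    if (statement === "") return;

    const colon = statement.indexOf(":");
    const name = colon === -1 ? statement : statement.slice(0, colon);

    if (colon !== -1 && NO_CONTENT_COMMANDS.has(name)) {
      // Syntax error: record it, keep the valid prefix, continue with the next line.
      errors.push({
        line: index + 1,
        column: rawLine.indexOf(":") + 1,
        message: `'${name}' should not have ':'`,
      });
      sentences.push({ command: name });
      return;
    }
    sentences.push({ command: statement });
  });

  return { sentences, errors };
}
```

Run on the three-line script from the example above, this sketch keeps all three commands, reduces the second to `pixiInit`, and records one error for line 2.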
Changes to Source Code
The source code of the old parser is moved to packages/parser_legacy. The package name is changed to webgal-parser-legacy.
The source code of the new parser is put in packages/parser. It is distributed under the MPL-2.0 license (license file attached).
What's Next
- Minify the generated parser. Currently, the generated PEG parser is ~200KB. For web deployment, this may become an issue. We can minify the compiled `parser.js` to ~100KB. Together with gzip on the web server, it may reduce the final data transmission.
- Migrate post-processing logic. To ensure backward compatibility, the raw content of some parsed fields is preserved so that the post-processing still works. This results in unnecessary double parsing. We may need to migrate the post-processing logic to utilize the parsed fields directly.
@MakinoharaShoko, feel free to change the merge base branch :)
Before merging the new parser into dev for testing, we need to ensure that https://github.com/OpenWebGAL/WebGAL/pull/558 passes, so that multi-line parsing is supported to some extent. Once multi-line parsing is validated and the new parser is confirmed to complete parsing as expected on this branch, we can proceed with the merge.
Additionally, the method for adding commands in the new parser is different. Please add a document explaining which places in the code need to be modified to add a new command.
After review, the following issues exist in the new parser:
- Failure to utilize `ADD_NEXT_ARG_LIST` to add the `next` parameter for statements that require it automatically.
- Incorrect timing for resource preloading. It should occur after parsing.
- The `SceneParser` lacks sufficient type hinting. The parsing results should conform to the `IScene` type.
- The parsing results lack the necessary fields `sceneName` and `sceneUrl`.
- Failure to call `assetsSetter`, resulting in script files not being converted to the correct paths as expected.
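As an illustration of the first issue, the following is a minimal sketch of how a post-parse step might append `next` automatically. The `Arg` and `Sentence` shapes, the `withImplicitNext` helper, and the contents of the list are assumptions made for this example; the real `ADD_NEXT_ARG_LIST` and sentence types live in the engine and may look different.

```typescript
// Hypothetical, simplified shapes; the engine's real types are richer.
interface Arg {
  key: string;
  value: string | boolean | number;
}

interface Sentence {
  command: string;
  content: string;
  args: Arg[];
}

// Illustrative contents only, not the engine's actual list.
const ADD_NEXT_ARG_LIST: string[] = ["bgm", "pixiInit"];

// Append `next` when the command is in the list and the script did not
// already supply it; otherwise return the sentence unchanged.
function withImplicitNext(sentence: Sentence, addNextList: string[]): Sentence {
  const needsNext = addNextList.indexOf(sentence.command) !== -1;
  const hasNext = sentence.args.some((arg) => arg.key === "next");
  if (!needsNext || hasNext) return sentence;
  return { ...sentence, args: [...sentence.args, { key: "next", value: true }] };
}
```

Applying such a step to every parsed sentence, before the result is handed to the engine, would keep the behavior independent of how the grammar itself is written.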
I have reverted the default exported SceneParser class in index.ts back to the old parser. I believe the following steps can be taken to align the new parser with the previous logic:
- Write a longer scene that covers as many statements and syntax variations as possible, and compare the differences in the parsing results between the new and old parsers. For this use case, the goal should be to achieve complete parity between the parsing results of the new and old parsers.
- For all test cases, first switch the parser used to the old parser. Then check whether failing test cases are due to issues within the old parser itself, or due to inconsistencies between the expected results of the test cases and the correct parsing results of the old parser.
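The comparison in the first step can be automated with a structural diff over the two parsing results. The sketch below is one possible helper, not part of this PR: `diffPaths` and its `ignoredKeys` parameter are hypothetical names, and it assumes both parsers return plain JSON-like objects.

```typescript
// List the paths at which two JSON-like values differ.
// Keys named in `ignoredKeys` are skipped at every nesting level, which is
// useful for fields that only the new parser produces.
function diffPaths(
  oldValue: unknown,
  newValue: unknown,
  ignoredKeys: string[] = [],
  path = "$",
): string[] {
  if (Object.is(oldValue, newValue)) return [];

  const bothObjects =
    typeof oldValue === "object" && oldValue !== null &&
    typeof newValue === "object" && newValue !== null;
  // Primitives that differ, or an array compared against a non-array: report here.
  if (!bothObjects || Array.isArray(oldValue) !== Array.isArray(newValue)) {
    return [path];
  }

  const a = oldValue as Record<string, unknown>;
  const b = newValue as Record<string, unknown>;
  const keys = Array.from(new Set([...Object.keys(a), ...Object.keys(b)]))
    .filter((key) => ignoredKeys.indexOf(key) === -1)
    .sort();

  const diffs: string[] = [];
  for (const key of keys) {
    diffs.push(...diffPaths(a[key], b[key], ignoredKeys, `${path}.${key}`));
  }
  return diffs;
}
```

An empty result for the long test scene, with only the new parser's extra fields listed in `ignoredKeys`, would indicate parity between the two parsers for that scene.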
I believe these temporary imperfections are not difficult to resolve. Our primary goal is to ensure consistency between the parsing results of the new and old parsers. Once this is achieved, we can leverage the enhanced error detection capabilities of the new parser.
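Because the earlier WebGAL repository was not well maintained, after cleaning it with bfg you need to re-sync the dev branch and cherry-pick the commits of this PR onto a branch freshly created from dev. This PR has many commits, which makes that tedious, so I suggest: 1. Copy the parser-related directories and any files that may have been modified to a temporary folder. 2. Fully sync the dev branch with upstream. 3. Create a new branch from dev. 4. Copy the files back from the temporary folder and make a single commit.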
#678