avro icon indicating copy to clipboard operation
avro copied to clipboard

AVRO-3403: Create and use ANTLR to parse IDL files

Open opwvhk opened this issue 3 years ago • 1 comments

Added a shared ANTLR grammar, without actions or predicates. This guarantees ANTLR can create usable parsers in any language it supports. The Java code creating a parse tree and transforming it into an IdlFile is the code to be ported to add IDL support to other languages.

The IDL reader using the ANTLR generated parser replaces the JavaCC parser in the maven plugin and in the tools. The JavaCC parser has been deprecated, but not removed.

Make sure you have checked all steps below.

Jira

  • [X] My PR addresses the following Avro Jira issues and references them in the PR title. For example, "AVRO-1234: My Avro PR"
    • https://issues.apache.org/jira/browse/AVRO-3403
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Tests

  • [X] My PR adds quite a few ~the following~ unit tests ~OR does not need testing for this extremely good reason~: see PR

Commits

  • [X] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • [X] In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

opwvhk avatar Mar 08 '22 22:03 opwvhk

Added a change to the grammar (sorry for the way in which I did...) such that it the generated code can also work on Python.

opwvhk avatar Mar 22 '22 11:03 opwvhk

Hi! Our tool BreakBot found that this pull request introduces 3 breaking changes and they appear to impact 5 of your clients. You can find the full BreakBot report in our fork repository: report for PR#1588.

We hope this information is valuable to you, and apologize otherwise. If you're willing to help, we would kindly ask for your help to fill in a 5-minutes survey about the report. Your feedback will help us improve the tool and provide a better experience for users in the future!

lmove avatar Nov 18 '22 10:11 lmove

The BreakBot report marks the @Deprecated annotations as breaking changes.

Set aside that these are not actually breaking changes, they were added on purpose to signal a migration is needed for future compatibility.

opwvhk avatar Nov 18 '22 11:11 opwvhk

You are right. We include the impact of @Deprecated elements as it may give a better idea of which clients will be faced with a deprecated API. Thanks for your input @opwvhk!

tdegueul avatar Nov 18 '22 11:11 tdegueul

I didn't spend too much time (the change list is huge!) but I didn't notice anything bad.

Ah, yes... the tricky part about replacing the IDL parser is that that's unavoidable.

Thank you for taking the time though!

The subsequent PR's (#1589 & #1954) are significantly smaller 😅

opwvhk avatar Nov 18 '22 15:11 opwvhk

Why not put "Idl.g4" file in src/main/resources of Idl module ?

Because this is a shared location, which allows the grammar to be reused for e g. a Python implementation.

The reason to migrate to ANTLR is partly because it is supported, but also because it supports multiple languages.

opwvhk avatar Aug 03 '23 08:08 opwvhk

After resolving the conflicts, I'd like to merge this one and move on to PR #1589 (making the IDL syntax a full .avsc equivalent). As there's 2 approvals other than mine, are there any objections to merge?

opwvhk avatar Sep 11 '23 11:09 opwvhk