AVRO-3520: Expose read schema in custom encoding

Open • itstheceo opened this issue 3 years ago • 1 comment

Currently, the read schema is not exposed when using CustomEncoding<T>, which makes it impossible to detect schema changes without rolling your own versioning inside the record itself. That is not an option for files that already exist, and it adds overhead because the version has to be stored in every row of the file.

To address this, I added a CustomEncoderFieldAccess interface and set the read schema on the encoder for the respective field accessor. The method that receives the schema can be overridden in a concrete implementation of CustomEncoding<T>, which lets that implementation detect a schema change and keep reads backwards compatible.
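As a rough illustration of why access to the schema helps, here is a minimal, self-contained sketch. It does not use the Avro API: `SketchSchema`, `Point`, and `PointEncoding` are hypothetical stand-ins invented for this example, and the real classes in the PR may look quite different. The only point is that an encoding that knows the schema can branch on it instead of storing a version in every row.

```java
import java.util.Arrays;
import java.util.List;

public class ReadSchemaSketch {
    /** Hypothetical stand-in for a schema: just the field names it declares. */
    static final class SketchSchema {
        final List<String> fields;
        SketchSchema(String... fields) { this.fields = Arrays.asList(fields); }
    }

    /** Value type whose stored layout grew from (x, y) to (x, y, z). */
    static final class Point {
        final int x, y, z;
        Point(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
    }

    /** Hypothetical encoding (not Avro's CustomEncoding) that is given the schema. */
    static final class PointEncoding {
        private final SketchSchema schema;
        PointEncoding(SketchSchema schema) { this.schema = schema; }

        /** Decodes a stored row, defaulting z to 0 when the schema has no "z" field. */
        Point read(int[] stored) {
            boolean hasZ = schema.fields.contains("z");
            return new Point(stored[0], stored[1], hasZ ? stored[2] : 0);
        }
    }

    public static void main(String[] args) {
        Point oldRow = new PointEncoding(new SketchSchema("x", "y")).read(new int[] {1, 2});
        Point newRow = new PointEncoding(new SketchSchema("x", "y", "z")).read(new int[] {1, 2, 3});
        System.out.println(oldRow.z + " " + newRow.z); // prints "0 3"
    }
}
```

Existing files need no per-row version marker in this sketch; the schema alone decides which layout to decode.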

Jira

  • [x] My PR addresses the following Avro Jira issues and references them in the PR title.
  • https://issues.apache.org/jira/browse/AVRO-3520

Tests

  • [x] My PR adds the following unit tests.
  • TestReflectCustomEncoderFieldAccess

Commits

  • [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • [x] In case of new functionality, my PR adds documentation that describes how to use it.
    • All public functions and classes in the PR contain Javadoc that explains what they do

itstheceo • May 19 '22 22:05

I wonder if the method should be called withSchema rather than setSchema because it is apparently not intended to mutate the target instance.
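To make the naming point concrete, here is a small sketch of the usual Java convention, in which a `with...` method returns a new instance and leaves the receiver untouched. The `Encoding` class and its string-valued schema are hypothetical and stand in for whatever the PR actually defines; the signature is an assumption, not the PR's.

```java
public class WithSchemaSketch {
    /** Hypothetical immutable encoding; the schema is a plain string here. */
    static final class Encoding {
        private final String schema;
        Encoding(String schema) { this.schema = schema; }
        String schema() { return schema; }

        /** Returns a copy bound to the given schema; this instance is unchanged. */
        Encoding withSchema(String newSchema) { return new Encoding(newSchema); }
    }

    public static void main(String[] args) {
        Encoding shared = new Encoding(null);
        Encoding bound = shared.withSchema("{\"type\":\"string\"}");
        System.out.println(shared.schema() == null); // prints "true": the original is not mutated
        System.out.println(bound.schema());          // prints {"type":"string"}
    }
}
```

A `set...` name would suggest to most readers that `shared` itself changes, which is the confusion the rename would avoid.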

KalleOlaviNiemitalo • Jul 05 '22 09:07