avro icon indicating copy to clipboard operation
avro copied to clipboard

AVRO-2872: Fix bug in SpecificCompiler which prevents Logical Types being registered for union fields

Open icecreamhead opened this issue 4 years ago • 11 comments

Make sure you have checked all steps below.

Jira

  • [x] My PR addresses the following Avro Jira issues and references them in the PR title.
    • https://issues.apache.org/jira/browse/AVRO-2872
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Tests

  • [x] My PR adds the following unit tests OR does not need testing for this extremely good reason:

TestSpecificCompiler#testLogicalTypeUnionProducesConversionField

Commits

  • [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • [x] In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

icecreamhead avatar Feb 16 '21 16:02 icecreamhead

What do I need to do to get eyes on this? @RyanSkraba ?

icecreamhead avatar Feb 24 '21 16:02 icecreamhead

Thanks for the contribution and your patience! I asked a question on the JIRA -- what symptoms are you seeing in this bug?

I'm wondering what would happen if passed a record with a UNION field that had two logical types (something compatible like a date and a timestamp-millis). Is it currently unusable, and what would change with this fix?

RyanSkraba avatar Feb 25 '21 09:02 RyanSkraba

I'm running into an issue which I think is related but building this PR locally and testing it hasn't fixed the issue. I have a schema.avdl

@namespace("org.example")
protocol TestService {
  record MaybeLogicalTypes {
    union {null, date} dateType = null;
    union {null, time_ms} timeMsType = null;
    union {null, timestamp_ms} timestampMsType = null;
  }
  MaybeLogicalTypes getMaybeLogicalTypes();
}

I have server code which returns a MaybeLogicalTypes object with all the fields set however when the client code calls

SpecificRequestor.getClient(TestService.class, new HttpTransceiver(url)).getMaybeLogicalTypes()

I get an error when union {null, date} or union {null, time_ms} or union {null, timestamp_ms} is populated. e.g.

org.apache.avro.AvroRuntimeException: org.apache.avro.AvroRuntimeException: Unknown datum type java.time.Instant: 2021-03-31T11:26:20.123888Z
                                                at org.apache.avro.ipc.specific.SpecificRequestor.readError(SpecificRequestor.java:160) ~[avro-ipc-1.10.2.jar:1.10.2]
                                                at org.apache.avro.ipc.Requestor$Response.getResponse(Requestor.java:566) ~[avro-ipc-1.10.2.jar:1.10.2]
                                                at org.apache.avro.ipc.Requestor$TransceiverCallback.handleResult(Requestor.java:366) ~[avro-ipc-1.10.2.jar:1.10.2]
                                                at org.apache.avro.ipc.Requestor$TransceiverCallback.handleResult(Requestor.java:330) ~[avro-ipc-1.10.2.jar:1.10.2]
                                                at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:73) ~[avro-ipc-1.10.2.jar:1.10.2]
                                                at org.apache.avro.ipc.Requestor.request(Requestor.java:152) ~[avro-ipc-1.10.2.jar:1.10.2]
                                                at org.apache.avro.ipc.Requestor.request(Requestor.java:101) ~[avro-ipc-1.10.2.jar:1.10.2]
                                                at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:108) ~[avro-ipc-1.10.2.jar:1.10.2]

StephenFlavin avatar Mar 31 '21 12:03 StephenFlavin

This looks related, but not exactly the same issue. This would fix the conversions array in the specific, generated MaybeLogicalTypes, but it looks like the IPC responder doesn't take it into account in any case. Can you create a new JIRA?

RyanSkraba avatar Apr 01 '21 12:04 RyanSkraba

Raised AVRO-3102 (sorry for the noise in the PR)

StephenFlavin avatar Apr 01 '21 20:04 StephenFlavin

Shouldn't LogicTypes be resolved recursively? Curious if it will be registered in the case of an Array of Logical Types

Fokko avatar Apr 06 '21 18:04 Fokko

Shouldn't LogicTypes be resolved recursively? Curious if it will be registered in the case of an Array of Logical Types

There's some discussion about this on the Jira. We've established that this fix works for a union of exactly two elements where one of them is null, but unions with more branches is still an open question. The current conversion resolver doesn't have flexibility to provide different implementations based on the runtime time. As an aside, are nested unions allowed? Even if they are, how are they functionally different from a single union with multiple elements?

icecreamhead avatar Apr 07 '21 08:04 icecreamhead

Good news: a UNION can't contain other unions, among other limitations (such as not containing two different logical types with the same underlying primitives, or even two ARRAY with different type contents.

It currently looks like an ARRAY of timestamp-millis does generate the methods with the correct type List<Instant>, but does not create the conversions[] array in the specific record.

RyanSkraba avatar Apr 07 '21 12:04 RyanSkraba

This is ready for review following our discussions 👍

icecreamhead avatar Apr 09 '21 09:04 icecreamhead

Nudge @RyanSkraba

icecreamhead avatar Apr 13 '21 09:04 icecreamhead

I would like to bring this one up again, as we experience issues while using an optional string with logicalType UUID, which is represented as a union of null and the string itself:

{
"name": "id",
"type": [
  "null",
  {
    "type": "string",
    "logicalType": "uuid"
  }
}

As a result the conversion array is not populated and therefore we're not able to serialize avro messages with the same exception as described in the description.

We're using version 1.11.3. For me it looks like this issue could be fixed with this PR.

cfender-shinefour avatar Feb 06 '24 07:02 cfender-shinefour