avro icon indicating copy to clipboard operation
avro copied to clipboard

logicalType - keep property

Open jwtryg opened this issue 1 year ago • 3 comments

Currently when parsing an avro schema, the schema.go file contains a number of reserved properties that are removed from the schemas. These reserved properties include the logicalType property.

However, some projects, like apache iceberg, uses a number of logical types that are not included in the avro specification (see this) and therefore don't have corresponding LogicalTypes defined. When parsing such an avro schema, these unknown logicalTypes are then simply ignored and furthermore removed from the schema propertiies which makes them impossible to detect.

My suggestion for a simple solution is that the logicalType properties are kept by removing it from the list of schemaReserved properties. Or do you have other alternatives?

jwtryg avatar Aug 16 '24 09:08 jwtryg

Someone floated an idea of supporting custom logical types. That could be an alternative.

nrwiersma avatar Aug 16 '24 18:08 nrwiersma

That would be awesome! I guess that means a rather large change though, if the types that does not currently support logical types should also be able to support custom logical types.

However, I would be willing to take a stab at a PR for this, but I might need a bit of guidance :)

jwtryg avatar Aug 18 '24 15:08 jwtryg

Hey. You are welcome to have a go. Let me know where you get stuck and I will help if I can.

nrwiersma avatar Aug 22 '24 15:08 nrwiersma

In addition to preserving unrecognized logical types, it would also be nice to preserve unrecognized properties.

Currently, if a custom property has name that matches a defined field for some other type (for example, a property named "fields" for an array type or a property named "symbols" for a record type, etc), it gets discarded when the schema is parsed.

Similarly, types that do not have defined logical types in the main Avro spec (such as records, enums, maps, and arrays) do not preserve a "logicalType" attribute when parsed.

And another case of losing properties is with fixed types -- if they happen to use a logical type other than "decimal" that also happens to use "precision" and "scale" properties, not only is the logical type not retained but neither are the precision and scale values.

jhump avatar Oct 10 '24 14:10 jhump

My suggestion for a simple solution is that the logicalType properties are kept by removing it from the list of schemaReserved properties. Or do you have other alternatives?

FWIW, this wouldn't quite be enough. It would work to preserve the "map" logical type for arrays. But for the "timestamp-nanos" and "uuid" logical types for long and fixed[16] respectively, these get discarded at a different stage and are never in the property map to begin with.

jhump avatar Oct 10 '24 15:10 jhump

Actually, it looks like "uuid" logical type for fixed[16] is in the spec, but it's not yet implemented by this module: https://avro.apache.org/docs/1.12.0/specification/#uuid

jhump avatar Oct 10 '24 17:10 jhump

UUID is supported by this module, via encoding.TextUnmarshaler interface. There are too many packages for this to support a single one, but almost all of them support this interface.

nrwiersma avatar Oct 10 '24 19:10 nrwiersma

@nrwiersma, "UUID" is currently only supported in this repo for string types. But the spec also states that it can be used with fixed[16].

jhump avatar Oct 11 '24 01:10 jhump