AVRO-2248 Added support for Java parameterized types
This patch allows support for Java Parameterised types in Avro. Ex: Pair<K, V>
Corresponding JIRA is: https://issues.apache.org/jira/browse/AVRO-2248
Please review this and I am willing to take suggestions and make required changes to mainline this. Thanks
Adding a new fundamental type of schema is incompatible. In particular, any implementation that has not implemented this will be unable to read data that uses this feature. Inclusion of this would thus force a major release of Avro, i.e., 2.0.
Also, does this correspond to a natural language feature in all or most of the languages where Avro is implemented? For those that it does not, how should it appear? Related, in Java, tests and examples using Generic and Specific data would be useful. How is a parameterized type's data represented in Json? We should not add a fundamental type that's only really supported by Java's ReflectData.
Hi Cutting, Thanks a lot for the reply.
- Sure. I have introduced param as a type because that seemed to be a good way to approach this. We can either take this to Avro 2.0 or I can work on making it backwards compatibile - ex: modelling it on top of existing types with extra checks in code to encode/decode as type parameters.
- The changes are there primarily in GenericData - so they should work with both Specific and ReflectData. I can take up the tests for SpecificData and test the code generation as well. I can submit them in this PR.
- Reg. schema representation, it is a type "param" with a record underneath. The field types are represented like "name", "type" as shown below. This is like how Map is implemented already in the code. Ex:
class Pair<K, V> {
K key;
V value;
}
{"type":"param","values":{"type":"record","name":"PairIntegerString6f72de2f3285b50e","namespace":"org.apache.avro","fields":[{"name":"key","type":"int"},{"name":"value","type":"string"}],"java-class":"org.apache.avro.TestParamTypes$Pair"}}
Missed one case to handle - the above code only works if the class has only type parameters. I am now working on making it work for mix of both typed-params and normal supported types. Ex: class ABC<U,V> { U val1; V val2; String val3; Integer val4; }
Will validate approaches and submit the updated patch soon. Will think about solving this with existing types if possible instead of using "param" type.
Do let me know if you have any design already in mind for this. Thanks
The case mentioned above of a mix of normal typed params and parameterized types is now working. Ex: class ABC<U,V> { U val1; V val2; String val3; Integer val4; }
The test for this is also added
How can I make use of this feature? The class for which I need to generate avro schema makes use of parameterized types and I can't change that class as it is not maintained by me.
Any update on this - seems it could be useful - would help address https://github.com/apache/pulsar/issues/3115 Thanks in advance!
Has this feature been fully integrated, and if so from what version?
We need this so badly for the java pulsar client to make it support generics! https://github.com/apache/pulsar/issues/3115
Here is my personal opinion as a non-Java user:
The PR introduces a new schema type (Param) only in the Java SDK without trying to explain this new type in the Avro specification in a language agnostic way. Some Avro SDKs/languages do not support generics at all, e.g. C, Perl, JavaScript, ... and this will make this new type an exception! But let's ignore that and continue with the review:
- The PR does not even give an example of the new schema! The only snippet I see is at https://github.com/apache/avro/pull/356#issuecomment-433292283 but IMO it is not very clear. IMO new tests should be added that cover parsing of this new schema type, encoding (binary and JSON)
- Any attempt to implement the same in a second SDK just to prove that the implementation is good and interop-able will make it easier to approve such new addition
- Looking at
{"type":"param","values":{"type":"record","name":"PairIntegerString6f72de2f3285b50e","namespace":"org.apache.avro","fields":[{"name":"key","type":"int"},{"name":"value","type":"string"}],"java-class":"org.apache.avro.TestParamTypes$Pair"}}I have the feeling that there is no need of a new Schema type at all. It seems like RecordSchema is almost capable to cover the need.
My 2c!