avro icon indicating copy to clipboard operation
avro copied to clipboard

AVRO-4026: [c++] Allow non-string custom attributes in schemas

Open jmd-z opened this issue 1 year ago • 12 comments

Custom attributes are encoded as JSON to avoid ambiguity with simple string values.

What is the purpose of the change

Fixes exception from C++ library when compiling schemas with non-string custom attributes.

Verifying this change

This change added tests and can be verified as follows:

  • Added test that compiles a schema with a non-string custom attribute

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not documented

jmd-z avatar Aug 06 '24 21:08 jmd-z

@Gerrit0 @mkmkme Do you want to review too ?

martin-g avatar Sep 26 '24 11:09 martin-g

This particular case isn't much of a problem. The current behavior of the C++ library is to throw an exception if you attempt to read a schema with non-string custom attributes, so this is unlikely in the wild. This test creates the schema programmatically bypassing the reading issue. The compatibility issue with this PR is existing code that interrogates the value of a custom attribute that is a string type. With this PR, the value is JSON encoded text, so the returned value will contain a quoted string instead of the raw string in that case. There is another PR 3604 that avoids this compatibility issue, but at the cost of creating ambiguity. The API was recently changed, remaining CustomFields to CustomAttributes, signaling a breaking change. Perhaps renaming the function CustomAttributes::getAttributes to CustomAttributes::getAttributesJSON would do the same here.

On Thu, Sep 26, 2024, 7:13 AM Martin Grigorov @.***> wrote:

@.**** commented on this pull request.

In lang/c++/test/unittest.cc https://github.com/apache/avro/pull/3069#discussion_r1776855615:

@@ -457,11 +457,11 @@ struct TestSchema { std::string expectedJsonWithCustomAttribute = "{"type": "record", "name": "Test","fields": " "[{"name": "f1", "type": "long", "

  •        "\"arrayField\": \"[1]\", "
    
  •        "\"booleanField\": \"true\", "
    
  •        "\"mapField\": \"{\\\"key1\\\":\\\"value1\\\", \\\"key2\\\":\\\"value2\\\"}\", "
    
  •        "\"nullField\": \"null\", "
    
  •        "\"numberField\": \"1.23\", "
    
  •        "\"arrayField\": [1], "
    

Looking at these changes I think this change may break someone's application. Do we want to support a backward compatible encoding via some property/setting/... ?

— Reply to this email directly, view it on GitHub https://github.com/apache/avro/pull/3069#pullrequestreview-2330889691, or unsubscribe https://github.com/notifications/unsubscribe-auth/BKKFEFOTN2DUMQW6AUKEGT3ZYPT7DAVCNFSM6AAAAABMDDFJFGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDGMZQHA4DSNRZGE . You are receiving this because you authored the thread.Message ID: @.***>

jmd-z avatar Sep 26 '24 15:09 jmd-z

Perhaps renaming the function CustomAttributes::getAttributes to CustomAttributes::getAttributesJSON would do the same here.

Perhaps this would be the best! In my experience (Java and Rust) the best is to keep the old behavior as deprecated and introduce a new one as the recommended. In a later version, e.g. 1.13.0, the deprecated method could be removed.

martin-g avatar Sep 27 '24 07:09 martin-g

It seems that everyone involved in this discussion agrees with @jmd-z's proposal, and the PR could be merged once the change is implemented. @jmd--z, do you have time to make the small adjustment?

pascalginter avatar Oct 11 '24 15:10 pascalginter

looking forward for this fix to be merged!

johd01 avatar Nov 19 '24 14:11 johd01

looking forward for this fix to be merged!

second this. Would make my ugly hack obsolete and not a day too soon.

erikgustavalm avatar Nov 20 '24 09:11 erikgustavalm

Do we wait for an improvement here (deprecate the old methods and introduce new ones) ? Or the community is happy with the current solution ?

martin-g avatar Nov 20 '24 10:11 martin-g

@martin-g IIUC others were waiting for the improvement. If @jmd-z doesn't have time to do this, I can do this

mkmkme avatar Nov 25 '24 14:11 mkmkme

Help is always welcome!

martin-g avatar Nov 25 '24 16:11 martin-g

Before I stumbled upon this issue (and the associated JIRA issue), I had made a patch similar to what's proposed -- but instead of just renaming methods to make it more clear that they are JSON-encoded, I had actually added new methods. I just opened a PR with that change (#3266), however I'm more than happy to close that PR if the change in this one is preferred.

Is this PR's merge being held up only for renaming the methods to have a "JSON" suffix? If we wanted to keep the old ones and just add new ones, that ends up looking a lot like the PR I just opened. I'd just need to mark the old methods deprecated in my branch.

jhump avatar Dec 16 '24 21:12 jhump

Is this PR's merge being held up only for renaming the methods to have a "JSON" suffix? If we wanted to keep the old ones and just add new ones, that ends up looking a lot like the PR I just opened. I'd just need to mark the old methods deprecated in my branch.

I didn't compare this PR against https://github.com/apache/avro/pull/3266 but yes, the requested change here is to keep the old API as deprecated and introduce new methods which behave better.

martin-g avatar Dec 17 '24 09:12 martin-g

@martin-g, I would happily see @jmd-z's change (this PR) merge, since he put the effort into it first. @jmd-z, do you plan to make these changes? It's been a few months since your last comment.

In the meantime, since we cannot push a commit to their branch to help push this forward, I have made changes in my own branch. Perhaps my PR (#3266) could be considered as an alternative if we don't hear back from @jmd-z? It deprecates all of the old string-only methods (addAttribute, getAttribute, and attributes) and introduces alternatives where the value is JSON-encoded and string values must be quoted and escaped (addAttributeJson, getAttributeJson, and jsonAttributes). It is, obviously, very similar to the change here.

jhump avatar Dec 17 '24 16:12 jhump