AVRO-4026: [c++] Allow non-string custom attributes in schemas
Custom attributes are encoded as JSON to avoid ambiguity with simple string values.
What is the purpose of the change
Fixes exception from C++ library when compiling schemas with non-string custom attributes.
Verifying this change
This change added tests and can be verified as follows:
- Added test that compiles a schema with a non-string custom attribute
Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? not documented
@Gerrit0 @mkmkme Do you want to review too ?
This particular case isn't much of a problem. The current behavior of the C++ library is to throw an exception if you attempt to read a schema with non-string custom attributes, so this is unlikely in the wild. This test creates the schema programmatically bypassing the reading issue. The compatibility issue with this PR is existing code that interrogates the value of a custom attribute that is a string type. With this PR, the value is JSON encoded text, so the returned value will contain a quoted string instead of the raw string in that case. There is another PR 3604 that avoids this compatibility issue, but at the cost of creating ambiguity. The API was recently changed, remaining CustomFields to CustomAttributes, signaling a breaking change. Perhaps renaming the function CustomAttributes::getAttributes to CustomAttributes::getAttributesJSON would do the same here.
On Thu, Sep 26, 2024, 7:13 AM Martin Grigorov @.***> wrote:
@.**** commented on this pull request.
In lang/c++/test/unittest.cc https://github.com/apache/avro/pull/3069#discussion_r1776855615:
@@ -457,11 +457,11 @@ struct TestSchema { std::string expectedJsonWithCustomAttribute = "{"type": "record", "name": "Test","fields": " "[{"name": "f1", "type": "long", "
"\"arrayField\": \"[1]\", ""\"booleanField\": \"true\", ""\"mapField\": \"{\\\"key1\\\":\\\"value1\\\", \\\"key2\\\":\\\"value2\\\"}\", ""\"nullField\": \"null\", ""\"numberField\": \"1.23\", "
"\"arrayField\": [1], "Looking at these changes I think this change may break someone's application. Do we want to support a backward compatible encoding via some property/setting/... ?
— Reply to this email directly, view it on GitHub https://github.com/apache/avro/pull/3069#pullrequestreview-2330889691, or unsubscribe https://github.com/notifications/unsubscribe-auth/BKKFEFOTN2DUMQW6AUKEGT3ZYPT7DAVCNFSM6AAAAABMDDFJFGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDGMZQHA4DSNRZGE . You are receiving this because you authored the thread.Message ID: @.***>
Perhaps renaming the function CustomAttributes::getAttributes to CustomAttributes::getAttributesJSON would do the same here.
Perhaps this would be the best! In my experience (Java and Rust) the best is to keep the old behavior as deprecated and introduce a new one as the recommended. In a later version, e.g. 1.13.0, the deprecated method could be removed.
It seems that everyone involved in this discussion agrees with @jmd-z's proposal, and the PR could be merged once the change is implemented. @jmd--z, do you have time to make the small adjustment?
looking forward for this fix to be merged!
looking forward for this fix to be merged!
second this. Would make my ugly hack obsolete and not a day too soon.
Do we wait for an improvement here (deprecate the old methods and introduce new ones) ? Or the community is happy with the current solution ?
@martin-g IIUC others were waiting for the improvement. If @jmd-z doesn't have time to do this, I can do this
Help is always welcome!
Before I stumbled upon this issue (and the associated JIRA issue), I had made a patch similar to what's proposed -- but instead of just renaming methods to make it more clear that they are JSON-encoded, I had actually added new methods. I just opened a PR with that change (#3266), however I'm more than happy to close that PR if the change in this one is preferred.
Is this PR's merge being held up only for renaming the methods to have a "JSON" suffix? If we wanted to keep the old ones and just add new ones, that ends up looking a lot like the PR I just opened. I'd just need to mark the old methods deprecated in my branch.
Is this PR's merge being held up only for renaming the methods to have a "JSON" suffix? If we wanted to keep the old ones and just add new ones, that ends up looking a lot like the PR I just opened. I'd just need to mark the old methods deprecated in my branch.
I didn't compare this PR against https://github.com/apache/avro/pull/3266 but yes, the requested change here is to keep the old API as deprecated and introduce new methods which behave better.
@martin-g, I would happily see @jmd-z's change (this PR) merge, since he put the effort into it first. @jmd-z, do you plan to make these changes? It's been a few months since your last comment.
In the meantime, since we cannot push a commit to their branch to help push this forward, I have made changes in my own branch. Perhaps my PR (#3266) could be considered as an alternative if we don't hear back from @jmd-z? It deprecates all of the old string-only methods (addAttribute, getAttribute, and attributes) and introduces alternatives where the value is JSON-encoded and string values must be quoted and escaped (addAttributeJson, getAttributeJson, and jsonAttributes). It is, obviously, very similar to the change here.