azure-sdk-for-java icon indicating copy to clipboard operation
azure-sdk-for-java copied to clipboard

[azure-spring-data-cosmos] Turn java.math.Big{Decimal,Integer} into simple types

Open Xiphoseer opened this issue 1 year ago • 23 comments

Description

This PR adds java.math.BigDecimal and java.math.BigInteger to the set of CosmosSimpleTypes, thus extending #31417 to those types. A "simple type" in the context of spring-data-commons is one that will "not [be] recursively analyzed" (c.f. AbstractMappingContext#setSimpleTypeHolder).

Using one of these types in a POJO in Java 17 currently results in a InaccessibleObjectException as described in #36831. While the workaround mentioned there to use --add-opens works, I believe that is the wrong approach. Adding that flag will prevent spring-data-commons from causing the error by using reflection to find properties, but that doesn't mean that Jackson / the ObjectMapper will treat it as a nested POJO.

Quite the opposite actually: Jackson has a special DecimalNode for BigDecimal and several configuration options on how it is serialized to and serialized from JSON. Thus, it's very much treated as an atomic type by Jackson, and by extension this library (azure-spring-data-cosmos) and should be a "simple type" in the context of the MappingContext. My understanding is that being a simple type does not prevent converters being registered for it, so this is orthogonal to #38691 and any kind of default conversion to JSON primitives for the type.

As for the question of whether this is a bug in spring-data-commons: I do believe that Spring Data Commons should default to a behavior that does not try to use reflection on java.* classes that do not have public fields. On the other hand, it is an abstraction layer over multiple different implementations and only those can reasonably opt-in to a type being both simple and supported based on the underlying technology. Not all Spring Data Projects use a JSON / NoSQL datamodel.

The default of the SimpleTypeHandler API is a set, which elements can't be excluded from without re-doing the whole set. So my conclusion would be that it's possible to commit to BigDecimal and BigInteger being simple for azure-spring-data-cosmos today, while it's not nearly as clear for upstream Spring Data Commons. Removing them from CosmosSimpleTypes after they've been added upstream is also not a breaking change.

The / one relevant bug on spring-data-commons is https://github.com/spring-projects/spring-data-commons/issues/2844, but with neither accepted, it's been causing issues for us in the services we work with. In our case, the source data model we want to import into CosmosDB is based on an XML Schema using xs:decimal, so the natural java type to use as per JAXB is java.math.BigDecimal.

All SDK Contribution checklist:

  • [x] The pull request does not introduce [breaking changes]
  • [x] CHANGELOG is updated for new features, bug fixes or other significant changes.
  • [x] I have read the contribution guidelines.

General Guidelines and Best Practices

  • [x] Title of the pull request is clear and informative.
  • [x] There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • [x] Pull request includes test coverage for the included changes.

Xiphoseer avatar May 17 '24 15:05 Xiphoseer

Thank you for your contribution @Xiphoseer! We will review the pull request and get back to you soon.

github-actions[bot] avatar May 17 '24 15:05 github-actions[bot]

@saragluna can you please review? Any issues that would arise from adding these types?

trande4884 avatar May 21 '24 15:05 trande4884

@Xiphoseer can you also add some tests that actually write/read to a database to ensure we are getting out what we put in? Thanks!

trande4884 avatar May 21 '24 16:05 trande4884

@microsoft-github-policy-service agree company="PRODYNA"

Xiphoseer avatar May 21 '24 18:05 Xiphoseer

@Xiphoseer can you also add some tests that actually write/read to a database to ensure we are getting out what we put in? Thanks!

I tried this and would appreciate some input on it. First, as far as CONTRIBUTING goes, I tried to run IT tests against the emulator but hit rate limiting even when explicitly starting it with the flag to disable that. Secondly, the Azure Portal wizard for "Azure Cosmos DB" now has a bunch more options to pick from (Core vs MongoDB, Serverless, Rate Limiting) which can default to values incompatible with the IT suite, 1000 RU limit is one example. It would be helpful to update CONTRIBUTING for that.

I guess the tricky part is to get this right without introducing breaking changes. My current understanding is that with or without the SimpleType changes, Jackson will special case BigDecimal and store it as a number, but I'd like to double-check that again. This means that https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/modeling-data#numbers-in-json applies which talks about the fact that number will be "stored as JSON". In practice that seems to mean that JSON numbers are stored as IEEE 754 binary64 in the backend, not only some clients, so there is some loss of precision involved.

In particular, BigDecimal keeps around trailing zeroes after the decimal point (e.g. 10.50) via its scale but the roundtrip through cosmos turns that into the mathematically equivalent 10.5, which doesn't compare as equals in the context of BigDecimal.

The solution to that would be to serialize the numbers as string, which is what spring-data-mongodb does via a converter, can also be achived locally using @JsonFormat(shape = Shape.STRING) or could be done in the object mapper via feature SERIALIZE_NUMBERS_AS_STRING (probably way too broad) or an AnnotationIntrospector (bit of a misnomer as per Jackson docs) for all BigDecimal types.

What I'm unsure about is whether that last point is

  • orthogonal to this PR because it's not using field reflection either way
  • possible to configure in the object mapper without breaking any users on java pre-16/17 or having added --add-opens as per the linked issues.

Xiphoseer avatar May 22 '24 06:05 Xiphoseer

/azp run java - cosmos - tests

saragluna avatar May 22 '24 08:05 saragluna

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar May 22 '24 08:05 azure-pipelines[bot]

@Xiphoseer can you also add some tests that actually write/read to a database to ensure we are getting out what we put in? Thanks!

I tried this and would appreciate some input on it. First, as far as CONTRIBUTING goes, I tried to run IT tests against the emulator but hit rate limiting even when explicitly starting it with the flag to disable that. Secondly, the Azure Portal wizard for "Azure Cosmos DB" now has a bunch more options to pick from (Core vs MongoDB, Serverless, Rate Limiting) which can default to values incompatible with the IT suite, 1000 RU limit is one example. It would be helpful to update CONTRIBUTING for that.

I guess the tricky part is to get this right without introducing breaking changes. My current understanding is that with or without the SimpleType changes, Jackson will special case BigDecimal and store it as a number, but I'd like to double-check that again. This means that https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/modeling-data#numbers-in-json applies which talks about the fact that number will be "stored as JSON". In practice that seems to mean that JSON numbers are stored as IEEE 754 binary64 in the backend, not only some clients, so there is some loss of precision involved.

In particular, BigDecimal keeps around trailing zeroes after the decimal point (e.g. 10.50) via its scale but the roundtrip through cosmos turns that into the mathematically equivalent 10.5, which doesn't compare as equals in the context of BigDecimal.

The solution to that would be to serialize the numbers as string, which is what spring-data-mongodb does via a converter, can also be achived locally using @JsonFormat(shape = Shape.STRING) or could be done in the object mapper via feature SERIALIZE_NUMBERS_AS_STRING (probably way too broad) or an AnnotationIntrospector (bit of a misnomer as per Jackson docs) for all BigDecimal types.

What I'm unsure about is whether that last point is

  • orthogonal to this PR because it's not using field reflection either way
  • possible to configure in the object mapper without breaking any users on java pre-16/17 or having added --add-opens as per the linked issues.

I've triggered the IT pipeline. This is also my concern, since if a user already store their BigDecimal in the Cosmos DB using another version client, can the number be read using this version.

saragluna avatar May 22 '24 08:05 saragluna

My current understanding is that with or without the SimpleType changes, Jackson will special case BigDecimal and store it as a number, but I'd like to double-check that again.

Ran my test case in MappingCosmosConverterUnitTest without the simple type changes, which, as expected, required adding --add-opens java.base/java.math=spring.core to <javaModulesSurefireArgLine> in pom.xml and it still passes.

I have to say I'm a bit surprised by the roundtrip to string and back in https://github.com/Azure/azure-sdk-for-java/blob/ba810b442781bb77731c2bacea08d78b2ba59591/sdk/spring/azure-spring-data-cosmos/src/main/java/com/azure/spring/data/cosmos/core/convert/MappingCosmosConverter.java#L134-L135 because that's what actually looses the trailing zeroes before even reaching the backend.

It seems to have been introduced by https://github.com/Azure/azure-sdk-for-java/commit/61b37c98459b2290ed13b1ddc83035b0cd39069a, which mentions no reason to choose that method over ObjectMapper#valueToTree, but even that one does a full round-trip to JSON primitives and does not have the DecimalNode precision logic enabled.

Xiphoseer avatar May 22 '24 09:05 Xiphoseer

I've triggered the IT pipeline.

Thanks!

This is also my concern, since if a user already store their BigDecimal in the Cosmos DB using another version client, can the number be read using this version.

After the testing mentioned above, I'm fairly confident that the SimpleType changes will indeed not affect the actual serialization / deserialization.

Edit: quite the opposite actually. The existing implementation already looses scale/precison without user intervention and this PR neither solves nor worsens that.

Xiphoseer avatar May 22 '24 10:05 Xiphoseer

Thanks @saragluna for the PR review, appreciate your help! @trande4884 can you please run the spring integration tests on this pipeline, thanks!

kushagraThapar avatar May 22 '24 15:05 kushagraThapar

/azp run java - spring - tests

trande4884 avatar May 22 '24 17:05 trande4884

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar May 22 '24 17:05 azure-pipelines[bot]

@Xiphoseer since you are having issues with IT tests if you give me access I can push some tests to this PR

trande4884 avatar May 22 '24 19:05 trande4884

@Xiphoseer since you are having issues with IT tests if you give me access I can push some tests to this PR

The "allow edits by maintainers" is on for this PR, but presumably that's not sufficient for some reason and you need write access to the fork?

Xiphoseer avatar May 22 '24 20:05 Xiphoseer

@Xiphoseer I just pushed the integration tests, let me re-run the CI's

trande4884 avatar May 24 '24 19:05 trande4884

/azp run java - spring - tests

trande4884 avatar May 24 '24 19:05 trande4884

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar May 24 '24 19:05 azure-pipelines[bot]

/azp run java - cosmos - tests

trande4884 avatar May 24 '24 19:05 trande4884

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar May 24 '24 19:05 azure-pipelines[bot]

@Xiphoseer apologies that I have not followed up on this sooner. There are now merge conflicts, if you could please resolve I will re-run the IT tests.

trande4884 avatar Jul 01 '24 14:07 trande4884

@Xiphoseer apologies that I have not followed up on this sooner. There are now merge conflicts, if you could please resolve I will re-run the IT tests.

Just as a heads up: I'll probably not find time for this before next week.

Xiphoseer avatar Jul 01 '24 19:07 Xiphoseer

@Xiphoseer bump on this :)

trande4884 avatar Aug 27 '24 15:08 trande4884

Hi @Xiphoseer. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

github-actions[bot] avatar Nov 01 '24 05:11 github-actions[bot]

/azp run java - cosmos - tests

trande4884 avatar Nov 04 '24 20:11 trande4884

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Nov 04 '24 20:11 azure-pipelines[bot]

API change check

API changes are not detected in this pull request.

azure-sdk avatar Nov 04 '24 21:11 azure-sdk

/azp run java - spring - tests

trande4884 avatar Nov 05 '24 19:11 trande4884

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Nov 05 '24 20:11 azure-pipelines[bot]

/check-enforcer override

trande4884 avatar Nov 06 '24 16:11 trande4884