avro icon indicating copy to clipboard operation
avro copied to clipboard

AVRO-3781: [Rust] Enhance Decimal resolve

Open ftelnov opened this issue 2 years ago • 10 comments

This PR solves issue.

What was done:

  1. Now decimals could be resolved from strings and floats;
  2. Add special algorithm for string parsing into appropriate bigint;
  3. Floats parsing is done via converting to string first, then to the bigint through the string parsing algorithm.
  4. Add method for estimating the precision of the decimal - in other words, added a method to check the amount of bigint digits.

ftelnov avatar Jun 20 '23 10:06 ftelnov

The other day I started reviewing this PR but then I stopped for the following reasons:

* @RyanSkraba said that he is going to release 1.11.2 in the coming days and I didn't want to make such changes in the last moment

* then @clesaec suggested [AVRO-3779: using rust bigdecimal #2302](https://github.com/apache/avro/pull/2302) - using a proper BigDecimal type as a backing type is something that I wanted to explore some day. With it it is much easier to parse from and format to different types (String, floating number, byte array). The drawback here is that this is something new (to be added in the spec) and it may not work for someone who wants to use Avro SDKs which does not support it yet

I think we must switch to BigDecimal anyway. It wraps BigInt and that is what we need to use and not to write ourselves. Therefore, you can freely close this PR, and I'll get with another issue of rewriting decimal to bigdecimal usage, if that is what you would prefer.

I thought @clesaec suggested adding new schema type for some purpose - which is not good in my opinion. His idea to include serialization to the resulted decimal is, indeed, pretty good. It's a complete pain holding schema every time we want to operate with decimal - as without schema you can't understand what is the scale and precision. Maybe there was some idea behind such a decision. However, as you mentioned, it would require spec update.

ftelnov avatar Jun 23 '23 11:06 ftelnov

I would be happy to hear the decision of the maintainers about Decimal's future - and start implementing that future :)

ftelnov avatar Jun 23 '23 12:06 ftelnov

Hello, @martin-g, please, provide an information what is the future of rust decimals and how can I participate in it

ftelnov avatar Jul 11 '23 22:07 ftelnov

@ftelnov Do you think you could improve the current support for decimal based on bigdecimal/rustdecimal crates ? This would be a good approach, I think !

martin-g avatar Jul 12 '23 09:07 martin-g

I think I can move decimals to bigdecimal usage - indeed, that would be an easy task, as they are bigint internally. So it's almost like drop-in replacement. What I think is that it is generally a good idea - we have bunch of other crates, like malachite for bigdecimals, but they have restrictive licenses usually. bigdecimal, on the other hand, is available and does what we trying to do by hands - like it supports parsing string/f32/f64 to bigdecimals, AND you can construct BigInt(for example, from bytes) and use it with scale to construct bigdecimal.

So, I'd happy to try!

ftelnov avatar Jul 18 '23 07:07 ftelnov

FYI, the crate dashu created by me supports big ints (through type dashu::Integer) and big decimals (through type dashu::Decimal) with arbitrary precisions. These types also have complete parsing capabilities.

cmpute avatar Oct 24 '23 09:10 cmpute

Thank you, @cmpute ! With https://github.com/apache/avro/commit/c91b8876b1c21c93ee9021abe6d0d04cf1f50a32 we introduced usage of https://crates.io/crates/bigdecimal There is no release for this yet, so we are open to replace it if your crate is better (performance, maintainance, nicer API, etc.)!

martin-g avatar Oct 24 '23 09:10 martin-g

Thanks for your interest! Some of the benefits of choosing dashu over bigdecimal:

  1. dashu is faster: benchmark of exp indicates that dashu::Decimal calculates exp(1) with 100 digits in 10ms while bigdecimal::BigDecimal spends 15ms (on my laptop). You can try the benchmark here.
  2. dashu has thorough context and rounding support: dashu supports arbitrary precision limit at runtime, while bigdecimal only partially support setting precision at compile time. The correctness of bigdecimal with higher precision is not well tested. Dashu supports rounding mode and precision selection for each operation, while bigdecimal's context type only allows you to select them on the very basic operation (addition) yet.
  3. dashu supports arbitrary float base: dashu has decimal float type dashu::Decimal, it also has the binary float type dashu::Real. The decimal type can be converted from and to the binary type efficiently with rounding mode with the method with_base and with_base_and_precision. Therefore you can have a complete control when you want to convert f32/f64/other binary floats to decimal outputs (you can specify the precision or the rounding mode you want)
  4. dashu has seamless and opaque support for conversion between big integers and big decimals through from_parts and into_repr. After all, dashu is a collection of arbitrary precision integers, rationals and floats.

I can't say that dashu has a better API right now, but I'm working on a guide book in the next release.

cmpute avatar Oct 28 '23 01:10 cmpute

I think I can move decimals to bigdecimal usage - indeed, that would be an easy task, as they are bigint internally. So it's almost like drop-in replacement.

@ftelnov @martin-g Is it still planned?

It's not an easy task to work with floating point decimal values with current API. Even in the readme examples, there is only an example of working with integer values. And there is no way to represent value (for reader/writer) with a schema like this (please, correct me if I'm wrong):

{
  "name": "decimal",
  "type": "bytes",
  "logicalType": "decimal",
  "scale": 8,
  "precision": 28
}

All of this can be fixed with an internal representation of Value::Decimal as bigdecimal::BigDecimal. If there is a performance issue, I think it can be fixed by compile-time feature (decimal_as_bigdecimal?).

If these changes are ok, I can work on this and send a PR.

Gordon-F avatar Aug 05 '24 23:08 Gordon-F

@Gordon-F please send a PR!

martin-g avatar Aug 06 '24 04:08 martin-g