Deserializing arbitrary precision numbers

Open 01mf02 opened this issue 3 years ago • 5 comments

Hello! I'm seeking to improve the performance of a program that reads JSON data as serde_json::Value and immediately converts it to a custom type.

In my program jaq, I define a Value type that holds JSON values. I parse JSON into serde_json::Value and then convert the result to my custom Value type. So far, so good.

Recently, I found that converting from serde_json::Value to my own Value type takes quite some time, namely roughly 10% of my total program runtime.

To avoid creating serde_json::Values and then converting them to my own Values, I have tried to implement Deserialize for my custom Value type; see playground. That allows me to bypass serde_json::Value, except for one point: numbers. In jaq, I use the arbitrary_precision feature of serde_json. With that, a floating-point number "1.0" becomes:

Map({"$serde_json::private::Number": String("1.0")})

(This is not visible if you run the example from the playground, because the playground does not set the arbitrary_precision feature.)

In serde_json, such number-representing maps are converted to a number using the private types KeyClassifier / KeyClass; however, these types are inaccessible to me from my program, so I cannot use them. Hardcoding the string "$serde_json::private::Number" in my program seems quite dangerous, so that's not a route I want to take.

Is there some way to deserialize JSON values, including arbitrary precision numbers, from serde_json without going through serde_json::Value?

In other words, is it possible to reimplement serde_json::Value outside of the serde_json crate (without duplicating the parsing of serde_json)?

01mf02 avatar Jun 07 '22 15:06 01mf02

Deserializing big integers is tricky. In my case, I knew that the field was going to be a bigint, so I overrode its deserialization: I deserialize the field to serde_json::Number (with the arbitrary_precision feature enabled), convert it to a string (using .to_string(), which yields exactly the same number as in the JSON), and then parse that string. That way, I managed to deserialize a bigint without duplicating serde_json code.

In your case, you don't know the structure of your data, so I'm not sure whether that approach is applicable.

survived avatar Jul 18 '22 12:07 survived

Hi @survived, as you suspected, your approach is indeed not applicable in my case, because maps (e.g. {"a": 1, "b": 2}) and numbers both get mapped to the same visit_map call, where I cannot distinguish between them.

01mf02 avatar Jul 22 '22 10:07 01mf02

Write a custom wrapper type for your arbitrary-precision number and deserialize to that?

DanielJoyce avatar Aug 08 '22 20:08 DanielJoyce

@DanielJoyce: I do not see how that could work, because at the time of deserialization, I am given only information that I cannot reliably decode (namely, the information from KeyClassifier / KeyClass).

01mf02 avatar Aug 10 '22 07:08 01mf02

By the way, the current approach of serde_json to encode arbitrary-precision numbers has another downside: when we read the JSON value {"$serde_json::private::Number": "1.0"}, it gets (incorrectly) mapped to the number 1.0, whereas any other object, such as {"$serde_json::private": "1.0"}, gets (correctly) mapped to itself. That means that users cannot use "$serde_json::private::Number" as a key in objects. This is related to #826.

Furthermore, we can craft input that is perfectly fine JSON, yet is rejected by serde_json with the arbitrary_precision feature enabled: for example, {"$serde_json::private::Number": "bla"}. I think that this should really not happen.
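
To make the two cases above concrete, here is a minimal toy model that uses only the standard library. It is a sketch under assumptions, not serde_json's actual implementation: the Value enum, the classify function, and the single-key check are all hypothetical, and the f64 parse is only a crude stand-in for JSON number validation.

```rust
use std::collections::BTreeMap;

// The magic key quoted in this thread. Hard-coding it is the fragile step
// under discussion; this is a toy model, not serde_json's code.
const TOKEN: &str = "$serde_json::private::Number";

#[derive(Debug, PartialEq)]
enum Value {
    Number(String),
    String(String),
    Object(BTreeMap<String, Value>),
}

// Hypothetical helper: interprets an already-parsed JSON object whose
// values are all strings, mimicking the in-band number encoding.
fn classify(map: BTreeMap<String, String>) -> Result<Value, String> {
    if map.len() == 1 {
        if let Some(text) = map.get(TOKEN) {
            // Crude stand-in for JSON number validation (assumption).
            return if text.parse::<f64>().is_ok() {
                // A genuine object is silently reinterpreted as a number.
                Ok(Value::Number(text.clone()))
            } else {
                // Valid JSON input is rejected outright.
                Err(format!("invalid number: {}", text))
            };
        }
    }
    // Every other object is kept as an object.
    let entries = map.into_iter().map(|(k, v)| (k, Value::String(v)));
    Ok(Value::Object(entries.collect()))
}
```

In this model, passing a map with the single entry TOKEN → "1.0" yields Value::Number, the same map with "bla" yields an error, and a map keyed by "$serde_json::private" stays an object. Because the number marker travels in-band as an ordinary key, no object can use that key for its own purposes.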

01mf02 avatar Nov 09 '22 19:11 01mf02