Deserializing arbitrary precision numbers
Hello! I'm seeking to improve the performance of a program that reads JSON data as serde_json::Value and immediately converts it to a custom type.
In my program jaq, I define a Value type that holds JSON values. I parse JSON into serde_json::Value and then convert the result to my custom Value type. So far, so good.
Recently, I found that converting from serde_json::Value to my own Value type takes quite some time, namely roughly 10% of my total program runtime.
To avoid creating serde_json::Values and then converting them to my own Values, I have tried to implement Deserialize for my custom Value type; see playground.
That allows me to bypass serde_json::Value, except for one point: numbers.
In jaq, I use the arbitrary_precision feature of serde_json. With that, a floating-point number "1.0" becomes:
Map({"$serde_json::private::Number": String("1.0")})
(This is not visible if you run the example from the playground, because the playground does not set the arbitrary_precision feature.)
In serde_json, such number-representing maps are converted to a number using the private types KeyClassifier / KeyClass; however, these types are inaccessible to me from my program, so I cannot use them. Hardcoding the string "$serde_json::private::Number" in my program seems quite dangerous, so that's not a route I want to take.
Is there some way to deserialize JSON values, including arbitrary precision numbers, from serde_json without going through serde_json::Value?
In other words, is it possible to reimplement serde_json::Value outside of the serde_json crate (without duplicating the parsing of serde_json)?
Deserializing big integers is tricky. In my case, I knew that the field was going to be a bigint, so I overrode its deserialization: I deserialize the field to serde_json::Number (with the arbitrary_precision feature enabled), convert it to a string (using .to_string(), which yields exactly the same number as in the JSON), and then parse that string. That way I managed to deserialize a bigint without duplicating serde_json code.
In your case, you don't know the structure of your data, so I'm not sure whether that approach is applicable.
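To make the last step of the approach above concrete, here is a minimal, std-only sketch of parsing an exact decimal string (such as the one returned by .to_string() on the number) into a big integer. The BigUint type and its parse function are hypothetical stand-ins for whatever bigint library is actually used; obtaining the string from serde_json::Number is assumed to have happened already and is not shown.

```rust
/// Hypothetical minimal big unsigned integer (an assumption for illustration):
/// little-endian limbs in base 10^9. Zero is represented by an empty vector.
#[derive(Debug, PartialEq)]
struct BigUint(Vec<u32>);

impl BigUint {
    /// Parse a non-empty string of ASCII digits; anything else yields None.
    fn parse(s: &str) -> Option<Self> {
        if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        // Leading zeros carry no information.
        let digits = s.trim_start_matches('0');
        // rchunks walks the digits from the least significant end,
        // nine decimal digits (one limb) at a time.
        let limbs = digits
            .as_bytes()
            .rchunks(9)
            .map(|chunk| {
                chunk
                    .iter()
                    .fold(0u32, |acc, &b| acc * 10 + u32::from(b - b'0'))
            })
            .collect();
        Some(BigUint(limbs))
    }
}
```

For example, BigUint::parse("18446744073709551616") (2^64, which no longer fits in a u64) yields the limbs [709551616, 446744073, 18], so no digit of the input is lost.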
Hi @survived, as you suspected, your approach is indeed not applicable in my case, because maps (e.g. {"a": 1, "b": 2}) and numbers both arrive through the same visit_map call, where I cannot tell them apart.
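The difficulty can be illustrated with a toy model that does not use serde at all. In this sketch, Raw and looks_like_number are hypothetical names, and the slice of entries is only a stand-in for what a visitor might collect inside visit_map. It tries to recognise a number purely by shape (one entry whose value is a numeric string), without relying on the private key.

```rust
/// Simplified stand-in for the values a visitor might see inside a map.
/// This is an assumption for illustration, not serde's actual API.
#[derive(Debug, PartialEq)]
enum Raw {
    Str(&'static str),
    Int(i64),
}

/// Shape-only heuristic: exactly one entry, whose value is a string
/// that parses as a number. The key is deliberately ignored.
fn looks_like_number(entries: &[(&str, Raw)]) -> bool {
    match entries {
        [(_key, Raw::Str(s))] => s.parse::<f64>().is_ok(),
        _ => false,
    }
}
```

With this heuristic, the encoded number [("$serde_json::private::Number", Raw::Str("1.0"))] is recognised and the object {"a": 1, "b": 2} is not, but an ordinary object such as {"version": "1.0"} is also classified as a number. Shape alone therefore does not seem sufficient; the only reliable discriminator is the key itself.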
Could you write a custom wrapper type for your arbitrary-precision number and deserialize to that?
@DanielJoyce: I do not see how that could work, because at the time of deserialization, I am given only information that I cannot reliably decode (namely, the information from KeyClassifier / KeyClass).
By the way, the current approach of serde_json to encode arbitrary-precision numbers has another downside:
When we read the JSON value {"$serde_json::private::Number": "1.0"}, it gets (incorrectly) mapped to the number 1.0, whereas anything else, such as {"$serde_json::private": "1.0"}, gets (correctly) mapped to itself, i.e. {"$serde_json::private": "1.0"}.
That means that users cannot use "$serde_json::private::Number" as a key in objects. This is related to #826.
Furthermore, we can craft input that is perfectly fine JSON, yet is rejected by serde_json with the arbitrary_precision feature enabled: for example, {"$serde_json::private::Number": "bla"}. I think that this should really not happen.
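The three cases above can be reproduced in a toy model of in-band signalling. This is only a sketch of the behaviour described in this thread, not serde_json's implementation: the Value enum and the classify function are hypothetical, objects are restricted to string values for brevity, and parsing as f64 is an assumed stand-in for the real number-syntax check.

```rust
/// The sentinel key quoted in this thread; everything else is a toy model.
const TOKEN: &str = "$serde_json::private::Number";

#[derive(Debug, PartialEq)]
enum Value {
    /// A number, kept as its exact source text.
    Number(String),
    /// An object with string values only (simplification).
    Object(Vec<(String, String)>),
}

/// Decide whether an already-parsed object is a number in disguise.
fn classify(entries: Vec<(String, String)>) -> Result<Value, String> {
    let is_number = entries.len() == 1 && entries[0].0 == TOKEN;
    if !is_number {
        // Any other shape or key is passed through as an ordinary object.
        return Ok(Value::Object(entries));
    }
    let text = &entries[0].1;
    // Assumed stand-in for validating number syntax.
    if text.parse::<f64>().is_ok() {
        Ok(Value::Number(text.clone()))
    } else {
        Err(format!("invalid number: {}", text))
    }
}
```

In this model, an object whose only key is the sentinel and whose value is "1.0" comes out as Value::Number, the same object with the key "$serde_json::private" stays an object, and the sentinel key with the value "bla" is rejected even though the input is valid JSON. Because the marker lives in the same namespace as user keys, the classifier cannot tell user data from an encoded number.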