Improve testing
We still have a rather low test coverage on parts of the code (my fault -.-) now that we can measure it we should aim to improve it. 90%+ sounds like a good target
Yay! You found my not-so-subtle reason for adding coverage 😂
To be fair the coverage of the critical path is quite good, >90% even with the buggy coverage. The tests missing are more for 'boring/unimporant' functions and will just exist to push the number and make it look nicer.
👍 , my 100% coverage club membership expired a couple years ago... 😂
I have lots of opinions on the topic 😂
Mostly the serde bits are left
getting there! the big things that jump out at me here are:
- [ ] serde
- [ ] stage2.rs
Wondering if there's a way to get the macros out of the function definition in stage2.rs ...
Ja serde is the big thing left.
I looked into some coverage and made some small steps as I'm real new to serde. Now I found what may be a bug with this main which panics on the last unwrap
use serde::{Deserialize, Serialize};
use serde_json::Result;
use std::collections::HashMap;
#[derive(Serialize, Deserialize, PartialEq, Eq, Hash, Debug)]
struct Bar1(u8);
fn main() -> Result<()> {
let b = Bar1(3);
println!("serde: Bar1:{}", serde_json::to_string(&b)?);
println!("simd : Bar1:{}", simd_json::to_string(&b).unwrap());
let mut h = HashMap::new();
h.insert(Bar1(3), 3i8);
let h_serde_str = serde_json::to_string(&h)?;
println!("serde HashMap:{}", &h_serde_str);
let mut h_simd_str = simd_json::to_string(&h).unwrap();
println!("simd HashMap:{}", &h_simd_str);
let h_serde_de = serde_json::from_str::<HashMap<Bar1, i8>>(&h_serde_str)?;
println!("h_de{:?}", h_serde_de);
let h_simd_de = simd_json::from_str::<HashMap<Bar1, i8>>(&mut h_simd_str).unwrap();
println!("h_de{:?}", h_simd_de);
Ok(())
}
The output is
serde: Bar1:3
simd : Bar1:3
serde HashMap:{"3":3}
simd HashMap:{"3":3}
h_de{Bar1(3): 3}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { index: 0, character: '?', error: ExpectedUnsigned }', src/main.rs:23:79
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[Finished running. Exit status: 101]
I think the problem is when parsing the key of an JSON object: for example {"3":3} to a HashMap<i8, i8>.
In serde-json, a MapKey struct is used to skip the '"' char if needed base on the destination type to be able to parse the key with its right type and also skip the other '"' when done.
simd-json just calls deserialize on the seed (PhantomData<i8>) which is implemented here by parsing the str to a i8 and it fails with an ExpectedSigned Error because the str still contains the '"'.
I can confirm that because if I add
self.len -= 1;
self.de.skip();
before the seed.deserialize(&mut *self.de).map(Some), this test cases passes, though this is not the right fix and break many other tests.
Sorry for the late reply, this got buried in a bunch of other notification :)
yes that's quite possible, the big difference between serde and simd-json in this regard is that simd-json uses a intermediate format to parse out datatypes already so by the time it hits the serde interop layer simd-json already decided (and validated) that the key of an object is a string.
so the skipping part won't work, we would have to do the string -> i8 parsing after the fact. On the other hand it'll be more correct (i.e. would even allow escaped numbers) but I'm also fine with saying "Due to differences in approach this kind of mapping isn't possible automatically, please implement a custom deserializer for Bar1"
You're totally right, skipping won't work and string -> i8 should be done after the fact.
I think this would be the right thing to do because when a user adds #[derive(Deserialize)] to its struct or use a struct (like HashMap) that is known to be implementing the Deserialize trait, it expects the struct to be Deserializable.
Implement a custom serializer for a user struct means digging in the internals of simd-json and it then making the string -> i8 conversion anyway.
This kind of changes will make JSON -> Tape -> user type
#[cfg_attr(not(feature = "no-inline"), inline(always))]
#[allow(clippy::cast_sign_loss)]
fn parse_i8(&mut self) -> Result<i8> {
match unsafe { self.next_() } {
Node::Static(s) => s
.as_i8()
.ok_or_else(|| Self::error(ErrorType::ExpectedSigned)),
+ Node::String(s) => s
+ .parse::<i8>()
+ .map_err(|_| Self::error(ErrorType::ExpectedSigned)),
_ => Err(Self::error(ErrorType::ExpectedSigned)),
}
}
Now, there's still the owned/borrowed Value -> user type to address
#[cfg_attr(not(feature = "no-inline"), inline(always))] #[allow(clippy::cast_sign_loss)] fn parse_i8(&mut self) -> Result<i8> { match unsafe { self.next_() } { Node::Static(s) => s .as_i8() .ok_or_else(|| Self::error(ErrorType::ExpectedSigned)), + Node::String(s) => s + .parse::<i8>() + .map_err(|_| Self::error(ErrorType::ExpectedSigned)), _ => Err(Self::error(ErrorType::ExpectedSigned)), } }
That won't work either because it'll allow to parse {"3": 3} as HashMap<i8, i8> but will also parse {"3": "3"} to this same type which is wrong.
One way to do it might be to create a MapKey type somewhat equivalent to the serde one to allow for string -> i8 parsing only for keys.
Ja making parse_i8 generically parsing strings comes with some problems, one alternative would be looking at the Map deserializer, perhaps it's possible to hook into that bit?
https://github.com/simd-lite/simd-json/blob/main/src/serde/de.rs#L295
A cutsoim map visitor could work?
Would you consider adding --ignore-tests argument to the tarpaulin CI ?
yes that makes perfect sense :+1: I didn't knew that existed :) good call!
I think there's some dead code in the serde part of this crate. Can you help identify if this is correct ?
- In
serde/value/owned/se.rs, I didn't manage to trigger any function ofMapKeySerializerother thanserialize_str. I think this is because, internally, the Object type of a Value is aHashMap<String, Value>and thus, only String keys will be serialized - In
serde/value/owned/de.rs, theValueVisitor::visit_i/u 8/16/32should be called from theparse_integerfrom serde_json but it only ever returns a i64, a u64 or a f64 as of https://github.com/serde-rs/json/blob/master/src/de.rs#L380. - Same for the borrowed values
We have a lot of tests now