simd-json icon indicating copy to clipboard operation
simd-json copied to clipboard

Improve testing

Open Licenser opened this issue 6 years ago • 16 comments

We still have a rather low test coverage on parts of the code (my fault -.-) now that we can measure it we should aim to improve it. 90%+ sounds like a good target

Licenser avatar Oct 03 '19 18:10 Licenser

Yay! You found my not-so-subtle reason for adding coverage 😂

sunnygleason avatar Oct 03 '19 19:10 sunnygleason

To be fair the coverage of the critical path is quite good, >90% even with the buggy coverage. The tests missing are more for 'boring/unimporant' functions and will just exist to push the number and make it look nicer.

Licenser avatar Oct 04 '19 16:10 Licenser

👍 , my 100% coverage club membership expired a couple years ago... 😂

sunnygleason avatar Oct 04 '19 16:10 sunnygleason

I have lots of opinions on the topic 😂

Licenser avatar Oct 04 '19 16:10 Licenser

Mostly the serde bits are left

Licenser avatar Oct 07 '19 17:10 Licenser

getting there! the big things that jump out at me here are:

  • [ ] serde
  • [ ] stage2.rs

Wondering if there's a way to get the macros out of the function definition in stage2.rs ...

sunnygleason avatar Dec 01 '19 18:12 sunnygleason

Ja serde is the big thing left.

Licenser avatar Dec 01 '19 19:12 Licenser

I looked into some coverage and made some small steps as I'm real new to serde. Now I found what may be a bug with this main which panics on the last unwrap

use serde::{Deserialize, Serialize};
use serde_json::Result;
use std::collections::HashMap;

#[derive(Serialize, Deserialize, PartialEq, Eq, Hash, Debug)]
struct Bar1(u8);

fn main() -> Result<()> {
    let b = Bar1(3);
    println!("serde: Bar1:{}", serde_json::to_string(&b)?);
    println!("simd : Bar1:{}", simd_json::to_string(&b).unwrap());

    let mut h = HashMap::new();
    h.insert(Bar1(3), 3i8);
    let h_serde_str = serde_json::to_string(&h)?;
    println!("serde HashMap:{}", &h_serde_str);
    let mut h_simd_str = simd_json::to_string(&h).unwrap();
    println!("simd  HashMap:{}", &h_simd_str);

    let h_serde_de = serde_json::from_str::<HashMap<Bar1, i8>>(&h_serde_str)?;
    println!("h_de{:?}", h_serde_de);

    let h_simd_de = simd_json::from_str::<HashMap<Bar1, i8>>(&mut h_simd_str).unwrap();
    println!("h_de{:?}", h_simd_de);

    Ok(())
}

The output is

serde: Bar1:3
simd : Bar1:3
serde HashMap:{"3":3}
simd  HashMap:{"3":3}
h_de{Bar1(3): 3}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { index: 0, character: '?', error: ExpectedUnsigned }', src/main.rs:23:79
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[Finished running. Exit status: 101]

dvermd avatar Oct 22 '21 21:10 dvermd

I think the problem is when parsing the key of an JSON object: for example {"3":3} to a HashMap<i8, i8>.

In serde-json, a MapKey struct is used to skip the '"' char if needed base on the destination type to be able to parse the key with its right type and also skip the other '"' when done.

simd-json just calls deserialize on the seed (PhantomData<i8>) which is implemented here by parsing the str to a i8 and it fails with an ExpectedSigned Error because the str still contains the '"'.

I can confirm that because if I add

            self.len -= 1;
            self.de.skip();

before the seed.deserialize(&mut *self.de).map(Some), this test cases passes, though this is not the right fix and break many other tests.

dvermd avatar Oct 25 '21 05:10 dvermd

Sorry for the late reply, this got buried in a bunch of other notification :)

yes that's quite possible, the big difference between serde and simd-json in this regard is that simd-json uses a intermediate format to parse out datatypes already so by the time it hits the serde interop layer simd-json already decided (and validated) that the key of an object is a string.

so the skipping part won't work, we would have to do the string -> i8 parsing after the fact. On the other hand it'll be more correct (i.e. would even allow escaped numbers) but I'm also fine with saying "Due to differences in approach this kind of mapping isn't possible automatically, please implement a custom deserializer for Bar1"

Licenser avatar Oct 25 '21 09:10 Licenser

You're totally right, skipping won't work and string -> i8 should be done after the fact.

I think this would be the right thing to do because when a user adds #[derive(Deserialize)] to its struct or use a struct (like HashMap) that is known to be implementing the Deserialize trait, it expects the struct to be Deserializable.

Implement a custom serializer for a user struct means digging in the internals of simd-json and it then making the string -> i8 conversion anyway.

This kind of changes will make JSON -> Tape -> user type

     #[cfg_attr(not(feature = "no-inline"), inline(always))]
     #[allow(clippy::cast_sign_loss)]
     fn parse_i8(&mut self) -> Result<i8> {
         match unsafe { self.next_() } {
             Node::Static(s) => s
                 .as_i8()
                 .ok_or_else(|| Self::error(ErrorType::ExpectedSigned)),
+            Node::String(s) => s
+                .parse::<i8>()
+                .map_err(|_| Self::error(ErrorType::ExpectedSigned)),
             _ => Err(Self::error(ErrorType::ExpectedSigned)),
         }
     }

Now, there's still the owned/borrowed Value -> user type to address

dvermd avatar Oct 26 '21 06:10 dvermd

     #[cfg_attr(not(feature = "no-inline"), inline(always))]
     #[allow(clippy::cast_sign_loss)]
     fn parse_i8(&mut self) -> Result<i8> {
         match unsafe { self.next_() } {
             Node::Static(s) => s
                 .as_i8()
                 .ok_or_else(|| Self::error(ErrorType::ExpectedSigned)),
+            Node::String(s) => s
+                .parse::<i8>()
+                .map_err(|_| Self::error(ErrorType::ExpectedSigned)),
             _ => Err(Self::error(ErrorType::ExpectedSigned)),
         }
     }

That won't work either because it'll allow to parse {"3": 3} as HashMap<i8, i8> but will also parse {"3": "3"} to this same type which is wrong.

One way to do it might be to create a MapKey type somewhat equivalent to the serde one to allow for string -> i8 parsing only for keys.

dvermd avatar Oct 27 '21 07:10 dvermd

Ja making parse_i8 generically parsing strings comes with some problems, one alternative would be looking at the Map deserializer, perhaps it's possible to hook into that bit?

https://github.com/simd-lite/simd-json/blob/main/src/serde/de.rs#L295

A cutsoim map visitor could work?

Licenser avatar Oct 27 '21 08:10 Licenser

Would you consider adding --ignore-tests argument to the tarpaulin CI ?

dvermd avatar Oct 29 '21 10:10 dvermd

yes that makes perfect sense :+1: I didn't knew that existed :) good call!

Licenser avatar Oct 29 '21 10:10 Licenser

I think there's some dead code in the serde part of this crate. Can you help identify if this is correct ?

  1. In serde/value/owned/se.rs, I didn't manage to trigger any function of MapKeySerializer other than serialize_str. I think this is because, internally, the Object type of a Value is a HashMap<String, Value> and thus, only String keys will be serialized
  2. In serde/value/owned/de.rs, the ValueVisitor::visit_i/u 8/16/32 should be called from the parse_integer from serde_json but it only ever returns a i64, a u64 or a f64 as of https://github.com/serde-rs/json/blob/master/src/de.rs#L380.
  3. Same for the borrowed values

dvermd avatar Nov 01 '21 06:11 dvermd

We have a lot of tests now

Licenser avatar Oct 09 '23 12:10 Licenser