bincode More helpful error messages based on `source()` chains

The current error API does not include a path to the type that failed to encode/decode; if the user sees DecodeError::Utf8 for example they would have to just guess where it came from. It gets even worse if the error is used in another function, because they might not even be able to tell that it came from deserializing Bincode as the error’s Display impl doesn’t explain what happened. I think the goal is to have errors that look like this, for a UTF-8 error in this struct:

#[derive(Decode)]
struct Foo {
    bar: String,
}

Error: failed to deserialize Bincode

Caused by:
   1: failed to decode struct `Foo`
   2: failed to decode field `bar`
   3: failed to decode string
   4: invalid utf-8 sequence of 2 bytes from index 8

This is much more helpful than just the last line.

Another problem is that the current error types are pretty ad-hoc and makes all third-party types harder to use than first-party ones.

To solve these problems, the traits need to be modified first to enable types to give their own errors. We could also box the errors here, but that’s a lot of boxing overall so I’m avoiding it.

trait Decode {
    type Error: 'static + Send + Sync + Error;
    fn decode<D: Decoder>(decoder: &mut D) -> Result<Self, Self::Error>;
}

As an example of how this would work for container types, the tuple implementation would look a bit like this (obviously would be generated with a macro):

const _: () = {
    impl<A, B> Decode for (A, B)
    where
        A: Decode,
        B: Decode,
    {
        type Error = Error<A::Error, B::Error>;
        fn decode<D: Decoder>(decoder: &mut D) -> Result<Self, Self::Error> {
            Ok((
                A::Decode(decoder).map_err(|e| Error::A(AError(e)))?,
                B::Decode(decoder).map_err(|e| Error::B(BError(e)))?,
            ))
        }
    }

    // intentionally unnameable and unique for this type
    #[derive(Debug)]
    pub enum Error<A, B> {
        A(AError<A>),
        B(BError<B>),
    }

    impl<A, B> Display for Error<A, B> {
        fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
            f.write_str("failed to decode 2-length tuple")
        }
    }

    impl<A, B> std::error::Error for Error<A, B>
    where
        A: 'static + std::error::Error,
        B: 'static + std::error::Error,
    {
        fn source(&self) -> Option<&(dyn 'static + std::error::Error)> {
            match self {
                Self::A(e) => Some(e),
                Self::B(e) => Some(e),
            }
        }
    }

    #[derive(Debug)]
    pub struct AError<E>(E);

    impl<E> Display for AError<E> {
        fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
            f.write_str("failed to decode element 0")
        }
    }

    impl<E> std::error::Error for AError<E>
    where
        E: 'static + std::error::Error,
    {
        fn source(&self) -> Option<&(dyn 'static + std::error::Error)> {
            Some(&self.0)
        }
    }

    // likewise for BError
    // `AError` and `BError` could also be shared and unified via const generics
};

NonZeroU8 would look something like this:

const _: () = {
    impl Decode for NonZeroU8 {
        type Error = Error;
        fn decode<D: Decoder>(decoder: &mut D) -> Result<Self, Self::Error> {
            let v = u8::decode(decoder).map_err(Error::U8)?;
            let v = Self::new(v)
                .ok_or(Error::NonZeroTypeIsZero(NonZeroTypeIsZero { non_zero_type: IntegerType::U8 }))?;
            Ok(v)
    }

    #[derive(Debug)]
    pub enum Error {
        U8(<u8 as Decode>::Error),
        NonZeroTypeIsZero(NonZeroTypeIsZero),
    }

    impl Display for Error {
        fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
            f.write_str("failed to decode non-zero 8-bit unsigned integer")
        }
    }

    impl std::error::Error for Error {
        fn source(&self) -> Option<&(dyn 'static + std::error::Error)> {
            match self {
                Self::U8(e) => Some(e),
                Self::NonZeroTypeIsZero(e) => Some(e),
            }
        }
    }
};

// in `bincode::error`; this module contains a bunch of common “root-cause” errors
// that previously made up the variants of `DecodeError`
pub mod common {
    pub struct NonZeroTypeIsZero {
        pub non_zero_type: IntegerType,
    }

    impl Display for NonZeroTypeIsZero {
        fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
            f.write_str("non-zero integer type is zero")
        }
    }

    impl std::error::Error for NonZeroTypeIsZero {}

    // includes other things like `struct LimitExceeded;` etc
}

At the top level, it’s easier not to expose the hell of generics that this will produce — instead, it can all be neatly boxed away behind a DecodeError (which also serves to give us that nice “failed to decode Bincode” message at the top):

pub struct DecodeError {
    // future possibility: add an `offset` field here?
    #[cfg(feature = "std")]
    source: Box<dyn Send + Sync + std::error::Error>,
}

impl DecodeError {
    // Users can use this to downcast the root cause into a common error type like
    // `NonZeroTypeIsZero` so we don’t lose any functionality over the enum
    pub fn root_cause(&self) -> &(dyn 'static + std::error::Error) { /* … */ }
}

impl Display for DecodeError {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        f.write_str("failed to decode Bincode")
    }
}

impl std::error::Error for DecodeError {
    fn source(&self) -> Option<&(dyn 'static + std::error::Error)> {
        Some(self.source.as_ref())
    }
}

The Decoder trait would have to be adjusted as well. claim_bytes_read would be easy — it’d just return a Result<(), LimitExceeded> — and Reader would use its own ReaderError:

#[derive(Debug)]
pub enum ReaderError {
    Io(io::Error),
    UnexpectedEnd(UnexpectedEnd),
}

impl Display for ReaderError {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        f.write_str("failed to take bytes from source")
    }
}

// plus a `std::error::Error` impl

Since this crate is #![no_std], we might need to polyfill Error with our own Error trait that has .source() — or we could just omit the Error infrastructure entirely in that case. DecodeError could potentially have an internal enum CommonRootCauses as well, so we don’t always throw away root causes (and we end up with the no lost behaviour in comparison to before).

I only demonstrated the Decode side of things here but the exact same pattern can be easily applied to Encode as well.

If you think this is a good idea, I’m happy to try implementing it.

Jul 22 '22 12:07 SabrinaJewson

I like the idea but I think it'll be very easy to break. A case that comes to mind:

enum LinkedListNode<T> {
    Occupied { value: T, next: Box<LinkedListNode<T>> },
    Vacant
}

#[derive(Debug)]
pub enum Error<T> {
    T(<T as Decode>::Error), // which is `Error<T>` again
}

To solve this you would have to Box every inner error, which would not work on no_std.

Jul 23 '22 09:07 VictorKoenders

My intention was to allow that (I even used it for the NonZeroU8 implementation), I only wanted to make the error types unnameable so that the variants would be hidden (I’m pretty sure you can’t access variants through associated types) and Bincode wouldn’t be silently adding public API to crates that use the derive.

Jul 23 '22 11:07 SabrinaJewson

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sep 22 '22 03:09 stale[bot]

I would love if this were explored further; hopefully we can reopen. In combination with the Error trait's provide method this could expose programmatic access to the nested path of where decoding failed.

Dec 03 '22 16:12 dtolnay