zerocopy icon indicating copy to clipboard operation
zerocopy copied to clipboard

Also return remaining bytes in `FromBytes` conversion methods

Open jswrenn opened this issue 1 year ago • 5 comments

This change makes the slice and non-slice conversion methods consistent, and provides a lightweight way (i.e., without Ref) to access the leftover bytes after conversions. For instance, this:

trait FromBytes {
    fn ref_from_prefix(bytes: &[u8]) -> Option<&Self>
    where
        Self: KnownLayout + NoCell;
}

...becomes this, in this PR:

trait FromBytes {
    fn ref_from_prefix(bytes: &[u8]) -> Option<(&Self, &[u8])>
    where
        Self: KnownLayout + NoCell;
}

...where the second element of the tuple are the remaining bytes. Providing the remainder eases the pattern where a T is parsed from a buffer, and then the buffer is advanced to exclude the parsed T.

This PR tests out the 'feel' of tuple-returning conversion methods. Alternatively, we could provide a mirror set of _split methods.

Addresses #1051, but is the antithesis of #884. Inspired by https://github.com/google/zerocopy/issues/1051#issuecomment-2007575409. It's syntactically cheap for users to discard remaining bytes at call sites if they wish, but the alternative — discarding the excess and requiring users to go through Ref — is syntactically and cognitively expensive.

jswrenn avatar Mar 19 '24 22:03 jswrenn

While I'm obviously in favor of doing this in general, I feel like modifying the existing methods would be too large of a breaking change to be worthwhile, which was why I suggested new methods.

smalis-msft avatar Mar 20 '24 16:03 smalis-msft

Hm, we'll need to balance the costs of subjecting current users to breaking changes, against the costs of subjecting current and future users to a more bloated API. Adding _split methods consistently would require adding eight additional methods to FromBytes (which has 14 methods in our latest 0.8 pre-release):

  mut_from
  mut_from_prefix
+ mut_from_prefix_split
  mut_from_suffix
+ mut_from_suffix_split
  mut_slice_from
  mut_slice_from_prefix
+ mut_slice_from_prefix_split
  mut_slice_from_suffix
+ mut_slice_from_suffix_split
  read_from
  read_from_prefix
+ read_from_prefix_split
  read_from_suffix
+ read_from_suffix_split
  ref_from
  ref_from_prefix
+ ref_from_prefix_split
  ref_from_suffix
+ ref_from_suffix_split
  slice_from_prefix
  slice_from_suffix

That's quite a hefty addition, and whatever course of action we take here will also likely be taken for our upcoming TryFromBytes trait, too.

Since we're preparing for a major breaking release anyways, we have a bit more license than usual to consider breaking changes.

jswrenn avatar Mar 20 '24 16:03 jswrenn

As I discuss in https://github.com/google/zerocopy/issues/884#issuecomment-2027828809, I'm strongly in support of not removing the plain-casting methods that don't return the suffix. Embedded code should do the minimum work necessary for the job. Zerocopy has a few design decisions that are good for networking code, but bad for embedded, like how all of the byteorder items are also unaligned. Removing these methods pushes it further in an embedded-hostile direction.

I personally find the zerocopy APIs lacking in useful transitive operations like this, so I would prefer the "bloat" of adding these methods and keeping the plain-casting ones.

What do you think of these methods starting with split_ instead of ending in _split? split_mut_from_prefix reads clearly in the imperative mood and highlights that it's doing more than just a bounds-check and pointer cast: it's splitting as well. This split is an extra operation, both cognitively and for the compiler.

I don't care about the *_from_suffix API and I don't know who does - we should be consistent where it makes sense to do so. Perhaps as a middle ground, this PR can introduce only splitting versions for the prefix APIs, which adds 5 methods instead of 10:

  mut_from
  mut_from_prefix
+ split_mut_from_prefix
  mut_from_suffix
  mut_slice_from
  mut_slice_from_prefix
+ split_mut_slice_from_prefix
  mut_slice_from_suffix
  read_from
  read_from_prefix
+ split_read_from_prefix
  read_from_suffix
  ref_from
  ref_from_prefix
+ split_ref_from_prefix
  ref_from_suffix
  slice_from_prefix
+ split_slice_from_prefix
  slice_from_suffix

kupiakos avatar Mar 30 '24 00:03 kupiakos

I don't care about the *_from_suffix API and I don't know who does

We use it for parsing certain things as well as prefix, and would strongly prefer to have split versions of it as well if you decide to go this route.

smalis-msft avatar Apr 01 '24 13:04 smalis-msft

I've made a discussion post exploring some of the naming and return type design space: https://github.com/google/zerocopy/discussions/1095

jswrenn avatar Apr 05 '24 20:04 jswrenn

I'm inclined to merge this based on the following reasoning:

  • The ergonomics of discarding an unneeded prefix/suffix are better than the ergonomics of reconstructing a prefix/suffix if one isn't returned by the API
  • From what little visibility we have into use cases in the ecosystem, it seems that wanting the prefix/suffix is the more common situation
  • For users who care about not relying on the compiler to perform optimizations, techniques like the one described in this comment are available
  • More generally, we are aware that our current internal software organization is based on the assumption that the compiler will aggressively optimize, and we are actively considering how to provide better support for users who aren't comfortable with this situation: https://github.com/google/zerocopy/issues/1125

joshlf avatar Apr 19 '24 18:04 joshlf