zerocopy

Default value for enums

Open saltatory opened this issue 1 year ago • 2 comments

Project

It's not public but I am working on a high-throughput IO block device subsystem.

Use Case

I want to use enums for an operation log where the enum represents the operation type e.g. Put or Delete.

Current State

If I do

#[derive(FromBytes, FromZeroes, AsBytes, Unaligned)]
#[repr(u8)]
pub enum OperationType {
   Put = 1,
   Delete = 2
}

... I get an error that the enum must have all 256 variants.

Desired State

It sure would be nice if we could specify a default value somewhat like:

#[derive(FromBytes(default = 0), FromZeroes, AsBytes, Unaligned)]
#[repr(u8)]
pub enum OperationType {
   Unknown = 0,
   Put = 1,
   Delete = 2
}

This is similar to how serde handles this issue.

saltatory avatar Jun 16 '24 00:06 saltatory

Thank you for the feature request! The FromBytes trait marks a type that can be soundly viewed from a buffer of arbitrary bytes. Those bytes may be owned, immutably borrowed, or mutably borrowed. How should this behave?

let bytes = &[42u8];
let ot: &OperationType = OperationType::from_bytes(bytes);

Both ot and bytes point to the same memory. OperationType doesn't have a variant that corresponds to the byte 42u8, and we can't mutate that memory during parsing because bytes is immutable. I don't see a way to reconcile these two issues.
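To make the distinction concrete, here is a minimal sketch in plain Rust. It is not zerocopy's API: `parse_owned` and `view` are hypothetical helpers, and the `Unknown = 0` variant is borrowed from the proposal above. An owned conversion builds a new value, so it can substitute a default; a borrowed view aliases the input byte, so it can only succeed or fail.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum OperationType {
    Unknown = 0,
    Put = 1,
    Delete = 2,
}

/// Hypothetical helper: produces a fresh value, so unknown bytes can be
/// mapped to a default without touching the input.
pub fn parse_owned(byte: u8) -> OperationType {
    match byte {
        1 => OperationType::Put,
        2 => OperationType::Delete,
        _ => OperationType::Unknown,
    }
}

/// Hypothetical helper: returns a reference into `bytes` itself. There is
/// nowhere to store a default, so an unknown byte can only be rejected.
pub fn view(bytes: &[u8]) -> Option<&OperationType> {
    // Require exactly one byte.
    let byte: &u8 = match bytes {
        [b] => b,
        _ => return None,
    };
    match *byte {
        // SAFETY: `OperationType` is `repr(u8)` (size 1, alignment 1) and
        // the byte was just checked to be one of its discriminants.
        0..=2 => Some(unsafe { &*(byte as *const u8 as *const OperationType) }),
        _ => None,
    }
}
```

With these definitions, `parse_owned(42)` yields `OperationType::Unknown`, while `view(&[42])` has to return `None` because the byte behind the reference is still 42.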

That said, if your complaint is that defining 256 variants is irritating, I can only wholeheartedly agree. @joshlf perhaps we could provide an attribute macro to generate these excess variants.

jswrenn avatar Jun 16 '24 17:06 jswrenn

Understood and thanks for the thoughtful reply. I suppose in general this is the issue with SerDe libraries ... there is an impedance mismatch between the semantics of the language into which the bytes are deserialized and the bytes themselves (and the attendant "wire" specification).

Instead of expecting the language to match perfectly to the serialized format e.g.:

#[repr(u8)]
pub enum OperationType {
  Foo = 0,
  Bar = 1,
  Baz = 2,
}

let bytes = &[1u8];
let ot: &OperationType = OperationType::from_bytes(bytes);

We could expect that the language representation includes some method to detect a mismatch between the library and the binary representation. For example, the Rust SerDe library expects you to call deserialize from the Deserialize trait and that the method returns an error if it fails. Zerocopy could have methods to get the enum value if it exists and the raw bytes if it does not.

let bytes = &[3u8];
let ot: Enum<OperationType> = OperationType::from_bytes(bytes);
match ot.get() {
  Some(e) => {/*deserializes*/},
  None => {/*fails*/}
}
let raw = ot.raw();
let raw_mut = ot.raw_mut(); // Could return mutable slice containing just the value

... These method signatures are sketches for discussion. Tastes vary on specific method names and signatures :)
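One way such a wrapper could look, as a rough sketch in plain Rust rather than anything zerocopy provides: `RawOperationType` and its `from_byte`, `get`, `raw`, and `raw_mut` methods are hypothetical names chosen to mirror the calls above. Because the wrapper is just a `u8`, every byte value is valid for it, and the enum is only produced on request.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum OperationType {
    Foo = 0,
    Bar = 1,
    Baz = 2,
}

/// Hypothetical wrapper: stores the raw byte, so any `u8` is a valid value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
pub struct RawOperationType(u8);

impl RawOperationType {
    /// Wraps a byte without validating it.
    pub fn from_byte(byte: u8) -> Self {
        Self(byte)
    }

    /// Returns the enum value if the byte matches a known variant.
    pub fn get(&self) -> Option<OperationType> {
        match self.0 {
            0 => Some(OperationType::Foo),
            1 => Some(OperationType::Bar),
            2 => Some(OperationType::Baz),
            _ => None,
        }
    }

    /// Returns the raw byte, whether or not it maps to a variant.
    pub fn raw(&self) -> u8 {
        self.0
    }

    /// Gives mutable access to the raw byte.
    pub fn raw_mut(&mut self) -> &mut u8 {
        &mut self.0
    }
}
```

For the byte `3` from the example, `get()` returns `None` while `raw()` still returns `3`.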

saltatory avatar Jun 17 '24 15:06 saltatory

A macro for generating missing enum values would be very welcome.

bluespider42 avatar Sep 15 '24 20:09 bluespider42

We'll look into this. In the meantime, does our new TryFromBytes address your use-case? Unlike FromBytes, it can be derived on enums that don't have the maximal number of variants.
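As an illustration of the fallible approach, here is a sketch that uses only the standard library's `TryFrom` rather than zerocopy's `TryFromBytes` derive. `InvalidOperation` and `parse_log` are hypothetical names, and the operation log is assumed to be one byte per entry.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum OperationType {
    Put = 1,
    Delete = 2,
}

/// Hypothetical error type carrying the byte that matched no variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InvalidOperation(pub u8);

impl TryFrom<u8> for OperationType {
    type Error = InvalidOperation;

    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            1 => Ok(OperationType::Put),
            2 => Ok(OperationType::Delete),
            other => Err(InvalidOperation(other)),
        }
    }
}

/// Hypothetical helper: decodes a log of one-byte entries, stopping at the
/// first byte that is not a known operation.
pub fn parse_log(bytes: &[u8]) -> Result<Vec<OperationType>, InvalidOperation> {
    bytes.iter().map(|&b| OperationType::try_from(b)).collect()
}
```

The enum keeps only the variants that are meaningful, and the caller decides what to do with an unrecognized byte.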

jswrenn avatar Sep 16 '24 14:09 jswrenn

I can think of two ways to make this ergonomic:

Auto-generate all enum variants

As you suggested above, @jswrenn, use an attribute macro (or similar) to add a variant for each missing discriminant. I see a few issues with this:

First, in cases where the enum discriminant is assigned to a const expression, I'm not sure how we'd be able to compute which discriminants are missing.

Second, the ergonomics depend highly on your use case. This isn't so bad:

match op {
    OperationType::Put => ...,
    OperationType::Delete => ...,
    _ => ...,
}

However, this is dangerous since now your code will silently keep compiling even if the user adds more variants that should be handled. In fact, in Netstack3, we had an explicit rule requiring every match pattern to be explicitly handled for this reason. Explicitly matching in a case like this would be extremely unergonomic.

Rely on niche optimization

Alternatively, if niche optimizations are ever guaranteed, we could use that to generate an enum with the correct layout but better ergonomics. Interestingly, Rust currently doesn't seem to support niche optimizations with explicit variant discriminants (if you try, it yells at you that you need a #[repr(inttype)] attribute on your enum), so let's pretend that our variants have discriminants 0 and 1 rather than 1 and 2 (as in the original example on this issue):

enum OperationType {
    Put,
    Delete,
    Other(OperationTypeOther),
}

#[rustc_layout_scalar_valid_range_start(2)]
struct OperationTypeOther(u8);

The Rust playground confirms that OperationType is 1 byte in size.

The ergonomics would be quite good:

match op {
    OperationType::Put => ...,
    OperationType::Delete => ...,
    OperationType::Other(_) => ...,
}

...and would still retain the future-proofing behavior where a newly-added variant would make this code stop compiling.

This would require a number of language features to be stabilized publicly, and is likely years off at a minimum.
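For comparison, here is what the same shape looks like on stable Rust today, as a sketch with a plain `u8` payload instead of the range-restricted type. `from_byte`, `describe`, and `size_in_bytes` are hypothetical helpers. The match ergonomics are the same, but a `u8` has no niche, so the enum has more than 256 possible values and cannot fit in one byte.

```rust
use std::mem::size_of;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationType {
    Put,
    Delete,
    // A plain u8 has no invalid bit patterns, so the compiler needs a
    // separate tag to distinguish this variant from Put and Delete.
    Other(u8),
}

/// Hypothetical helper: uses discriminants 0 and 1, as in the example above.
pub fn from_byte(byte: u8) -> OperationType {
    match byte {
        0 => OperationType::Put,
        1 => OperationType::Delete,
        other => OperationType::Other(other),
    }
}

/// Exhaustive match with no wildcard arm: adding a variant to the enum
/// makes this function stop compiling until the new case is handled.
pub fn describe(op: OperationType) -> &'static str {
    match op {
        OperationType::Put => "put",
        OperationType::Delete => "delete",
        OperationType::Other(_) => "other",
    }
}

/// Reports the in-memory size of the enum.
pub fn size_in_bytes() -> usize {
    size_of::<OperationType>()
}
```

Since this version is larger than one byte, it cannot be viewed in place over a one-byte wire field, which is why the approach above depends on the niche.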

joshlf avatar Sep 17 '24 01:09 joshlf

Closing this since it seems like the discussion has stopped. Feel free to follow up if you have more questions.

joshlf avatar Nov 25 '24 22:11 joshlf