Dynamically enable/disable tags

Open rensje opened this issue 2 years ago • 0 comments

Description

Hi!

Love the library! I switched our entire code base from marshmallow to msgspec and saw speedups of up to 200x.

I was wondering whether it would be possible to enable or disable tags for structs dynamically, regardless of how the struct is defined. In some cases a tag is not needed, and omitting it could save a significant amount of space and time.

My use-case is the following:

class Box(Struct, tag=?):
    x: float
    y: float
    w: float
    h: float

class Mask(Struct, tag=?):
    mask: np.ndarray

# Use case 1

class PredictionBox(Struct, tag=?):
    box: Box
    confidence: float

# Box is already known from PredictionBox's field type, so it needs no tag.
# PredictionBox is the top-level type; I would like it to be tagged
# so that decode can know what to decode.
encode(PredictionBox(...))

# Use case 2:
class Predictions(Struct, tag=?):
    predictions: List[PredictionBox]

# Predictions is the top-level type; I would like it to be tagged.
# PredictionBox and Box, however, should not be tagged.
encode(Predictions(...))

# Use case 3:

class PredictionMask(...):
    mask: ...

# Predictions2 is the top-level type; I would like it to be tagged.
# PredictionBox and PredictionMask should also be tagged,
# since they are members of a union.
class Predictions2(...):
    predictions: List[Union[PredictionBox, PredictionMask]]

This example is a bit contrived; the real code (hopefully) makes more sense. The gist is that the same types are sometimes used in a context where a tag is mandatory, and sometimes in a context where a tag only costs space and speed.

It would be great if encode() had a parameter such as omit_tag that would allow it not to include a tag field when it is not necessary. I believe a tag is only really necessary when a type is part of a tagged union. In addition, I think that optionally encoding the top-level struct with a tag would be useful as well, since in this way a downstream decoder can always determine the type (assuming types are known at both ends). I believe nothing would have to be changed for the decode() API to function as is, since the tag field would just be ignored.
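To make the rule I have in mind concrete, here is a minimal sketch using plain dataclasses and the standard library instead of msgspec. It is not msgspec code and says nothing about how msgspec works internally. The helper names (needs_tag, to_builtins), the "type" tag key, and the List[int] stand-in for the mask field are all made up for illustration. A tag is emitted only for the top-level object (when asked for) and for values whose declared type is a union.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, Union, get_args, get_origin, get_type_hints


def needs_tag(declared: Any) -> bool:
    # Hypothetical rule: a tag is needed only where the declared type is a union.
    return get_origin(declared) is Union and len(get_args(declared)) > 1


def to_builtins(obj: Any, declared: Any = None, tag: bool = False) -> Any:
    # Convert dataclasses to dicts, adding a "type" key only when `tag` is set.
    if isinstance(obj, list):
        args = get_args(declared)
        elem = args[0] if args else None  # element type of List[T], if known
        return [to_builtins(item, elem, needs_tag(elem)) for item in obj]
    if is_dataclass(obj) and not isinstance(obj, type):
        hints = get_type_hints(type(obj))
        out = {"type": type(obj).__name__} if tag else {}
        for f in fields(obj):
            ftype = hints[f.name]
            out[f.name] = to_builtins(getattr(obj, f.name), ftype, needs_tag(ftype))
        return out
    return obj


@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float


@dataclass
class PredictionBox:
    box: Box
    confidence: float


@dataclass
class PredictionMask:
    mask: List[int]  # stand-in for the real mask type


@dataclass
class Predictions:
    predictions: List[PredictionBox]


@dataclass
class Predictions2:
    predictions: List[Union[PredictionBox, PredictionMask]]


pb = PredictionBox(Box(0.0, 0.0, 1.0, 1.0), 0.9)

# Use case 2: only the top-level object carries a tag.
plain = to_builtins(Predictions([pb]), tag=True)

# Use case 3: union members are tagged as well.
mixed = to_builtins(Predictions2([pb, PredictionMask([0, 1])]), tag=True)
```

In this sketch, plain contains a "type" key only on the outer dict, while every element of mixed["predictions"] carries one because the field is declared as a union. Box is never tagged in either case.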

I'm currently working around this by dynamically generating an untagged counterpart for each of my tagged structs and using those in definitions where possible. However, this adds quite a bit of noise and is unfriendly from a user's perspective.

Aug 18 '23 12:08 rensje