Add `strip_whitespace` as an option to Meta
Description
Right now Meta supports min_length and max_length, but those are mostly meaningless without first stripping out whitespace.
Currently, we have to do variations on this all over the place:
```python
from typing import Annotated
from msgspec import Meta, Struct

class Foo(Struct):
    bar: Annotated[str, Meta(min_length=1, max_length=20)]

    def __post_init__(self):
        self.bar = self.bar.strip()
        if len(self.bar) < 1:
            raise ValueError('too short')
        if len(self.bar) > 20:
            raise ValueError('too long')
```
This is obviously bad because it repeats the length checks and adds a lot of boilerplate. Something like this would be much simpler:
```python
class Foo(Struct):
    bar: Annotated[str, Meta(min_length=1, max_length=20, strip_whitespace=True)]
```
Even without the length checks, having a built-in way to strip whitespace would be very useful and cut down on boilerplate. Other than for things like passwords and tokens, we need to strip whitespace from every single string field on its way in.
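To make the proposed semantics concrete, here is a hypothetical pure-Python sketch of what `strip_whitespace=True` could mean for a single string value: strip first, then apply the usual `min_length`/`max_length` checks to the stripped result. `check_str` is an illustrative helper, not part of msgspec, and the ordering (strip before the length checks) is an assumption about how the option would behave.

```python
from typing import Optional


def check_str(
    value: str,
    *,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    strip_whitespace: bool = False,
) -> str:
    """Hypothetical helper: return the (optionally stripped) value or raise ValueError."""
    if strip_whitespace:
        # Strip first, so surrounding whitespace cannot count toward min_length.
        value = value.strip()
    if min_length is not None and len(value) < min_length:
        raise ValueError("too short")
    if max_length is not None and len(value) > max_length:
        raise ValueError("too long")
    return value


# Without stripping, a whitespace-only value satisfies min_length=1.
print(repr(check_str("   ", min_length=1)))  # '   '

# With stripping, the same value is rejected, and padded input is normalized.
try:
    check_str("   ", min_length=1, strip_whitespace=True)
except ValueError as exc:
    print(exc)  # too short
print(check_str("  bar  ", min_length=1, max_length=20, strip_whitespace=True))  # bar
```

Running the strip before the length checks is what makes `min_length=1` meaningful for whitespace-only input; checking the raw value first would let `"   "` through.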
You can remove 33% of your "boilerplate" by removing the second test:
How can a str of length <= 20 be longer than 20 after stripping whitespace?
You can remove 66% if you use `Meta(pattern=r"^\s*\S.{0,19}\s*$")` instead of what you have now and only do the mutation of your decoded value in `__post_init__`.
Where it belongs IMHO. I don't think annotations should describe value manipulations.
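To sanity-check the suggested pattern, here is a small stdlib-only comparison against the strip-then-measure rule it is meant to replace. It assumes msgspec applies `pattern` with unanchored `re.search` semantics (hence the explicit `^` and `$`). The helper `stripped_length_ok` and the sample strings are made up for illustration. On these single-line samples the two checks agree. One caveat: `.` does not match a newline by default, so a value with an embedded newline such as `"a\nb"` is rejected by the pattern even though its stripped length is 3.

```python
import re

# Pattern suggested in the reply. Assumption: msgspec's `pattern` constraint
# behaves like an unanchored re.search, so the pattern anchors itself with ^ and $.
PATTERN = re.compile(r"^\s*\S.{0,19}\s*$")


def stripped_length_ok(value: str, min_length: int = 1, max_length: int = 20) -> bool:
    """Hypothetical reference check: apply the length bounds after stripping."""
    return min_length <= len(value.strip()) <= max_length


samples = [
    "",                        # empty: too short either way
    "   ",                     # whitespace only: too short once stripped
    "  hello  ",               # stripped length 5
    "a" + " " * 25,            # raw length 26, stripped length 1
    "   " + "x" * 20 + "   ",  # stripped length 20 (upper bound)
    "x" * 21,                  # stripped length 21: too long
    "a" + " " * 19 + "b",      # internal whitespace counts: stripped length 21
]

for s in samples:
    pattern_ok = PATTERN.search(s) is not None
    print(repr(s), pattern_ok, stripped_length_ok(s))
```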
I feel this comment misses the point entirely. I'm making suggestions to improve the developer experience.
Validating using annotations is a stated feature of msgspec:
> Zero-cost schema validation using familiar Python type annotations. In benchmarks msgspec decodes and validates JSON faster than orjson can decode it alone.
msgspec already does validation with constraints like `min_length` and `max_length`; this is normal and expected for a Python validation library. I'd wager most developers would also like some nice syntax for other common operations, like stripping whitespace. It is generally better for the library to handle that, because there is more room for errors when every consumer of the library has to implement it themselves.
`Meta(pattern=r"^\s*\S.{0,19}\s*$")` is also boilerplate that would have to be added everywhere. Worse, it's much harder to read and understand than `strip_whitespace=True`.