marshmallow icon indicating copy to clipboard operation
marshmallow copied to clipboard

Extra documentation for new context implementation?

Open RosanneZe opened this issue 9 months ago • 21 comments

I was a bit surprised that in the documentation there is only an example for how to use the experimental Context class, and not an example of how to do it the none experimental way?

This seems a lot more complex than the old way of doing it, is there anyway to just say the context is a dict and leave it at that?

RosanneZe avatar Apr 24 '25 09:04 RosanneZe

Nope. The old not experimental way is gone.

The rationale for this is explained in #1826 and #2707.

lafrech avatar Apr 24 '25 12:04 lafrech

Ah, I think that I just misread the upgrade notes then. I took 'Use contextvars.ContextVar for passing context to fields, pre-/post-processing methods, and validators instead.' to mean that there was also a non-experimental way of doing it, that I just didn't understood because I never did anything with contextvars.

RosanneZe avatar Apr 28 '25 07:04 RosanneZe

Context variables is a neat feature.

It is really worth learning, especially if you need something context-related in marshmallow.

The experimental code in marshmallow provides the use of a context variable in a context manager. You may use it in your code or just as an example.

lafrech avatar Apr 28 '25 15:04 lafrech

One way that I use the context is to pass information from a parent schema into a nested one for conditional validation. This change has made things quite a bit more complex, was this use case considered?

juledwar avatar May 12 '25 03:05 juledwar

@juledwar can you post an example of the use case with the legacy API that you're trying to achieve with the new API? maybe we can assist with the migration

sloria avatar May 12 '25 03:05 sloria

I would appreciate that, thanks.

I'll try and pare it down here, I am not permitted to post the code as-is, so this is not something I've run but it should give you the rough idea:

class Foo(Schema):
    name = String(required=True)
    reason = String(required=True)

    @pre_load
    def set_required(self, data, **kwargs):
        if self.context.get('reason_not_required') is False:
            self.fields['reason'].required = False
        return data

class Bar(Schema):
    id = Integer()
    details = Nested(Foo)

    @pre_load
    def needs_reason(self, data, **kwargs):
        if data.get('some_thing'):  # This could be any property of the data
            self.context['reason_not_required'] = False
        return data

Previously I could encapsulate all of this decision making in the schema here, but now I need to set context vars outside of the schema. There's no way to enforce their presence as well, as far as I can tell, other than to just blow up at runtime. It would be nice to pass something via the field constructor but this is not something that can be controlled in the schema definition. The hacky way would be to do something like self.fields.details.orig_data = data and examine it in Foo's hooks.

This change also has some ramifications for OneOfSchema and I filed another issue on that project: https://github.com/marshmallow-code/marshmallow-oneofschema/issues/214

juledwar avatar May 12 '25 04:05 juledwar

Sounds like a polymorphism issue, indeed. Indeed not really the case I had in mind for the context feature.

lafrech avatar May 12 '25 14:05 lafrech

This is absolutely not using polymorphism here, it's just one example of setting properties to affect changes in validation on the nested schema, which is one of the use cases described in the documentation.

The OneOfSchema thing is possibly a polymorphism issue but I've written more there.

juledwar avatar May 12 '25 22:05 juledwar

We could think of it as Bar being polymorphic, some_thing being the key, and Bar1 nesting Foo1 in which reason is required and Bar2 nesting Foo2 in which reason is not required.

But maybe that's not the most natural way to see it. And it would only work on this possibly contrived example where the check on some_thing doesn't contain any complicated business logic. And it may require more code.

I don't see what part of the doc you're pointing to but such a use case may have been used to illustrate the old context feature. What I'm saying is that I didn't have this in mind when proposing the changes in v4.

I don't have time to try this now but doesn't it work if you just replace self.context with Context.get()?

lafrech avatar May 13 '25 07:05 lafrech

I think you're confusing OneOfSchema with this one? The doc I'm referring to is here https://marshmallow.readthedocs.io/en/latest/custom_fields.html#using-context where it talks about using external data to affect the serialization.

My use case is to encapsulate this into the schema itself because forcing callers to provide the context when it's already all available in the parent schema seems quite awkward.

I'll see if Context.get() works - I was initially concerned about the value persisting without a reset()

juledwar avatar May 13 '25 23:05 juledwar

I'm not confusing with OneOfSchema. I somehow missed the page you're pointing to and only saw this one.

Anyway, you're right, it won't work if the context is not opened by the with Context call. This is how it is meant to work. It is what I thought initially but for some reason, after reading the doc, I was in doubt.

A different approach to this would be to do the validation in the parent: in needs_reason, raise if the field is missing from Foo payload. I understand you like the logic to be contained in Foo schema but in this example, reason_not_required is quite explicit already, so the abstraction is weak. Could be because the example is too contrived.

Also, I understand it sounds better to use schema/fields API rather than duplicate validation code, but modifying schema such as in self.fields['reason'].required is not something I'm happy supporting. There's been discussion about this in #1281, although it was more specifically triggered by concerns about other attributes.

I hope I'm being helpful. Sorry about the wrong hint in previous comment.

lafrech avatar May 14 '25 09:05 lafrech

You're right in that it was a super-contrived example for the sake of brevity. But it's a general pattern I feel should be easier to do as I have quite a few nested schemas that need to change behaviour based on some external data that is a property of the original parent schema's data.

I would rather just inject the data from the call to load() than duplicate the validation in every parent. I could also use inheritance of a common validator in the parent but that feels conceptually wrong too.

Perhaps if I rename the context variable like this it's less of a weak-looking abstraction:

class Foo(Schema):
    name = String(required=True)
    reason = String(required=True)

    @pre_load
    def set_required(self, data, **kwargs):
        if self.context.get('totem') is False:
            self.fields['reason'].required = False  # or could do anything else
        return data

class Bar(Schema):
    id = Integer()
    details = Nested(Foo)

    @pre_load
    def set_totem(self, data, **kwargs):
        if data.get('some_thing'):  # This could be any property of the data
            self.context['totem'] = False
        return data

This is why self.context was so useful: it was a short-lived piece of data that was scoped to the data being loaded and perfectly thread safe as it is on the stack.

Regarding changing self.fields['reason'].required I think if you don't want to support that, the variable should be "private" with a leading _ otherwise API users see it as fair game (as noted in the Issue you linked), but it's such a useful way of dynamically setting the flag it would be a shame to see it go. How else could you make a field conditionally required this easy? A custom validator, sure, but that's many more lines of code.

I hope I'm being helpful. Sorry about the wrong hint in previous comment.

You have been nothing but very helpful every time I have a question on this project!

juledwar avatar May 14 '25 21:05 juledwar

just brainstorming: would adding a setter to Context meet your use case? so something like

from marshmallow.experimental.context import Context

class Foo(Schema):
    name = String(required=True)
    reason = String(required=True)

    @pre_load
    def set_required(self, data, **kwargs):
        if Context.get()["totem"] is False:
            self.fields['reason'].required = False  # or could do anything else
        return data

class Bar(Schema):
    id = Integer()
    details = Nested(Foo)

    @pre_load
    def set_totem(self, data, **kwargs):
        if data.get('some_thing'):  # This could be any property of the data
            Context.set({"totem": False})  # <-- doesn't yet exist
        return data

with Context({"totem": True}):
    Bar().load({...})

we held off on adding a setter to avoid pathological cases where the context isn't reset. we could avoid that by raising an error if the setter is called without entering the context manager.

sloria avatar May 15 '25 23:05 sloria

I'm not sure it would help as you've still got the context data being set outside the load(). Ideally this needs to be encapsulated in the schema itself.

I think a decent way to manage this might be to have a context hook on the base schema class, a bit like pre_load etc, where you can set up context based on the incoming data. The base schema class can then unset it later after the data was loaded.


class Bar(Schema):
    id = Integer()
    details = Nested(Foo)

    @load_context
    def set_totem(self, data, **kwargs):
        if data.get('some_thing'):  # This could be any property of the data
            return {'totem': False}
        return None

It's not returning data like the other hooks, it purely returns something to set in a contextvar, which is all managed for you by the base class. Similarly, a @dump_context could work in the same way.

This would effectively remove the need for users to know anything about contextvars, but would still use them "under the hood" by using a setter and a reset when needed.

Thoughts?

juledwar avatar May 16 '25 05:05 juledwar

oh that's an interesting idea. would we then need @pre_load_context, @post_load_context, @pre_dump_context, @post_dump_context?

sloria avatar May 16 '25 16:05 sloria

One way that I use the context is to pass information from a parent schema into a nested one for conditional validation.

This use case is legitimate but the previous context mechanism was already brittle because of the caching in details = Nested(Foo). If the same schema would be used to load multiple objects, the first load would instantiate Foo in details with a certain context value and it would not change anymore.

Example:

schemata = Bar(unknown=INCLUDE)
schemata.load({"details": {"name": "foo", "reason": "bar"}})
schemata.load({"some_thing": "foo", "details": {"name": "foo"}}) # The context is cached and requires the reason
# {'details': {'reason': ['Missing data for required field.']}}

Source code: https://github.com/marshmallow-code/marshmallow/blob/3.26.1/src/marshmallow/fields.py#L585C5-L585C8

Morikko avatar Jun 02 '25 15:06 Morikko

oh that's an interesting idea. would we then need @pre_load_context, @post_load_context, @pre_dump_context, @post_dump_context?

Sorry only just saw this - yes, I think that would be ideal. Obviously the devil is in the detail but it seems the cleanest way to implement this while following the existing decorator patterns. The base Schema class would do most of the heavy lifting.

juledwar avatar Jun 02 '25 22:06 juledwar

@Morikko Agreed - and the same problem exists with contextvars as well unless you use a different thread.

juledwar avatar Jun 02 '25 22:06 juledwar

the same problem exists with contextvars as well unless you use a different thread.

Really? Using context managers, the context should only be accessible when inside the context manager. I don't see the caching issue.

lafrech avatar Jun 23 '25 12:06 lafrech

Is there any update on this conversation? I was also using context to pass data from a parent schema to a child schema and I don't really see a good way to replicate this behavior while migrating to version 4. It was much cleaner to do this in a child schema rather than offload all of that logic onto the parent, and it also made for better error responses since any error messages (like two children having the same name) would then exist in the correct index of the parent's child, so it was much more convenient for rendering an error message on a form field. It seems much more cumbersome to accomplish that now in light of the changes to how context works.

artclark42 avatar Oct 27 '25 22:10 artclark42

I'd love to see the suggestion in https://github.com/marshmallow-code/marshmallow/issues/2824#issuecomment-2887169436 implemented.

juledwar avatar Oct 27 '25 22:10 juledwar