desert icon indicating copy to clipboard operation
desert copied to clipboard

Unknown Excluded in sublists

Open remidebette opened this issue 5 years ago • 11 comments

Hi,

I am quite new to the marshmallow ecosystem. I am considering the use of desert as a way to write python API clients by explicitly describing their data structure to ease the deserialisation.

In the process of writing an API Client for a specific purpose one could be confronted to a lot of unnecessary data in API responses that he does not intend to use. Exploring this data and writing up the schemas while continuously testing, a way to build such code could be to default to marshmallow EXCLUDE for unknown fields. That way, continuous TDD with "deserialisation of real life JSONS" tests would be possible without ValidationError while not having to describe unused chunks of the data.

Furthermore, continuing to use EXCLUDE after this initial stage of development could be acceptable, if the policy of the API itself is to authorize adding of values for minor changes. One could want to make its API client future-proof by not breaking when such a change happens on the other side that he does not necessarily controls.

But this behavior is broken when deserialising unknown fields in a List:

from dataclasses import dataclass
from typing import List

import desert
import marshmallow


@dataclass
class Ambiguous:
    firstname: str
    lastname: str


@dataclass
class AmbiguousList:
    id: str
    not_yet_known: List[Ambiguous]


AmbiguousListSchema = desert.schema_class(AmbiguousList, meta={"unknown": marshmallow.EXCLUDE})()

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith",
        }
    ]
}
""")
# Ok

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")
# Still Ok

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith",
            "shiny_new_arg": "uninteresting info"
        }
    ]
}
""")
# marshmallow.exceptions.ValidationError: {'not_yet_known': {1: {'shiny_new_arg': ['Unknown field.']}}}

I am not sure what is the library that is responsible. It seems that marshmallow_dataclass shares the same error.

remidebette avatar May 01 '20 13:05 remidebette

Seems like the marshmallow.EXCLUDE is being applied only to the top level schema and isn't cascaded down. I can imagine sometimes wanting it to cascade (such as your case probably) but I can also imagine wanting to specify that per class. I'm not the only one here so hopefully some discussion can ensue on this.

altendky avatar May 01 '20 14:05 altendky

Hmm, I guess in the granular case it could be per attribute's usage of a class/schema? A pure marshmallow line for not_yet_known: List[Ambiguous] would look something like not_yet_known = marshmallow.fields.List(AmbiguousSchema(unknown=marshmallow.EXCLUDE)). So it seems that the closer we can get it to the specification of the class the better.

altendky avatar May 01 '20 14:05 altendky

Ok so either I need to cascade or to specify EXCLUDE for specific subclasses. I could live with either.

Is there already a syntax for transposing the above marshmallow line to a desert compatible dataclass without having to create an intermediary AmbiguousSchema?

remidebette avatar May 01 '20 14:05 remidebette

I guess it might be:

not_yet_known: List[Ambiguous] = desert.ib(
    marshmallow_field=marshmallow.fields.List(
        desert.schema(cls=Ambiguous, meta={'exclude': marshmallow.EXCLUDE})
    )
)

altendky avatar May 01 '20 14:05 altendky

If you need that a lot you can for now compensate with a helper function.

def my_excluding_ib(cls):
    return desert.ib(
        marshmallow_field=marshmallow.fields.List(
            desert.schema(cls=cls, meta={'exclude': marshmallow.EXCLUDE})
        )
    )

not_yet_known: List[Ambiguous] = my_excluding_ib(cls=Ambiguous)

Obviously still not as dry as we'd like desert to be what with the repetition of Ambiguous, but perhaps good enough to be a useful stop gap.

altendky avatar May 01 '20 14:05 altendky

So I tried your first example and got a Type error since fields.List accepts only FieldABC. I reworked it to use fields.Nested, but even so I still have the initial error:

The full code:

from dataclasses import dataclass
from typing import List

import desert
import marshmallow


@dataclass
class Ambiguous:
    firstname: str
    lastname: str

# Apparently Nested prefers to get the Schema Class
AmbiguousSchemaClass = desert.schema_class(Ambiguous, meta={'unknown': marshmallow.EXCLUDE})


@dataclass
class AmbiguousList:
    id: str
    not_yet_known: List[Ambiguous] = desert.ib(marshmallow_field=marshmallow.fields.List(marshmallow.fields.Nested(AmbiguousSchemaClass)))


AmbiguousListSchema = desert.schema_class(AmbiguousList, meta={"unknown": marshmallow.EXCLUDE})()

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")
# Ok

dat = AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")
# Still Ok

AmbiguousListSchema.loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith",
            "shiny_new_arg": "uninteresting info"
        }
    ]
}
""")
# marshmallow.exceptions.ValidationError: {'not_yet_known': {1: {'shiny_new_arg': ['Unknown field.']}}}

remidebette avatar May 01 '20 17:05 remidebette

Yeah, I usually forget Nested, sorry. I also had my head all turned around on everything else, so sorry for that. desert.ib() is for attrs while desert.field() is for dataclasses. Also, the meta= for both are _not_ going down to marshmallow.

from dataclasses import dataclass
from typing import List

import desert
import marshmallow


@dataclass
class Ambiguous:
    firstname: str
    lastname: str


AmbiguousSchemaClass = desert.schema_class(Ambiguous)


@dataclass
class AmbiguousList:
    id: str
    not_yet_known: List[Ambiguous] = desert.field(
        marshmallow_field=marshmallow.fields.List(
            marshmallow.fields.Nested(
                AmbiguousSchemaClass(unknown=marshmallow.EXCLUDE),
            ),
        ),
    )


AmbiguousListSchema = desert.schema_class(AmbiguousList)

AmbiguousListSchema(unknown=marshmallow.EXCLUDE).loads("""
{
    "id":"789456132",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")

dat = AmbiguousListSchema(unknown=marshmallow.EXCLUDE).loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith"
        }
    ]
}
""")

AmbiguousListSchema(unknown=marshmallow.EXCLUDE).loads("""
{
    "id":"789456132",
    "shiny_new_arg": "uninteresting info",
    "not_yet_known": [
        {
            "firstname": "John",
            "lastname": "Smith"
        },
        {
            "firstname": "Henry",
            "lastname": "Smith",
            "shiny_new_arg": "uninteresting info"
        }
    ]
}
""")

altendky avatar May 01 '20 20:05 altendky

Thanks @altendky ,

I reworked the example for the consistency of the codestyle but now it works.

So my request about DRY coding in desert would be to have ways to pass marshmallow parameters more simply without having to redefine the intermediary schema and the marshmallow type in the desert.field. Or to have a way to cascade.

But at least I have a workaround to continue using desert for now.

The final code:

from dataclasses import dataclass
from typing import List

import desert
import marshmallow


@dataclass
class Ambiguous:
    firstname: str
    lastname: str

# This is what currently has to be introduced to do the unknown sub declaration
AmbiguousSchema = desert.schema_class(Ambiguous, meta={'unknown': marshmallow.EXCLUDE})()


@dataclass
class AmbiguousList:
    id: str
    
    # Here we have to code 2 times the type definition, in both dataclass style and marshmallow style
    not_yet_known: List[Ambiguous] = desert.field(marshmallow_field=marshmallow.fields.List(
        marshmallow.fields.Nested(AmbiguousSchema)))


AmbiguousListSchema = desert.schema_class(AmbiguousList, meta={"unknown": marshmallow.EXCLUDE})()

Maybe the desert.field could have a shortcut way to inject the marshmallow meta parameters while respecting the above logic behind the scenes?

Or maybe my issue is that I expected to be able to write only one final desert schema definition and the rest of the code would be purely dataclasses (without having to explicity write a schema for each sub dataclass that has to be customized) But today there no way to attach the meta parameter directly to the dataclass, unlike the @dataclass decorator of marshmallow_dataclass for instance

remidebette avatar May 02 '20 10:05 remidebette

There's a similar discussion to this in the marshmallow-code repo: https://github.com/marshmallow-code/marshmallow/issues/1490

wanderrful avatar Sep 04 '20 15:09 wanderrful

From my quick read of https://github.com/marshmallow-code/marshmallow/issues/1490 it seems that marshmallow intentionally doesn't want to propagate and that the OP concluded that was reasonable? Which would mean that we would be going counter to that I guess? Doesn't necessarily mean we shouldn't, just trying to get myself straight here. My gut says that people would often want to be able to specify this in one place rather than spreading it throughout their nested 'schemas'. I guess I need to read deeper into the secondary link layer explaining why they don't want to propagate. Perhaps we have an explicitly separate option (maybe like mentioned in 1490) that to enable propagation. Though making it an all or nothing has it's own limitations. Hmm...

altendky avatar Sep 19 '20 20:09 altendky

Hi, if it can help someone, I used this to make it work with desert and still keeping it not too convoluted.

@dataclass
class Child:
    name: str

@dataclass
class Parent:
    name: str
    dummy_list: List[str]
    children: List[Child] = field(
        default_factory=list,
        metadata=desert.metadata(
            m_fields.Nested(
                desert.schema(Child, meta={"unknown": EXCLUDE}, many=True), required=True
            )
        ),
    )

Ouradze avatar Mar 18 '21 11:03 Ouradze