Marshmallow’s Nested fields with a restricted schema steal the full schema’s registry spot
If we have, for example, a marshmallow Schema like this:
from marshmallow_sqlalchemy import fields, SQLAlchemyAutoSchema

class FooSchema(SQLAlchemyAutoSchema):
    parent = fields.Nested("BarSchema", only=("id", "name"))
and call spec.components.schema('Foo', schema=FooSchema), then “BarSchema” gets registered as having only the fields id and name, and subsequent attempts to register the real thing fail.
The problem lies here:
https://github.com/marshmallow-code/apispec/blob/d418e3efa27f799ae3dabbdbb00ccd07eb10ce07/src/apispec/ext/marshmallow/openapi.py#L108
That function doesn’t give us a useful name in this case. It should also modify the names of schemas where schema.only is not None before it modifies the names of schemas where schema.only is None.
In most cases you should not rely on get_unique_schema_name to provide names for schemas using modifiers. This function is provided as a fallback to ensure that the generated spec is valid, even if it is not particularly readable. As the warning indicates, the recommended solution is either:
- add the modified and unmodified versions of BarSchema prior to adding FooSchema
- provide a custom schema_name_resolver function to provide appropriate names for the modified and unmodified versions of BarSchema - this is where logic for modifying the names based on schema.only could reside
As I explained, option 1 isn’t possible, as spec.components.schema automatically adds all referred-to schemas transitively.
A custom schema_name_resolver does work, but should one have to rely on one for basic functionality? I’d think that stuff should just work without crashing with an inscrutable error message that leads to a lot of debugging.
Does it work if you register mini bar schema like this first?
spec.components.schema("MiniBarSchema", BarSchema(only=("id", "name")))
Also, if you register BarSchema (real one) first, you should only get a warning, not an error, when the mini one is auto-registered.
Even if it does, that means figuring out beforehand which schemas I need the mini one for, instead of being able to rely on auto registering.
My suggestion still stands:
- The message should be clearer
- There should be a way to easily configure name generation to avoid things like this
There should be a way to easily configure name generation
That is schema_name_resolver. It is a somewhat advanced feature, indeed. But apispec can't reasonably define automatic names for modified schemas.
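As a rough illustration of what such a resolver could look like, here is a minimal sketch that derives a distinct name when a schema instance carries an only modifier. The naming scheme ("BarOnlyIdName") and the handling of string references are assumptions made for this example, not something apispec prescribes. The function is duck-typed and reads only the class name and the only attribute, so it does not import apispec itself.

```python
def schema_name_resolver(schema):
    """Return a component name that reflects the `only` modifier, if any.

    `schema` may be a schema class, a schema instance, or a string reference.
    """
    if isinstance(schema, str):
        # Assumption: a string reference ends with the class name.
        cls_name = schema.rsplit(".", 1)[-1]
        only = None
    elif isinstance(schema, type):
        # A bare class has no per-instance modifiers.
        cls_name = schema.__name__
        only = None
    else:
        cls_name = type(schema).__name__
        only = getattr(schema, "only", None)

    # Strip a trailing "Schema" so that BarSchema becomes "Bar".
    if cls_name.endswith("Schema") and len(cls_name) > len("Schema"):
        base = cls_name[: -len("Schema")]
    else:
        base = cls_name

    if not only:
        return base

    # Sort the field names so the same selection always yields the same name.
    suffix = "".join(part.capitalize() for part in sorted(only))
    return f"{base}Only{suffix}"
```

It would be passed to the plugin as MarshmallowPlugin(schema_name_resolver=schema_name_resolver). With this in place, the restricted nested BarSchema from the original report would be named "BarOnlyIdName", leaving "Bar" free for the full schema.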
OK, so how about we
- remember which schema-configuration combination is registered for which name
- if a collision happens we show a nice error message describing the problem and mentioning that solving it involves either pre-registering all schemas before any auto registering happens or providing a schema_name_resolver
Helpful error messages are always better than having to jump into a debugger and google some inscrutable message.
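To make the proposal concrete, here is a minimal standalone sketch of the bookkeeping it describes. This is not how apispec is implemented. SchemaNameRegistry, NameCollisionError and the message wording are hypothetical names invented for this illustration, and only the only modifier is tracked, for brevity.

```python
class NameCollisionError(Exception):
    """Hypothetical error raised when one name maps to two configurations."""


def _describe(key):
    schema_cls, only = key
    if only is None:
        return f"{schema_cls.__name__} (all fields)"
    return f"{schema_cls.__name__}(only={sorted(only)})"


class SchemaNameRegistry:
    """Remember which (schema class, `only` selection) owns each name."""

    def __init__(self):
        self._keys_by_name = {}

    def register(self, name, schema_cls, only=None):
        key = (schema_cls, frozenset(only) if only is not None else None)
        existing = self._keys_by_name.get(name)
        if existing is None:
            self._keys_by_name[name] = key
            return
        if existing == key:
            # Same schema with the same configuration: nothing to do.
            return
        raise NameCollisionError(
            f"Component name {name!r} is already used by {_describe(existing)} "
            f"and cannot be reused for {_describe(key)}. Register every variant "
            "under its own name before any automatic registration happens, "
            "or provide a schema_name_resolver."
        )
```

Registering "Bar" for BarSchema restricted to id and name, and then again for the unrestricted BarSchema, would raise an error that names both configurations and both remedies.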
Hey there 👋🏻
Context:
Just stumbled into this issue as I am encountering the same problem that @flying-sheep described. I think he explained it perfectly in his original message, but just to reiterate:
Problem:
When users call the spec.components.schema function to register schemas into an APISpec object, the MarshmallowPlugin (provided by this package) automatically registers nested schemas. As a result, manually registering a schema fails if it was already defined as "nested" on a previously registered one.
One could say: "Ok then, do not register a schema that has already been registered, duh". Sadly, there are use cases where schemas are defined within a list, and the short-and-nice registration loop fails... 😞
Example:
from apispec.core import APISpec
from apispec.ext.marshmallow import MarshmallowPlugin
from marshmallow import Schema
from marshmallow import fields

class Children(Schema):
    children_id = fields.String(required=True)

class Parent(Schema):
    parent_id = fields.String(required=True)
    children = fields.Nested(
        required=True,
        nested=Children,
        many=True,
    )

# NOTE:
# This structure could be handy when trying to register all project-defined schemas,
# probably to be defined on a folder's __init__.py module, for instance.
all_schemas = [
    Parent,
    Children,
    # ...
]

spec = APISpec(
    title="Example",
    version="0.1.0",
    openapi_version="3.0.2",
    plugins=[MarshmallowPlugin()],
)

# Fails...
for schema in all_schemas:
    spec.components.schema(schema.__name__, schema=schema)
Proposal:
I can think of two solutions, either:
- To expose a parameter to avoid the registration of nested schemas (within MarshmallowPlugin?).
- To remove this DuplicateComponentNameError raise, and just return when the schema is already registered.
Just to clarify (given the age of the issue creation): this is still a problem on the current version (5.1.1).
@Sinclert To avoid registering nested schemas, you can pass a schema_name_resolver that always returns None. If you do this you cannot have circular references in your schemas. My opinion is that defining nested schemas in line is more confusing for your users than defining them as references so I would simply catch the errors.
Thanks for the quick reply!
I see... that seems to avoid the automatic registration of nested schemas ✅
Not sure how to avoid people stepping into this confusion again though. I reviewed both the Nested schemas documentation, and the MarshmallowPlugin parameters section and I could not find any reference to "in order to avoid the automatic registering of nested schemas, please provide a None-returning schema_name_resolver function". Indeed, documentation specifies exactly the opposite:
@Bangertm could you include what you said in your previous message somewhere in the docs? I would also consider closing this issue so people see that there is actually an official way to solve it.
The documentation under the Nested Schema documentation you reference includes this sentence:
If the schema_name_resolver function returns a value that evaluates to False in a boolean context the nested schema will not be added to the spec and instead defined in-line.
Can you suggest a way to clarify further?
Maybe another "❕ NOTE" block 🤷🏻‍♂️
In order to deactivate the automatic registration of nested schemas, and avoid registration duplicates, consider providing a None-returning function to the MarshmallowPlugin schema_name_resolver initialization argument.
Example:
MarshmallowPlugin(schema_name_resolver=lambda schema_class: None)