Allow attributes to be collections

Open jorants opened this issue 8 months ago • 2 comments

Currently, collection types are not allowed for attributes. This makes sense for the default serializers, but a user might override the default behavior with a custom validator that returns a collection. For example:

from typing import Annotated, Any
import pydantic_xml as pxml
from pydantic import BeforeValidator
from pydantic.functional_serializers import PlainSerializer

def validate_space_separated_attr(value: str) -> list[str]:  
    return value.split(" ")

def serialize_space_separated_attr(values: list[Any]) -> str:  
    return " ".join(str(x) for x in values)

SpaceSeparatedValueListAttr = Annotated[list[str], BeforeValidator(validate_space_separated_attr), PlainSerializer(serialize_space_separated_attr)]
 
class Person(pxml.BaseXmlModel):
    children: SpaceSeparatedValueListAttr = pxml.attr()
    name: str = pxml.element()
    
doc = """
<Person children="Bob Eve">
  <name>Alice</name>
</Person>
"""
    
alice = Person.from_xml(doc)
print(alice.children) # prints ['Bob', 'Eve']
print(alice.to_xml())

Instead of disallowing collections outright, this change parses such attributes as a string and leaves the rest up to the user. This might not be a good final solution: it would be nicer to check whether custom validator logic is present and raise an error when it is not. However, I do not see an easy way to add such a check.

What do you think? I at least got stuck parsing XML documents that contain space-separated lists in attributes.

jorants avatar Jun 16 '25 14:06 jorants
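
On the idea of checking whether custom validator logic is present: the sketch below is a rough, stdlib-only heuristic, not pydantic-xml's actual code. It treats any `Annotated[...]` metadata on a collection type as a sign of custom handling. The names `split_annotated`, `is_collection`, `collection_attr_looks_customized`, and `Marker` are hypothetical and exist only for this illustration.

```python
from typing import Annotated, Any, get_args, get_origin

# Collection types that a plain attribute cannot represent by default.
COLLECTION_TYPES = (list, tuple, set, frozenset)


def split_annotated(annotation: Any) -> tuple[Any, tuple[Any, ...]]:
    """Return (base type, metadata) for Annotated[...], else (annotation, ())."""
    if get_origin(annotation) is Annotated:
        base, *metadata = get_args(annotation)
        return base, tuple(metadata)
    return annotation, ()


def is_collection(tp: Any) -> bool:
    """True for list, list[str], tuple[int, ...], set[str], and similar."""
    origin = get_origin(tp) or tp
    return isinstance(origin, type) and issubclass(origin, COLLECTION_TYPES)


def collection_attr_looks_customized(annotation: Any) -> bool:
    """Heuristic: a collection annotation that carries any Annotated metadata."""
    base, metadata = split_annotated(annotation)
    return is_collection(base) and len(metadata) > 0


class Marker:
    """Hypothetical stand-in for validator/serializer metadata."""


plain = list[str]
custom = Annotated[list[str], Marker()]

results = (
    collection_attr_looks_customized(plain),
    collection_attr_looks_customized(custom),
)
print(results)
```

The heuristic is deliberately crude, which illustrates why such a check is hard: metadata unrelated to validation would pass it, and validation attached elsewhere (for example on the model class rather than on the annotation) would be missed.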

@jorants Hi,

For now, you can define a custom schema:

from typing import Annotated

import pydantic_xml as pxml
from pydantic_core import core_schema as cs


class SpaceSeparatedValueListSchema:
    @classmethod
    def __get_pydantic_core_schema__(cls, source_type, handler):
        schema = cs.no_info_after_validator_function(lambda val: val.split(' '), cs.str_schema())
        serialization = cs.plain_serializer_function_ser_schema(lambda lst: ' '.join(lst))
        return cs.json_or_python_schema(json_schema=schema, python_schema=schema, serialization=serialization)


class Person(pxml.BaseXmlModel):
    children: Annotated[list[str], SpaceSeparatedValueListSchema] = pxml.attr()
    name: str = pxml.element()


doc = """
<Person children="Bob Eve">
  <name>Alice</name>
</Person>
"""

alice = Person.from_xml(doc)
print(alice.children)  # prints ['Bob', 'Eve']
print(alice.to_xml())  # prints b'<Person children="Bob Eve"><name>Alice</name></Person>'

dapper91 avatar Jun 29 '25 16:06 dapper91

I suppose this collection attribute implementation is not very intuitive, since it only works if both a validator and a serializer are provided. In my opinion, code like this should work too:

class Person(pxml.BaseXmlModel):
    children: list[str] = pxml.attr()
    name: str = pxml.element()

but it fails with a misleading error.

dapper91 avatar Jun 29 '25 16:06 dapper91
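
For reference, a default for `list[str]` attributes could plausibly follow the whitespace-separated convention used in the examples above. The sketch below uses only the standard library and is an assumption about what such a default might do, not pydantic-xml behavior. The helpers `parse_list_attr` and `dump_list_attr` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def parse_list_attr(raw: str) -> list[str]:
    # split() with no argument splits on any run of whitespace and drops
    # empty items, unlike split(" "), which yields '' for doubled spaces.
    return raw.split()


def dump_list_attr(items: list[str]) -> str:
    # Items that are empty or contain whitespace cannot survive a round trip
    # through a space-separated attribute, so reject them explicitly.
    for item in items:
        if not item or any(ch.isspace() for ch in item):
            raise ValueError(f"item {item!r} cannot be stored in a space-separated attribute")
    return " ".join(items)


# Note the doubled space between the two names.
root = ET.fromstring('<Person children="Bob  Eve"><name>Alice</name></Person>')

children = parse_list_attr(root.get("children", ""))
root.set("children", dump_list_attr(children))
serialized = ET.tostring(root, encoding="unicode")

print(children)
print(serialized)
```

One design consequence is that this format is lossy for items containing spaces, which is an argument for raising a clear error on serialization rather than silently joining the values.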