quick-xml

Ignore undeclared element when a $value field is present

Open emarsden opened this issue 2 years ago • 3 comments

The default behaviour of ignoring XML elements that are not declared is very useful when dealing with XML that follows an extensible schema. However, including a rename="$value" field changes this behaviour, and undeclared elements generate an UnexpectedStart parse error.

use serde::{Deserialize, Serialize};

#[derive(Debug, Default, Serialize, Deserialize)]
struct Root {
    #[serde(rename = "$value")]
    content: Option<String>,
}

fn main() {
    let xml = r#"<Root><Foo/></Root>"#;
    let r1: Result<Root, quick_xml::DeError> = quick_xml::de::from_str(xml);
    // Parse error: UnexpectedStart([70, 111, 111]) (the bytes spell "Foo")
    println!("{:?}", r1);
}

If the rename = "$value" is replaced by rename = "$text" then the problem does not arise (parsing is successful and the Foo element is ignored). Is this the intended behaviour?

emarsden, May 01 '23 13:05

Yes, for the sake of consistency it would be good to accept such XML. The only problem is constructing consistent rules that do not break when combined. It is convenient to think of a mapping of Rust types as a definition of XSD types, because XSD already has consistent rules. The presented type can be expressed in XSD by at least two definitions:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!--
    Straightforward translation of `struct Root { ... }`.
    Does not allow nested elements (because of
    xs:simpleContent, required for extension of xs:string)
  -->
  <xs:complexType name="Root1">
    <xs:simpleContent>
      <xs:extension base="xs:string"/>
    </xs:simpleContent>
  </xs:complexType>

  <!-- More permissive translation of `struct Root { ... }` -->
  <xs:complexType name="Root2" mixed="true">
    <xs:sequence>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

These definitions are incomplete, because without #[serde(deny_unknown_fields)] we assume that any attributes and any elements can appear inside the type. The XSD above does not reflect that, but it should be relatively simple to add.

I'm not sure, however, that this would be easy to add to quick-xml, because on the deserializer side we know nothing about the type of the content field. But as you can see from the Root2 type, the mixed attribute is defined on the type itself, so it is the type that knows how to deal with strings inside it. So this translation may be impossible to implement with serde.

So I will leave this open in case anyone wishes to investigate this rabbit hole.

Mingun, May 02 '23 16:05

Sometimes we need to accumulate the body into a string, so that it can be deserialized later with a different schema whose version is defined by an attribute:

<root version="01.04"
      messageID="1"
      messageName="SomeTag">
    <SomeTag>
        <OtherTag/>
    </SomeTag>
</root>

Here "SomeTag" should be read as a string containing its whole content, <SomeTag><OtherTag/></SomeTag>, for later deserialization depending on the version:

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "root")]
pub struct Message {
    #[serde(rename = "@version")]
    pub version: String,

    #[serde(rename = "@messageID")]
    pub id: u32,

    #[serde(rename = "@messageName")]
    pub method: String,

    #[serde(rename = "$value")]
    pub body: T?, // placeholder: which type T could hold the raw content?
}

For example, xmlserde has a special struct Unparsed, but it has other limitations:

  • The exact tag name must be set
  • There is no out-of-the-box method to get a string from Unparsed

Is there any way to do that with quick-xml?
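
To make the request concrete, here is a minimal std-only sketch of the two-phase idea: first pull out the version attribute and the raw body as a string, then choose how to deserialize the body. It deliberately does not use quick-xml and is not a real XML parser: it relies on naive string searching and assumes no XML declaration, double-quoted attribute values, and no `>` inside attribute values. The helper names `attr_value` and `raw_body` are hypothetical and exist only for this illustration.

```rust
/// Returns the value of attribute `name` from the first start tag in `xml`.
/// Naive: assumes double-quoted values and no `>` inside attribute values.
fn attr_value<'a>(xml: &'a str, name: &str) -> Option<&'a str> {
    let tag_end = xml.find('>')?;
    let tag = &xml[..tag_end];
    // A leading space avoids matching the tail of a longer attribute name.
    let pat = format!(" {}=\"", name);
    let start = tag.find(&pat)? + pat.len();
    let len = tag[start..].find('"')?;
    Some(&tag[start..start + len])
}

/// Returns the raw text between the first start tag and the last
/// `</root>` end tag, with surrounding whitespace trimmed.
fn raw_body<'a>(xml: &'a str, root: &str) -> Option<&'a str> {
    let open_end = xml.find('>')? + 1;
    let close = format!("</{}>", root);
    let close_start = xml.rfind(&close)?;
    Some(xml.get(open_end..close_start)?.trim())
}

fn main() {
    let xml = r#"<root version="01.04"
      messageID="1"
      messageName="SomeTag">
    <SomeTag>
        <OtherTag/>
    </SomeTag>
</root>"#;

    let version = attr_value(xml, "version").expect("version attribute");
    let body = raw_body(xml, "root").expect("root body");

    // Second phase: choose a version-specific deserializer for `body`.
    match version {
        "01.04" => println!("deserialize with the 01.04 schema:\n{body}"),
        other => println!("unsupported version {other}"),
    }
}
```

The point of the sketch is only the shape of the result: `body` holds the untouched markup of SomeTag, so the choice of schema can be postponed until the version is known.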

anton-dutov, Sep 29 '23 22:09

I think what you need here is a DOM node that can be used as the type of the body field. There is a minidom crate, built on top of quick-xml, which will probably cover your needs. At the time of writing it uses quick-xml 0.28, but I think it would not be hard to update it.

I also plan to add basic DOM support to quick-xml itself and have a branch for that, but the work is at a very early stage. My current goals are:

  • fix #630 and release 0.31.0
  • rewrite the parser to fix various bugs and improve performance (we are second in our benchmarks; the leader is maybe_xml). I have been working on this for almost a month and plan to include it in 0.32.0
  • rework our errors and include position information in them -- 0.32.0
  • rework how the parser is configured (using a struct Config with fields instead of calling methods) -- 0.32.0
  • release 0.32.0 at the end of the year
  • finish the DOM implementation -- 0.33.0

Mingun, Sep 30 '23 09:09