spark-xml

<xs:choice maxOccurs="unbounded"> does not produce array type

Open marpetr opened this issue 1 year ago • 3 comments

Example input:

	<xs:complexType name="Fruits">
		<xs:choice maxOccurs="unbounded">
			<xs:element name="apple" type="js:Apple"/>
			<xs:element name="orange" type="js:Orange"/>
		</xs:choice>
	</xs:complexType>

Expected output: fruits: struct<apple: array<struct<...>>, orange: array<struct<...>>>

Actual output: fruits: struct<apple: struct<...>, orange: struct<...>>
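
For reference, the two shapes above can be written out with Spark's schema types. This is only a sketch: the `name` field inside Apple and Orange is a placeholder, since their contents are elided in the report, and it assumes spark-sql is on the classpath.

```scala
import org.apache.spark.sql.types._

object FruitsSchemaShapes {
  // Placeholder for the elided Apple/Orange contents (an assumption, not from the XSD).
  val item: StructType = StructType(Seq(StructField("name", StringType)))

  // Actual: each member of the choice is mapped to a single struct.
  val actual: StructType = StructType(Seq(
    StructField("apple", item),
    StructField("orange", item)))

  // Expected: the choice can repeat, so each member is mapped to an array of structs.
  val expected: StructType = StructType(Seq(
    StructField("apple", ArrayType(item)),
    StructField("orange", ArrayType(item))))

  def main(args: Array[String]): Unit = {
    // Print both shapes in Spark's compact schema notation.
    println(actual.simpleString)
    println(expected.simpleString)
  }
}
```

With the actual shape, a document containing two apple elements under fruits has no array to hold the second one, which is why the repeated choice matters.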

Proposed fix: https://github.com/databricks/spark-xml/blob/ddd1ef573a5318748763fafc974e4f7d8876fd6f/src/main/scala/com/databricks/spark/xml/util/XSDToSchema.scala#L227

-               if (element.getMaxOccurs == 1) {
+               if (element.getMaxOccurs == 1 && choice.getMaxOccurs == 1) {

marpetr avatar Jun 17 '24 08:06 marpetr

I think it would be fine to support this, though I think the change needed is somewhat different.

If an xs:element within the xs:choice has maxOccurs > 1, then that element maps to an array type. That much works now.

If xs:choice has maxOccurs > 1, then the result isn't a struct, but an array of struct. The resulting Seq of StructField would have to be wrapped up in another ArrayType in this case to express this, I think.
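
The two readings discussed so far lead to different schemas. A minimal sketch of both, assuming spark-sql is on the classpath; `wrapChoice` and `wrapMembers` are hypothetical helpers written for illustration and are not part of spark-xml, and the `name` field is a placeholder for the elided element contents.

```scala
import org.apache.spark.sql.types._

object ChoiceWrapping {
  // Hypothetical helper: when the choice itself may repeat, wrap the whole
  // struct of members in one ArrayType (the array-of-struct reading).
  def wrapChoice(fields: Seq[StructField], choiceMaxOccurs: Long): DataType = {
    val struct = StructType(fields)
    if (choiceMaxOccurs > 1) ArrayType(struct) else struct
  }

  // Hypothetical helper: when the choice may repeat, turn each member into
  // an array instead (the struct-of-arrays reading from the original report).
  def wrapMembers(fields: Seq[StructField], choiceMaxOccurs: Long): DataType = {
    val wrapped =
      if (choiceMaxOccurs > 1) fields.map(f => f.copy(dataType = ArrayType(f.dataType)))
      else fields
    StructType(wrapped)
  }

  def main(args: Array[String]): Unit = {
    // Placeholder element type (assumption; the real Apple/Orange types are elided).
    val item = StructType(Seq(StructField("name", StringType)))
    val fields = Seq(StructField("apple", item), StructField("orange", item))
    // Long.MaxValue stands in for "unbounded" in this sketch.
    println(wrapChoice(fields, Long.MaxValue).simpleString)
    println(wrapMembers(fields, Long.MaxValue).simpleString)
  }
}
```

The first form keeps the members' interleaved order, at the cost of mostly-null struct fields in each array entry. The second groups values by element name, so the order between apple and orange entries is not retained.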

If you can try that and it works, would you open a pull request to test it?

srowen avatar Jun 17 '24 13:06 srowen

@marpetr @srowen can I take over this issue? I would like to work on it if possible.

ranadheerg avatar Jul 26 '24 22:07 ranadheerg

Sure, you can open a pull request if you like.

srowen avatar Jul 26 '24 22:07 srowen