CDATA tags are removed when writing to a file

Open yudar1024 opened this issue 5 years ago • 8 comments

	log.Print("this is a log")
	doc := etree.NewDocument()
	if err := doc.ReadFromFile("D:\\Program\\FineReport_10.0\\webapps\\webroot\\WEB-INF\\reportlets\\books\\stock_detail_daily.cptx"); err != nil {
		panic(err)
	}
	databaseName := doc.FindElements("./WorkBook/TableDataMap/TableData/Connection/DatabaseName")
	for _, element := range databaseName {
		log.Print(element.Text())
		content := strings.Split(element.Text(), ".")
		log.Println(content)
		element.SetText(content[0])
	}
	// Check the write error instead of silently discarding it.
	if err := doc.WriteToFile("D:\\Program\\FineReport_10.0\\webapps\\webroot\\WEB-INF\\reportlets\\books\\stock_detail_daily.cptx"); err != nil {
		panic(err)
	}

Original XML content

<Border>
<Top style="1" color="-4144960"/>
<Bottom style="1" color="-4144960"/>
<Left style="1" color="-4144960"/>
<Right style="1" color="-4144960"/>
</Border>
</Style>
<Style horizontal_alignment="0" imageLayout="1">
<Format class="com.fr.base.CoreDecimalFormat" roundingMode="6">
<![CDATA[#0.00]]></Format>
<FRFont name="SimSun" style="0" size="72"/>

XML content after running the program

<Background name="NullBackground"/>
<Border>
<Top style="1" color="-4144960"/>
<Bottom style="1" color="-4144960"/>
<Left style="1" color="-4144960"/>
<Right style="1" color="-4144960"/>
</Border>
</Style>
<Style horizontal_alignment="0" imageLayout="1">
<Format class="com.fr.base.CoreDecimalFormat" roundingMode="6">
#0.00</Format>

yudar1024 avatar Aug 27 '20 08:08 yudar1024

Your call to SetText() is causing the CDATA token to be stripped. Try using SetCData() instead.

beevik avatar Sep 12 '20 02:09 beevik

I mean that the original XML file already contains many CDATA tags. After I load it in etree and write it to a new file without any modification, the CDATA tags are removed. How can I keep the original CDATA when I write to a new file?

yudar1024 avatar Sep 12 '20 03:09 yudar1024

Because etree uses the encoding/xml decoder, it only receives the content of a CDATA section, not the markers around it. See https://github.com/golang/go/blob/master/src/encoding/xml/xml.go#L704: text(-1, true) strips the CDATA token.

skyy-x avatar Jan 11 '21 12:01 skyy-x

@beevik This is a problem; not handling CDATA properly breaks many XML documents.

@skyy-x If you look about 2 lines below, you'll see the node type returned is CDATA. Fixing this should be as easy as checking for the xml.CData type and then emitting the prefix and suffix.

james-lawrence avatar Apr 07 '21 13:04 james-lawrence

The type of the Token is CharData, not xml.CDATA: https://github.com/golang/go/blob/master/src/encoding/xml/xml.go#L572 In addition, ordinary text sections are also returned as CharData, which means "raw text".
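
A minimal sketch of this behavior, using only the standard library. The helper name charDataTokens is made up for illustration and is not part of etree or encoding/xml. It collects every xml.CharData token, and plain text and a CDATA section should come back looking identical:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// charDataTokens (hypothetical helper) decodes src and returns the
// contents of every xml.CharData token in document order.
func charDataTokens(src string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		// Plain text and CDATA sections both arrive as xml.CharData;
		// the token itself does not say which form the source used.
		if cd, ok := tok.(xml.CharData); ok {
			out = append(out, string(cd)) // copy: the bytes are reused by the decoder
		}
	}
}

func main() {
	plain, err := charDataTokens(`<Format>#0.00</Format>`)
	if err != nil {
		panic(err)
	}
	cdata, err := charDataTokens(`<Format><![CDATA[#0.00]]></Format>`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("plain: %q\ncdata: %q\n", plain, cdata)
}
```

Both calls yield the same token contents, which is why a library built only on these tokens cannot tell the two inputs apart.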

skyy-x avatar Apr 08 '21 09:04 skyy-x

Ah, my apologies. You can look at https://github.com/antchfx/xmlquery/blob/master/parse.go#L195 for an implementation built around CharData that detects CDATA elements.
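
One possible way to detect CDATA around CharData tokens, sketched with the standard library only. This is not taken from xmlquery or etree. It assumes the whole document is held in memory, and cdataTexts is a hypothetical name. It relies on Decoder.InputOffset, which is documented as giving the end of the most recently returned token and the beginning of the next one, and checks whether the raw input at that position starts with the CDATA opening marker:

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

// cdataOpen is the literal marker that opens a CDATA section.
var cdataOpen = []byte("<![CDATA[")

// cdataTexts (hypothetical helper) returns the contents of the
// CharData tokens in src that were written as CDATA sections.
func cdataTexts(src []byte) ([]string, error) {
	dec := xml.NewDecoder(bytes.NewReader(src))
	var out []string
	for {
		// Offset where the next token begins in the raw input.
		start := int(dec.InputOffset())
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		cd, ok := tok.(xml.CharData)
		if !ok {
			continue
		}
		// The token has lost its markers, so inspect the raw bytes
		// at the position where this token started.
		if bytes.HasPrefix(src[start:], cdataOpen) {
			out = append(out, string(cd))
		}
	}
}

func main() {
	src := []byte("<Format>\n<![CDATA[#0.00]]></Format>")
	texts, err := cdataTexts(src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", texts)
}
```

A writer could use the same check to decide whether to re-emit the CDATA prefix and suffix around a piece of character data. The sketch has not been tested against streaming input or non-UTF-8 encodings.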

james-lawrence avatar Apr 08 '21 11:04 james-lawrence

I am having the same issue where the CDATA tags are being stripped from the whole XML document. Is there any workaround or fix?

mmandolesi-g avatar Apr 14 '21 22:04 mmandolesi-g

I am also having a similar issue where the CDATA tags are being stripped from the whole XML document. @beevik Is there any workaround or fix?

abiramisinnarajan avatar Jul 07 '21 13:07 abiramisinnarajan

Unfortunately, the XML decoder this package is built on (encoding/xml) produces tokens that strip out the distinction between character data and CDATA elements. See https://pkg.go.dev/encoding/xml#Token. I am not aware of any way to obtain this information while decoding with the Go XML decoder.

beevik avatar May 02 '23 01:05 beevik