jackson-dataformat-xml

Support `CharacterEscapes` using Stax (Woodstox/Aalto) functionality

Open cowtowncoder opened this issue 12 years ago • 9 comments

Currently the `CharacterEscapes` system does not work for the XML module, mostly because the module has no direct control over output escaping. However, the Stax2 extension that Woodstox (and, I think, Aalto) implements has similar functionality, exposed through these properties:

  • P_TEXT_ESCAPER (for character data, i.e. element text content)
  • P_ATTR_VALUE_ESCAPER (for attribute values)

so it would be great to use that functionality to support customized escapes, if at all possible. It is hard to say for sure whether that would work, but it should be easy enough to check.

cowtowncoder avatar Aug 19 '13 21:08 cowtowncoder

Ok. So two approaches I think, depending on how fancy this should be:

  1. Simply allow registration of a Stax2 EscapingWriterFactory, which lets the user implement the escaping logic. More work for users, not much for Jackson.
  2. Implement an EscapingWriterFactory that consults CharacterEscapes to decide what to escape. Much nicer from the user's perspective, but more work and possibly a bit more overhead, since two different interfaces need to be adapted to each other and may not map cleanly.

cowtowncoder avatar Nov 12 '16 00:11 cowtowncoder

Is there any example of configuring EscapingWriterFactory?

dabulashvili-zz avatar Aug 17 '18 19:08 dabulashvili-zz

Unfortunately I can't find anything right now, except for an actual unit test from Woodstox:

src/test/java/org/codehaus/stax/test/wstream/CharacterEscapingTest.java

which should show the idea. I should write a blog post one of these days, as I have written something about Woodstox in general (even if it's about 10 years since I actively worked with XML :) )

cowtowncoder avatar Aug 22 '18 18:08 cowtowncoder

This is ridiculous, ' and " should have been escaped automatically by default. Workaround is to implement EscapingWriterFactory:

https://stackoverflow.com/questions/56799368/escaping-quotes-using-jackson-dataformat-xml

mvysny avatar Dec 12 '19 10:12 mvysny

@mvysny uh? I do not appreciate the tone of the comment, especially given that you give no context on WHERE single/double quotes are not escaped where they should be. As far as I know, they are properly escaped in XML content as per the XML specification. I am also not sure how this relates to the issue at hand, which is about whether it would be possible to map the Jackson feature onto the native Woodstox mechanism (which indeed could be the mechanism you reference).

cowtowncoder avatar Dec 12 '19 18:12 cowtowncoder

@cowtowncoder I apologize for the tone of my comment. I was mad about things not going as I expected, but of course that is no excuse. Thank you for your replies and for your hard work, I really appreciate it :+1:

When a POJO with text contents is serialized to XML with the default settings using Jackson's XmlMapper, the " and ' are not escaped to &quot; and &apos;. I thought the escaping was mandated by the XML spec, but I was wrong - it's not mandated. Still, I thought there would be a simple setting to always escape the five characters (<>&'") somewhere in XmlMapper; I was surprised to find that one needs to write EscapingWriterFactory to have those characters escaped.

Implementing EscapingWriterFactory is not an easy feat. It would be great to either have documentation for that, or to have some simpler way of specifying the set of characters that need to be escaped.
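
As a rough illustration of what the escaping logic itself involves, here is a minimal JDK-only sketch in Java of a `Writer` wrapper that replaces the five characters with their predefined entities. `XmlEscapingWriter` is a hypothetical name, not part of Jackson or Woodstox; the assumption is that an `EscapingWriterFactory` could return such a wrapper from `createEscapingWriterFor(Writer, String)`.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.Writer;

/** Hypothetical helper: escapes < > & " ' before passing text to the wrapped Writer. */
class XmlEscapingWriter extends FilterWriter {
    XmlEscapingWriter(Writer out) {
        super(out);
    }

    /** Returns the entity for c, or null if c can be written unchanged. */
    private static String entityFor(char c) {
        switch (c) {
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '&': return "&amp;";
            case '"': return "&quot;";
            case '\'': return "&apos;";
            default: return null;
        }
    }

    @Override
    public void write(int c) throws IOException {
        String entity = entityFor((char) c);
        if (entity == null) {
            out.write(c);
        } else {
            out.write(entity);
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        int start = off;       // start of the pending run of unescaped characters
        int end = off + len;
        for (int i = off; i < end; i++) {
            String entity = entityFor(cbuf[i]);
            if (entity != null) {
                out.write(cbuf, start, i - start); // write the run before the special character
                out.write(entity);
                start = i + 1;
            }
        }
        out.write(cbuf, start, end - start);       // write the trailing run
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        // FilterWriter would pass strings through unescaped, so route them via the char[] path.
        write(str.toCharArray(), off, len);
    }
}
```

Writing the string `a<b & "c" 'd'>` through this wrapper should produce `a&lt;b &amp; &quot;c&quot; &apos;d&apos;&gt;` on the underlying writer. The set of characters is hard-coded here for brevity; a configurable variant could take a lookup table in the constructor instead.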

mvysny avatar Dec 12 '19 19:12 mvysny

Ah no problem. I know the feeling. :-)

But yes, it would be great to connect the functionality via the Jackson API. And you are right, implementing EscapingWriterFactory is not super easy; the best (only?) example I know of is at:

src/test/java/org/codehaus/stax/test/wstream/CharacterEscapingTest.java

of woodstox-core. I think I was hoping to write something more as part of

https://medium.com/@cowtowncoder/configuring-woodstox-xml-parser-stax2-properties-c80ef5a32ef1

but did not end up doing that, since I haven't had need to actually use it myself. ... and apparently others haven't either, for what that's worth (based on lack of Google hits).

cowtowncoder avatar Dec 13 '19 01:12 cowtowncoder

I've found one which worked for me here: https://stackoverflow.com/questions/56799368/escaping-quotes-using-jackson-dataformat-xml

I've trimmed it down a bit and converted it to Kotlin (uses commons-lang3); it should help others too:

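```kotlin
import java.io.OutputStream
import java.io.Writer
import org.apache.commons.lang3.StringEscapeUtils
import org.codehaus.stax2.io.EscapingWriterFactory

class CustomXmlEscapingWriterFactory : EscapingWriterFactory {
    // Wraps the target writer so that everything written to it is XML-escaped first.
    override fun createEscapingWriterFor(out: Writer, enc: String?): Writer = object : Writer() {
        override fun write(cbuf: CharArray, off: Int, len: Int) {
            // ESCAPE_XML escapes the five characters < > & ' "
            StringEscapeUtils.ESCAPE_XML.translate(String(cbuf, off, len), out)
        }
        override fun flush() = out.flush()
        override fun close() = out.close()
    }

    // Only the Writer-based variant is implemented.
    override fun createEscapingWriterFor(out: OutputStream?, enc: String?): Writer =
            throw IllegalArgumentException("not supported")
}
```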
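
For the factory to take effect, it has to be registered on the Woodstox output factory that the mapper uses, via the two Stax2 properties mentioned at the top of this issue. A configuration sketch in Java, assuming woodstox-core and jackson-dataformat-xml are on the classpath; `EscapingXmlMappers` and `withEscaper` are hypothetical names used only for illustration, and reusing one escaper for both text and attribute values is a simplification:

```java
import com.ctc.wstx.stax.WstxInputFactory;
import com.ctc.wstx.stax.WstxOutputFactory;
import com.fasterxml.jackson.dataformat.xml.XmlFactory;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import org.codehaus.stax2.XMLOutputFactory2;
import org.codehaus.stax2.io.EscapingWriterFactory;

/** Hypothetical helper, not part of Jackson or Woodstox. */
final class EscapingXmlMappers {
    private EscapingXmlMappers() {}

    /** Builds an XmlMapper whose Woodstox output factory uses the given escaper. */
    static XmlMapper withEscaper(EscapingWriterFactory escaper) {
        WstxOutputFactory outputFactory = new WstxOutputFactory();
        // Escaper applied to element text content
        outputFactory.setProperty(XMLOutputFactory2.P_TEXT_ESCAPER, escaper);
        // Escaper applied to attribute values (same one reused here for brevity)
        outputFactory.setProperty(XMLOutputFactory2.P_ATTR_VALUE_ESCAPER, escaper);
        return new XmlMapper(new XmlFactory(new WstxInputFactory(), outputFactory));
    }
}
```

With this in place, something like `EscapingXmlMappers.withEscaper(CustomXmlEscapingWriterFactory())` from Kotlin would be expected to yield a mapper that writes `&quot;` and `&apos;` in text content.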

mvysny avatar Dec 13 '19 06:12 mvysny

Excellent! Thank you for sharing this.

cowtowncoder avatar Dec 13 '19 20:12 cowtowncoder