jackson-dataformat-xml

Allow specifying encodings other than UTF-8 in XML declaration written

Open cassiomolin opened this issue 7 years ago • 14 comments

The UTF-8 encoding is hard coded in the ToXmlGenerator source code:

if (Feature.WRITE_XML_1_1.enabledIn(_formatFeatures)) {
    _xmlWriter.writeStartDocument("UTF-8", "1.1");
} else if (Feature.WRITE_XML_DECLARATION.enabledIn(_formatFeatures)) {
    _xmlWriter.writeStartDocument("UTF-8", "1.0");
} else {
    return;
}

Since ToXmlGenerator is final, there might not be an easy way to declare other encodings, such as ISO-8859-1:

<?xml version="1.0" encoding="ISO-8859-1"?>

See this question on Stack Overflow for reference.

cassiomolin avatar Oct 12 '18 12:10 cassiomolin

Ah. Yes, I see. So although the underlying writer may actually use a different encoding, the XML declaration claims it is UTF-8. That's not good.
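To make the mismatch concrete, here is a minimal JDK-only sketch (no Jackson involved). The class name DeclarationMismatchDemo and the sample text are made up for illustration. It writes a hard-coded UTF-8 declaration through a writer that actually encodes as ISO-8859-1, then decodes the bytes the way a consumer that trusts the declaration would.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: mimics a declaration that always says UTF-8,
// regardless of the charset the underlying writer really uses.
final class DeclarationMismatchDemo {

    // Writes a fixed UTF-8 declaration plus one element, encoded with `actual`.
    static byte[] write(Charset actual, String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, actual)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            w.write("<name>" + text + "</name>");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = write(StandardCharsets.ISO_8859_1, "Jos\u00e9");
        // A consumer that trusts the declaration decodes the bytes as UTF-8.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.contains("Jos\u00e9")); // false: the accented character is lost
        System.out.println(decoded.contains("\uFFFD"));    // true: it became a replacement character
    }
}
```

A strict XML parser would typically reject such input rather than substitute a replacement character, but the cause is the same: the declared and actual encodings disagree.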

cowtowncoder avatar Dec 16 '18 01:12 cowtowncoder

Additionally, XmlFactory outputs UTF-8 only:

protected XMLStreamWriter _createXmlWriter(IOContext ctxt, OutputStream out) throws IOException
{
    XMLStreamWriter sw;
    try {
        sw = _xmlOutputFactory.createXMLStreamWriter(_decorate(ctxt, out), "UTF-8");
    } catch (Exception e) {
        throw new JsonGenerationException(e.getMessage(), e, null);
    }
    return _initializeXmlWriter(sw);
}

saimonsez avatar May 12 '20 10:05 saimonsez

@saimonsez That is intentional however (partly since there is no mechanism to pass non-Unicode encodings); caller is expected to create Writer for alternate encodings.

However, I hope to introduce a mechanism to allow users to create document "header" (xml declaration and/or DOCTYPE declaration) via XMLStreamWriter, which would allow adding encoding in xml declaration.

cowtowncoder avatar May 12 '20 20:05 cowtowncoder

I see, thank you for the clarification. In my case, the caller is spring-web (AbstractJackson2HttpMessageConverter), which offers no way to configure an encoding other than Unicode, so I am stuck again. Are you by chance involved with Spring's integration of Jackson?

saimonsez avatar May 14 '20 06:05 saimonsez

I just created https://github.com/spring-projects/spring-framework/issues/25076 which is related to this issue (if jackson is used with spring).

saimonsez avatar May 14 '20 07:05 saimonsez

I am only involved whenever Spring folks file bugs, but do not know their code base (and they don't use, I think, Jackson JAX-RS provider). Their involvement would be needed even if new functionality / endpoints were added, for what that is worth.

cowtowncoder avatar May 14 '20 23:05 cowtowncoder

This has been pushed back for 2 years now, from 2.10 to 2.11 to 2.12. So next is 2.13?

Any stupid workaround would be great.

kromit avatar Aug 18 '20 07:08 kromit

@kromit You are absolutely welcome to provide a fix as you seem to need it.

cowtowncoder avatar Aug 18 '20 14:08 cowtowncoder

@cowtowncoder I've looked into this, and I would break significantly more things along the way than I would fix. :see_no_evil: Not sure if my workaround is legit, but this is what I am using.

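// Hand-written XML declaration (despite the constant's name, this is not a DOCTYPE)
private final String DOCTYPE = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n";

// Encode the output as ISO-8859-1 so the bytes match the declaration above
Writer writer = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1);
writer.write(DOCTYPE);

// WRITE_XML_DECLARATION is disabled by default, so the mapper adds no second declaration
XmlMapper xmlMapper = new XmlMapper();
xmlMapper.writeValue(writer, value);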

kromit avatar Aug 18 '20 15:08 kromit

Couple of possibly helpful pointers:

  • #150 is related; there should be a way to customize writing of DOCTYPE as well as xml declaration
  • You could also manually create an XMLStreamWriter, use writer.writeStartDocument(...) to initialize it, and pass it to the XML-specific mapper.writeValue() method. But if you do so, you need to disable ToXmlGenerator.Feature.WRITE_XML_DECLARATION (in fact that may already be necessary in your case?)
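
A rough, JDK-only sketch of the second pointer follows. The class and method names (ManualDeclarationSketch, writeLatin1) are made up for illustration, and plain StAX calls stand in for the step where the writer would be handed to the mapper, so treat it as an outline under those assumptions rather than a tested Jackson recipe.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustrative only: the caller creates the XMLStreamWriter and writes the declaration.
final class ManualDeclarationSketch {

    static byte[] writeLatin1(String text) throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String encoding = StandardCharsets.ISO_8859_1.name(); // "ISO-8859-1"
        XMLStreamWriter sw = XMLOutputFactory.newInstance().createXMLStreamWriter(out, encoding);

        // Write the declaration ourselves so it names the encoding actually in use.
        sw.writeStartDocument(encoding, "1.0");

        // With Jackson on the classpath, this is roughly where the writer would be
        // handed over, e.g. xmlMapper.writeValue(sw, value), with
        // WRITE_XML_DECLARATION left disabled. Plain StAX calls stand in for that here.
        sw.writeStartElement("name");
        sw.writeCharacters(text);
        sw.writeEndElement();

        sw.writeEndDocument();
        sw.flush();
        sw.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] bytes = writeLatin1("Jos\u00e9");
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Because the same encoding name is passed both to the factory and to writeStartDocument, the declaration and the bytes cannot drift apart.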

The idea with #150 (which I really would like to get into 2.12 if I have time) would be to allow registering a writer callback that would write all preamble events (xml declaration and/or DOCTYPE) the way the caller wants. Conceptually simple; I just need to think of a way to do it that fits with the format-specific handling in Jackson's databind (most of the API is format-agnostic).
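
Purely as an illustration of the shape such a callback could take, here is a JDK-only sketch. PreambleWriter and PreambleCallbackSketch are hypothetical names and are not part of Jackson's API; the write method is a stand-in for a generator that would invoke the callback before emitting the root element.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustrative only: nothing here exists in Jackson.
final class PreambleCallbackSketch {

    // Hypothetical callback type: the caller decides what goes before the root element.
    @FunctionalInterface
    interface PreambleWriter {
        void writePreamble(XMLStreamWriter sw) throws XMLStreamException;
    }

    // Stand-in for a generator: runs the callback first, then writes one element.
    static String write(PreambleWriter preamble, String rootName, String text)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter sw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        preamble.writePreamble(sw);
        sw.writeStartElement(rootName);
        sw.writeCharacters(text);
        sw.writeEndElement();
        sw.writeEndDocument();
        sw.flush();
        sw.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = write(sw -> {
            sw.writeStartDocument("ISO-8859-1", "1.0"); // caller-chosen declaration
            sw.writeDTD("<!DOCTYPE name>");             // caller-chosen DOCTYPE
        }, "name", "value");
        System.out.println(xml);
    }
}
```

Note that the sketch writes to a character stream, so the caller would still be responsible for encoding the result with the charset named in the declaration.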

cowtowncoder avatar Aug 18 '20 16:08 cowtowncoder