xmlconvert

xml_to_df ends up ignoring encoding

Open cejkiebo opened this issue 3 years ago • 0 comments

I am trying to open an XML file encoded in ISO-8859-1 (aka latin-1) using xmlconvert, yet even when I specify xml.encoding it still claims my input isn't proper UTF-8. My call and traceback are as follows:

> carbu_df <- xmlconvert::xml_to_df("./PrixCarburants_instantane.xml",
+                                   xml.encoding = "latin-1",
+                                   records.xpath = "//pdv | //prix",
+                                   fields = "attributes")
Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html,  : 
  Input is not proper UTF-8, indicate encoding !
Bytes: 0xE8 0x73 0x2D 0x4C [9]
> traceback()
4: read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, 
       options = options)
3: read_xml.character(text, encoding = xml.encoding)
2: xml2::read_xml(text, encoding = xml.encoding)
1: xmlconvert::xml_to_df("./PrixCarburants_instantane.xml", xml.encoding = "latin-1", 
       records.xpath = "//pdv | //prix", fields = "attributes")

Loading the file directly with xml2::read_xml("./PrixCarburants_instantane.xml", encoding="latin-1") does work, and so does opening the file in Notepad and saving it as UTF-8 (which is a bit tedious). The traceback shows read_xml.raw being called with "UTF-8" even though xml.encoding was passed along as "latin-1", so it appears to me that enc2utf8 and charToRaw somehow aren't doing their job when confronted with direct latin-1 input.
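For reference, here is a minimal sketch of the route that does work: reading the file from its path with xml2::read_xml and collecting node attributes into a data frame by hand. The sample XML, the temp file, and the attribute names (id, ville) are made up for illustration and are not taken from my dataset. It also assumes every selected node carries the same set of attributes, which would not hold for a mixed selection like "//pdv | //prix".

```r
library(xml2)

# Hypothetical sample document, written to disk as ISO-8859-1 bytes
xml_txt <- paste0(
  "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n",
  "<pdv_liste>",
  "<pdv id=\"1\" ville=\"Orl\u00e9ans\"/>",
  "<pdv id=\"2\" ville=\"Ch\u00e2teau\"/>",
  "</pdv_liste>"
)
tmp <- tempfile(fileext = ".xml")
writeBin(iconv(xml_txt, from = "UTF-8", to = "ISO-8859-1", toRaw = TRUE)[[1]], tmp)

# Read from the file path so that the encoding argument is honoured
doc <- xml2::read_xml(tmp, encoding = "ISO-8859-1")

# One named character vector of attributes per matching node
nodes <- xml2::xml_find_all(doc, "//pdv")
attrs <- xml2::xml_attrs(nodes)

# Bind the attribute vectors into a data frame, one row per node
carbu_df <- do.call(
  rbind,
  lapply(attrs, function(a) as.data.frame(as.list(a), stringsAsFactors = FALSE))
)
print(carbu_df)
```

This bypasses xml_to_df entirely, so it is only a stopgap until xml.encoding is respected there.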

My dataset can be found here

cejkiebo, Feb 03 '23 15:02