Error: Non-whitespace before first tag. Line: 0 Column: 1 Char:
This should work out of the box, since xml2js has now code to remove the byte order mark before parsing.
Originally posted by @Leonidas-from-XIV in https://github.com/Leonidas-from-XIV/node-xml2js/issues/345#issuecomment-264904146
But it doesn't seem that way for other characters. To reproduce the error:
curl -O https://habd.as/post/show-latest-posts-github-profile/assets/index.xml
npm i -g [email protected]
Drop in the script provided in #345:
const fs = require('fs');
const util = require('util');
const parseString = require('xml2js').parseString;
const fileToParse = './index.xml';
const outputFilename = 'output.json';
fs.readFile(fileToParse, function (err, data) {
  parseString(data, function (err, result) {
    console.dir(result);
    if (err) {
      console.log(err);
    }
    // do something with the JS object called result, or see below to save as JSON
    fs.writeFile(outputFilename, JSON.stringify(result), (writeErr) => {
      if (writeErr) throw writeErr;
      console.log('The file has been saved to ' + outputFilename);
    });
  });
});
And you'll get the output:
undefined
Error: Non-whitespace before first tag.
Line: 0
Column: 1
Char:
at error (/Users/jos/Developer/node_modules/sax/lib/sax.js:651:10)
at strictFail (/Users/jos/Developer/node_modules/sax/lib/sax.js:677:7)
at beginWhiteSpace (/Users/jos/Developer/node_modules/sax/lib/sax.js:951:7)
at SAXParser.write (/Users/jos/Developer/node_modules/sax/lib/sax.js:1006:11)
at Parser.exports.Parser.Parser.parseString (/Users/jos/Developer/node_modules/xml2js/lib/parser.js:323:31)
at Parser.parseString (/Users/jos/Developer/node_modules/xml2js/lib/parser.js:5:59)
at exports.parseString (/Users/jos/Developer/node_modules/xml2js/lib/parser.js:369:19)
at /Users/jos/Developer/balibebas-gh-repo/parse.js:8:5
at FSReqCallback.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:61:3)
The file has been saved to output.json
Notice the XML is parsed as expected when opened in Firefox and Chrome: https://habd.as/post/show-latest-posts-github-profile/assets/index.xml
I tried replacing the first character (0x1F) from the file using a hex editor and running the parsing script (above) again but the issue persists. I suppose this may be an issue in the SAX parser—but I'm unable to identify the root cause.
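One thing that may be worth ruling out: the byte pair 0x1F 0x8B is the gzip magic number, so a leading 0x1F could mean that `curl -O` saved a gzip-compressed response rather than plain XML. That is only a guess about this particular file. The sketch below uses Node's built-in `zlib` module to check for those two bytes and decompress before parsing. `readXmlBytes` is a hypothetical helper name, not part of xml2js.

```javascript
const fs = require('fs');
const zlib = require('zlib');

// Hypothetical helper (not part of xml2js): returns XML text from raw bytes,
// decompressing first if the bytes begin with the gzip magic number 0x1F 0x8B.
function readXmlBytes(buf) {
  const isGzip = buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
  const raw = isGzip ? zlib.gunzipSync(buf) : buf;
  return raw.toString('utf8');
}

// Usage with the file from the repro steps (skipped if the file is absent).
if (fs.existsSync('./index.xml')) {
  const buf = fs.readFileSync('./index.xml');
  // Print the first few bytes as hex to see what the file really starts with.
  console.log('first bytes:', buf.subarray(0, 4).toString('hex'));
  console.log(readXmlBytes(buf).slice(0, 80));
}
```

If the first bytes print as `1f8b`, the string returned by `readXmlBytes` can be passed to `parseString` in place of the raw buffer. If they do not, the cause lies elsewhere and this check can be dropped.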
I don't think this is the most elegant way to go about it, but I was able to get rid of the error by passing the XML to the parser as a string.
let data = JSON.parse(JSON.stringify(myRequest.responseText));
data = parseString(data, (err, result) => {
  console.dir(result);
  if (err) {
    console.log(err);
  }
});