Oliverity
It seems the underlying problem was more or less addressed in Node.js itself back in 2010: http://debuggable.com/posts/streaming-utf-8-with-node-js:4bf28e8b-a290-432f-a222-11c1cbdd56cb
Now it's just that people don't readily develop a habit of following the instructions...
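For anyone landing here, a minimal sketch of what that fix looks like in practice, assuming the problem in question is a multi-byte UTF-8 character being split across two chunks. It uses Node's built-in `string_decoder` module; the sample bytes and variable names are made up for illustration and are not taken from node-word-extractor.

```javascript
const { StringDecoder } = require('string_decoder');

// "café!" in UTF-8, where "é" is the two bytes 0xC3 0xA9.
// The chunk boundary deliberately falls in the middle of "é".
const chunks = [
  Buffer.from([0x63, 0x61, 0x66, 0xc3]),
  Buffer.from([0xa9, 0x21]),
];

// Naive approach: decode every chunk on its own.
// Each half of "é" is invalid by itself and becomes U+FFFD.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

// StringDecoder holds back an incomplete trailing sequence
// until the next write() supplies the remaining bytes.
const decoder = new StringDecoder('utf8');
const decoded = chunks.map((chunk) => decoder.write(chunk)).join('') + decoder.end();

console.log(JSON.stringify(naive));
console.log(JSON.stringify(decoded));
```

Calling `stream.setEncoding('utf8')` on a readable stream should give the same behaviour, since it decodes through a `StringDecoder` internally.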
Made a pull request https://github.com/morungos/node-word-extractor/pull/55, but I'm not happy with it. Seems more like a quick and dirty trick than a robust solution.
I've found out that such XML files inside a DOCX can only be encoded in **UTF-8** or **UTF-16** (probably just one of UTF-16*BE* and UTF-16*LE*, but I don't know which one)...
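Since I don't know which UTF-16 variant shows up, one option is to look at the byte order mark and handle both. This is only a rough sketch, not what the pull request does: `decodeXmlPart` is a hypothetical helper name, and falling back to UTF-8 when there is no BOM is my assumption.

```javascript
// Hypothetical helper: decode an XML part from a raw Buffer by checking its BOM.
function decodeXmlPart(buf) {
  // UTF-8 BOM: EF BB BF
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return buf.subarray(3).toString('utf8');
  }
  // UTF-16LE BOM: FF FE
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
    return buf.subarray(2).toString('utf16le');
  }
  // UTF-16BE BOM: FE FF. Buffer has no 'utf16be' encoding, so copy the
  // bytes, swap each pair in place, and decode the result as UTF-16LE.
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) {
    const body = Buffer.from(buf.subarray(2)); // copy, so the input is left untouched
    const even = body.subarray(0, body.length - (body.length % 2)); // swap16 needs an even length
    return even.swap16().toString('utf16le');
  }
  // Assumption: no BOM means UTF-8.
  return buf.toString('utf8');
}

console.log(decodeXmlPart(Buffer.from([0xfe, 0xff, 0x00, 0x3c, 0x00, 0x61])));
```

A BOM-less UTF-16 file would still be misread by this, so it would need an extra check (for example, looking for the zero bytes around the leading `<`) to be really robust.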