[BUG] replacement character (�) seems randomly on WorkbookReader with fileStream.
🐛 Bug Report
Lib version: 4.3.0
Hello. I'm using streams to load Excel data, and the replacement character (� - displayed as a black rhombus with a white question mark) appears in the middle of the text, about one in 2,000 lines.
I tried everything I could, but I couldn't find a solution.
Steps To Reproduce
When I re-upload and test the same file, the character does not appear or occurs in a different location. Therefore, general reproduction is difficult. Below is the code I'm using.
const fileStream = await getFileStreamSync({
key: excelFileS3ObjectKey,
});
const workbookReader = new Excel.stream.xlsx.WorkbookReader(fileStream, {
sharedStrings: 'cache',
styles: 'cache',
});
let readDataFromExcelStream;
for await (const worksheetReader of workbookReader) { // eslint-disable-line semi
if (worksheetReader.name === 'sheet1') {
readDataFromExcelStream = await createReadableStreamFromWorksheetReader({
worksheetReader,
invalidDataRowList,
});
}
}
pipeline(readDataFromExcelStream, ...);
Reading data in createReadableStreamFromWorksheetReader
for await (const row of worksheetReader) {
row.eachCell((cell, cellNumber) => {
const cellName = profileCardExcelItemKeyList[cellNumber - 1];
data = {
...data,
[cellName]: getCellValue(cell.value, cell.style),
};
});
}
The expected behaviour:
Loading normally without replacement character.
@moonjoungyoung Can you provide the full code? And your excel file?
@zurmokeeper
I'm sorry. For security reasons, it is difficult to provide the full code and excel file. But I added some code to read the data.
@moonjoungyoung Can you provide an excel file of the minimum reproduction units? You can delete some sensitive information and leave only that character that will report an error, it's hard to reproduce the problem with just the code and no file.
We are experiencing the same problem.
Data in the XLS itself seems fine, displays correct in MS Excel and Libreoffice Calc. Also can be copypasted out of those 2 without problems. Reading the XLS with WorkbookReader + Stream leads to replacement characters in seemingly random places. After making changes in the XLS new errors keep popping up in rows that haven't even been changed.
this.workbook = new ExcelJS.stream.xlsx.WorkbookReader(stream, { entries: "emit", worksheets: "emit", styles: "cache", sharedStrings: "cache", hyperlinks: "cache", });
Sadly i'm also struggling to put together an excel file that i can give out due to data security concerns.
Reading the same XLS file with
this.workbook.xlsx.read(stream);
and using the non-streaming API to access workbooks and rows does not produce the issue.
The issue lies in ExcelJS streaming the xl/sharedStrings.xml file, which causes the buffer to be sent incorrectly. We are encountering this problem, and #2084 could potentially resolve it
@vuggy17
Thank you for sharing this solution! The link you provided solved my problem perfectly.