Writing large xlsx files (30-50 MB) from std::strings is very slow or never completes
Hello!
I'm working on a project where I need to handle semi-large CSV and xlsx files, and I added OpenXLSX to the project to handle reading and writing the xlsx files.
Reading my example xlsx file (5 sheets) into std::string storage takes 7 seconds, which is nice and fast.
But when I write the same data from the std::strings back out as a new xlsx file, the process gets progressively slower the more data and sheets it has already written. In practice, OpenXLSX never finished writing the file: I waited 1.5 hours and had to kill the process because it seemed stuck writing a single sheet.
Writing one cell at a time or one row at a time makes practically no difference.
My problem might be related to this issue: https://github.com/troldal/OpenXLSX/issues/154
Is there any way to skip the shared-strings checks and just write everything as plain strings? Or does anyone have other tips for making file writing usable with larger amounts of random string data?
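If the shared-strings existence check is a linear scan over every string stored so far (an assumption, not verified against the OpenXLSX source), then writing n distinct strings costs about n(n-1)/2 comparisons in total, so each new string takes longer than the previous one, which would match the slowdown described above. An alternative to dropping shared strings entirely is to keep them but index them with a hash map. This is a minimal sketch with hypothetical names (`SharedStringTable`, `getOrAppend`); it is not OpenXLSX's `XLSharedStrings` API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical shared-string table; NOT OpenXLSX's XLSharedStrings API.
// Strings are kept in insertion order because a cell refers to a shared
// string by its position. The hash map answers "does this string already
// exist, and at which index?" in O(1) on average, instead of scanning
// every stored string on each write.
class SharedStringTable {
public:
    // Returns the index of `value`, appending it first if it is new.
    std::size_t getOrAppend(const std::string& value) {
        auto it = m_index.find(value);
        if (it != m_index.end()) {
            return it->second;  // already stored: reuse the existing index
        }
        const std::size_t idx = m_strings.size();
        m_strings.push_back(value);
        m_index.emplace(value, idx);
        return idx;
    }

    std::size_t size() const { return m_strings.size(); }

    // Looks up a string by the index stored in a cell's <v> element.
    const std::string& at(std::size_t idx) const {
        assert(idx < m_strings.size());
        return m_strings[idx];
    }

private:
    std::vector<std::string> m_strings;                    // index -> string
    std::unordered_map<std::string, std::size_t> m_index;  // string -> index
};
```

For example, calling `getOrAppend("apple")`, `getOrAppend("banana")` and `getOrAppend("apple")` returns 0, 1 and 0. The trade-off is memory: each distinct string is stored twice, once in the vector and once as a map key.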
Answering my own question, for future reference in case someone else runs into the same issue.
Looking through the OpenXLSX source files, I found a workaround, at least for my case:
In XLCellValue.cpp I commented out lines 402, 405 and 409:
```cpp
// ===== Set the type attribute.
m_cellNode->attribute("t").set_value("s");

// ===== Get or create the index in the XLSharedStrings object.
auto index = (m_cell->m_sharedStrings.stringExists(stringValue) ? m_cell->m_sharedStrings.getStringIndex(stringValue) : m_cell->m_sharedStrings.appendString(stringValue));

// ===== Set the text of the value node.
m_cellNode->child("v").text().set(index);
```
Then I uncommented lines 412 and 413 instead:
```cpp
// m_cellNode->attribute("t").set_value("str");
// m_cellNode->child("v").text().set(stringValue);
```
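To illustrate what this edit changes in the worksheet XML, here is a minimal sketch that builds both cell forms as plain strings. The helpers `xmlEscape`, `sharedStringCell` and `plainStringCell` are hypothetical; OpenXLSX itself writes these through pugixml nodes. The shared-string form stores only an index in `<v>`, which is why every write needs the lookup; the `t="str"` form stores the escaped text directly. One caveat: ECMA-376 describes `str` as the type for a formula string and `inlineStr` (text in an `<is><t>` child) as the dedicated inline-string type, so it may be worth checking that the applications reading your files accept `t="str"` cells without a formula.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Escapes the characters that should not appear literally in XML text content.
std::string xmlEscape(std::string_view text) {
    std::string out;
    out.reserve(text.size());
    for (char ch : text) {
        switch (ch) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += ch;      break;
        }
    }
    return out;
}

// Shared-string cell: <v> holds an index into sharedStrings.xml, so the
// writer must first find (or append) the string in that table.
std::string sharedStringCell(std::string_view ref, std::size_t index) {
    assert(!ref.empty());
    return "<c r=\"" + std::string(ref) + "\" t=\"s\"><v>" +
           std::to_string(index) + "</v></c>";
}

// "str" cell (the variant the workaround enables): the text itself goes
// into <v>, so no shared-strings lookup is needed.
std::string plainStringCell(std::string_view ref, std::string_view text) {
    assert(!ref.empty());
    return "<c r=\"" + std::string(ref) + "\" t=\"str\"><v>" +
           xmlEscape(text) + "</v></c>";
}
```

For example, `plainStringCell("B2", "a<b & c")` yields `<c r="B2" t="str"><v>a&lt;b &amp; c</v></c>`, while `sharedStringCell("A1", 0)` yields `<c r="A1" t="s"><v>0</v></c>`.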
I left the lines that follow (415-419) untouched; uncommenting them as well caused problems:
```cpp
// auto s = std::string_view(stringValue);
// if (s.front() == ' ' || s.back() == ' ') {
//     if (!m_cellNode->attribute("xml:space")) m_cellNode->append_attribute("xml:space");
//     m_cellNode->attribute("xml:space").set_value("preserve");
// }
```
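The post doesn't say what went wrong with lines 415-419. One possible cause (an assumption, not confirmed) is that the snippet calls `s.front()` and `s.back()` without first checking for an empty string, which is undefined behavior for an empty `std::string_view`, so any empty cell value would trigger it. A guarded version of that check, using the hypothetical helper name `needsSpacePreserve`, could look like this:

```cpp
#include <string_view>

// Returns true when the text starts or ends with a space, i.e. when a writer
// would want xml:space="preserve" so the spaces survive a round trip.
// empty() is checked first because front()/back() on an empty string_view
// is undefined behavior.
constexpr bool needsSpacePreserve(std::string_view s) {
    return !s.empty() && (s.front() == ' ' || s.back() == ' ');
}
```

It returns false for `""` and `"abc"`, and true for `" abc"` and `"abc "`. Like the original snippet, it only looks at the space character, not tabs or newlines.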
With this change, saving my example file takes less than 30 seconds, which is around what I was hoping for.
Edit: I accidentally closed the issue. I'm not sure whether my comment/uncomment tweak actually counts as solving the whole issue.