
Writing large-ish sheets/XLSX files (30-50 MB) from std::strings seems very slow, or never completes

Status: Open • og-yona opened this issue 1 year ago • 2 comments

Hello!

I'm working on a project that needs to handle semi-large CSV and XLSX files, and I added OpenXLSX to the project to handle reading and writing the XLSX part.

Reading the example XLSX file (5 sheets) into std::string storage finishes in 7 seconds, which is nice and fast.

But when I try to write the same data from std::strings back out as a new XLSX file, the process gets progressively (seemingly exponentially) slower the more data/sheets it has already written. In practice, OpenXLSX was unable to finish writing the data back to XLSX: I waited 1.5 hours and had to kill the process because it seemed stuck writing a single sheet.
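A slowdown like this is consistent with every string write doing a lookup whose cost grows with the number of strings already stored. The minimal C++ sketch below assumes the shared-strings lookup is a linear scan over the existing entries (the type and function names are hypothetical, not OpenXLSX's) and simply counts how many comparisons such a table performs:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a shared-strings table whose lookup is a linear scan.
// This is NOT OpenXLSX's code; it only illustrates why a per-string linear
// lookup makes the total work grow roughly quadratically.
struct LinearStringTable {
    std::vector<std::string> strings;
    std::size_t comparisons = 0;  // total string comparisons performed so far

    // Get-or-create: compare against every existing entry, append if not found.
    std::size_t indexOf(const std::string& s) {
        for (std::size_t i = 0; i < strings.size(); ++i) {
            ++comparisons;
            if (strings[i] == s) return i;
        }
        strings.push_back(s);
        return strings.size() - 1;
    }
};

// Inserts n distinct strings and returns how many comparisons that took.
std::size_t comparisonsForDistinct(std::size_t n) {
    LinearStringTable table;
    for (std::size_t i = 0; i < n; ++i) {
        table.indexOf("value_" + std::to_string(i));
    }
    return table.comparisons;
}
```

For n distinct strings the scan does 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons, so `comparisonsForDistinct(1000)` is 499500 and doubling the data roughly quadruples the work. That is quadratic rather than strictly exponential, but with hundreds of thousands of unique strings it is enough to make a write look stuck.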

Writing one cell at a time or one row at a time makes basically no difference.

My problem might be related to this issue: https://github.com/troldal/OpenXLSX/issues/154

Is there any way to skip the shared-strings checks and just write everything as plain strings? Or does anyone have other tips for making file writing actually usable with larger amounts of random string data?


og-yona • Mar 15 '24 11:03

Answering my own question, for future reference in case anyone else runs into the same issue.

Looking around the OpenXLSX source files, I managed to find a workaround, at least for my case:

In XLCellValue.cpp, I commented out lines 402, 405, and 409:

```cpp
// ===== Set the type attribute.
m_cellNode->attribute("t").set_value("s");

// ===== Get or create the index in the XLSharedStrings object.
auto index = (m_cell->m_sharedStrings.stringExists(stringValue) ? m_cell->m_sharedStrings.getStringIndex(stringValue) : m_cell->m_sharedStrings.appendString(stringValue));

// ===== Set the text of the value node.
m_cellNode->child("v").text().set(index);
```
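The shared-strings code above consults the table twice per write (stringExists, then getStringIndex) before appending, so if those lookups scan the table, each write pays the linear cost twice. A possible alternative, sketched here as a hypothetical class that is not part of OpenXLSX, keeps a hash map from string to index so that get-or-create becomes a single lookup with average constant time:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical shared-strings table indexed by value (not OpenXLSX's API).
// It shows how a stringExists/getStringIndex/appendString sequence could be
// collapsed into one hash-map lookup.
class HashedStringTable {
public:
    // Returns the existing index for `s`, or appends `s` and returns its new index.
    std::size_t getOrAppend(const std::string& s) {
        // try_emplace inserts only if `s` is not already a key.
        auto [it, inserted] = m_index.try_emplace(s, m_strings.size());
        if (inserted) m_strings.push_back(s);
        return it->second;
    }

    std::size_t size() const { return m_strings.size(); }
    const std::string& at(std::size_t i) const { return m_strings.at(i); }

private:
    std::vector<std::string> m_strings;                    // strings in index order
    std::unordered_map<std::string, std::size_t> m_index;  // string -> index
};
```

This trades extra memory (each string is stored both in the vector and as a map key) for lookup speed, while the vector keeps the index order that `t="s"` cells refer to.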

Then I uncommented lines 412 and 413 instead:

```cpp
// m_cellNode->attribute("t").set_value("str");
// m_cellNode->child("v").text().set(stringValue);
```
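Those two lines change what ends up in the worksheet XML: instead of a `t="s"` cell whose `<v>` holds an index into the shared-strings part, the cell gets `t="str"` and the text itself goes into `<v>`, so no shared-strings lookup happens at all. The hypothetical helpers below only build the two cell fragments as strings to show the difference; they are not OpenXLSX functions, they assume simple cell references like `A1`, and real code would let the XML library (pugixml, in OpenXLSX's case) handle escaping.

```cpp
#include <cstddef>
#include <string>

// Escapes characters that must not appear literally in XML text content.
std::string xmlEscape(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char ch : text) {
        switch (ch) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += ch;
        }
    }
    return out;
}

// Shared-string cell (t="s"): <v> holds an index into sharedStrings.xml.
std::string sharedStringCell(const std::string& ref, std::size_t index) {
    return "<c r=\"" + ref + "\" t=\"s\"><v>" + std::to_string(index) + "</v></c>";
}

// Plain string cell (t="str"): <v> holds the text itself, so no lookup is needed.
std::string plainStringCell(const std::string& ref, const std::string& text) {
    return "<c r=\"" + ref + "\" t=\"str\"><v>" + xmlEscape(text) + "</v></c>";
}
```

One caveat worth checking: ECMA-376 describes `t="str"` as a formula string, and the type it defines for text stored directly in the cell is `t="inlineStr"` with an `<is><t>text</t></is>` child. `t="str"` worked for the file in this issue, but it may be worth testing the output in the applications that will open it.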

I left the following lines (415-419) commented out; uncommenting them as well caused problems:

```cpp
// auto s = std::string_view(stringValue);
// if (s.front() == ' ' || s.back() == ' ') {
//     if (!m_cellNode->attribute("xml:space")) m_cellNode->append_attribute("xml:space");
//     m_cellNode->attribute("xml:space").set_value("preserve");
// }
```
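The post does not say what the problems were. One possible contributor, offered only as a guess: calling `front()` or `back()` on an empty `std::string_view` is undefined behaviour, so an empty cell string would hit it in the snippet above. A guarded version of just the whitespace check might look like this (`needsSpacePreserve` is a hypothetical helper name, not part of OpenXLSX):

```cpp
#include <string_view>

// Returns true when `s` starts or ends with a space, i.e. when a writer would
// want xml:space="preserve" so XML consumers keep that whitespace.
// Unlike the commented-out snippet, this checks for emptiness first, because
// front()/back() on an empty std::string_view is undefined behaviour.
bool needsSpacePreserve(std::string_view s) {
    return !s.empty() && (s.front() == ' ' || s.back() == ' ');
}
```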

Saving my earlier example file now takes less than 30 seconds, which is around what I was hoping for.

Edit: I accidentally closed the issue. I'm not sure whether my comment/uncomment tweak actually counts as solving the whole issue.

og-yona • Mar 15 '24 17:03