OpenXLSX does not do garbage collection in sharedStrings.xml

Open afalkenhahn opened this issue 3 years ago • 1 comments

When a string in a cell is overwritten with a new, unique string, the new string is simply appended to the end of the table in sharedStrings.xml. The old string is not removed from sharedStrings.xml, even if no cell uses it any longer.

afalkenhahn avatar Oct 05 '22 14:10 afalkenhahn

This is an unfortunate consequence of the complex indexing that Excel does across worksheets: shared strings have no explicit index, and cells refer to them only by their position inside the shared strings XML array. This means that every time a shared string is deleted, the whole workbook would require re-indexing.

This could possibly be addressed in a future patch by a function "cleanupSharedStrings" or something like that, which does the reindexing once, at the user's request, letting the user control when the computation overhead happens.
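To illustrate what such a one-time reindexing pass involves, here is a minimal sketch. It does not use OpenXLSX at all: the `Workbook` struct and `compactSharedStrings` are hypothetical stand-ins, assuming the workbook can be reduced to a string table plus one table position per string cell. It is not how the library implements anything, just the general idea of dropping unused entries and remapping every reference in a single sweep.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a workbook (not an OpenXLSX type): the shared
// string table, plus for each string cell the position of its string in it.
struct Workbook {
    std::vector<std::string> sharedStrings;
    std::vector<std::size_t> cellStringIndex;
};

// Removes strings that no cell references and rewrites all cell indices.
// Returns the number of strings that were dropped.
std::size_t compactSharedStrings(Workbook& wb)
{
    const std::size_t oldSize = wb.sharedStrings.size();

    // Pass 1: mark every table position that some cell still refers to.
    std::vector<bool> used(oldSize, false);
    for (std::size_t idx : wb.cellStringIndex) {
        assert(idx < oldSize);  // a cell must not point past the table
        used[idx] = true;
    }

    // Pass 2: build the compacted table and an old -> new position map.
    std::vector<std::size_t> newIndex(oldSize, 0);
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < oldSize; ++i) {
        if (used[i]) {
            newIndex[i] = kept.size();
            kept.push_back(std::move(wb.sharedStrings[i]));
        }
    }

    // Pass 3: every cell in the whole workbook has to be rewritten, which is
    // why doing this on each single overwrite would be expensive.
    for (std::size_t& idx : wb.cellStringIndex) {
        idx = newIndex[idx];
    }

    wb.sharedStrings = std::move(kept);
    return oldSize - wb.sharedStrings.size();
}
```

The cost is linear in the number of cells plus the number of strings, regardless of how many strings are removed, so running it once on request is cheaper than paying that cost for every overwritten cell.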

I'll keep this open for now but can't promise a quick implementation :)

aral-matrix avatar Aug 19 '24 23:08 aral-matrix

Guess what :) https://github.com/troldal/OpenXLSX/commit/4589a6c647ebbd1b11bf5c533e0eb19f3d047afb XLDocument now has XLDocument::cleanupSharedStrings() (in the development-aral branch) - and it's not even half bad in terms of performance (tested with a huge workbook and ca. 500KB of shared strings XML).

aral-matrix avatar Feb 02 '25 15:02 aral-matrix

Functionality is now merged into master.

aral-matrix avatar Feb 03 '25 16:02 aral-matrix