cif::pdb::reconstruct_pdbx is very slow
Hello, many thanks for the development and maintenance of libcifpp!
I've noticed that cif::pdb::reconstruct_pdbx is very slow. E.g., the 7soy mmCIF file from the PDB takes < 0.2 seconds to parse, but running cif::pdb::reconstruct_pdbx on it takes roughly 4.5 seconds, i.e. a slow-down of more than 20x if one wants to perform the correctness check/autofix.
The vast majority of the time is spent in cif::compound_factory::create.
Could that time be reduced? Also, cif::compound_factory::create seems to be called from multiple places. Would it make sense to cache that load?
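To illustrate the caching idea, here is a minimal sketch of a memoizing wrapper around an expensive loader. It does not use libcifpp's actual API: `compound_stub`, `compound_cache` and the loader callback are hypothetical names made up for this example, and the real `cif::compound_factory` may well already do something similar internally.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a parsed CCD entry; NOT libcifpp's cif::compound.
struct compound_stub
{
	std::string id;
};

// Memoizing wrapper: the (assumed expensive) loader runs at most once per id,
// every later request for the same id is answered from the map.
class compound_cache
{
  public:
	using value_type = std::shared_ptr<const compound_stub>;
	using loader_type = std::function<value_type(const std::string &)>;

	explicit compound_cache(loader_type loader)
		: m_loader(std::move(loader))
	{
	}

	value_type get(const std::string &id)
	{
		// The lock is held while loading, so concurrent loads are serialized.
		std::lock_guard<std::mutex> lock(m_mutex);

		auto it = m_cache.find(id);
		if (it != m_cache.end())
			return it->second;

		// Null results are cached too, so failed lookups are not retried.
		auto result = m_loader(id);
		m_cache.emplace(id, result);
		return result;
	}

  private:
	loader_type m_loader;
	std::mutex m_mutex;
	std::unordered_map<std::string, value_type> m_cache;
};
```

With something like this, several call sites could share one `compound_cache` instance, so the cost of loading a given compound would be paid only once per process.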
I think that this could also be sped up if the CCD was compressed using zstd instead of gzip, as it decompresses much faster.
Could it be that your components.cif file is compressed? What happens if you extract that file, the one in /var/cache/libcifpp, does that help?
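One way to check this programmatically is to look at the first bytes of the file: gzip streams start with 0x1f 0x8b and zstd frames with 0x28 0xb5 0x2f 0xfd. The sketch below is a standalone helper and not part of libcifpp; the names `compression`, `detect_compression` and `detect_compression_in_file` are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

enum class compression
{
	none,
	gzip,
	zstd
};

// Classify a buffer by its leading magic bytes.
inline compression detect_compression(const unsigned char *data, std::size_t size)
{
	if (size >= 2 && data[0] == 0x1f && data[1] == 0x8b)
		return compression::gzip;

	if (size >= 4 && data[0] == 0x28 && data[1] == 0xb5 && data[2] == 0x2f && data[3] == 0xfd)
		return compression::zstd;

	return compression::none;
}

// Read up to four bytes from the file and classify them.
// A file that cannot be opened yields zero bytes and is reported as 'none'.
inline compression detect_compression_in_file(const std::string &path)
{
	std::ifstream in(path, std::ios::binary);
	std::array<char, 4> buf{};
	in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
	auto n = static_cast<std::size_t>(in.gcount());
	return detect_compression(reinterpret_cast<const unsigned char *>(buf.data()), n);
}
```

Calling `detect_compression_in_file` on the components.cif in /var/cache/libcifpp should return `compression::none` if the file is stored as plain text.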
You mentioned using zstd. That's a good suggestion, but the point is that when you use the bundled script to update components.cif, it writes the file out uncompressed, removing the need for decompression entirely.
As a reference, cif-validate on 7soy takes 0.2 seconds on my laptop:
$ time build/cif-validate /tmp/7soy.cif.gz
real 0m0,246s
user 0m0,239s
sys 0m0,007s
Due to lack of response, I'm closing this issue.
I am sorry, I was too busy with other things. But it is very likely it was due to compression. We managed to work around this by removing the validation since we validate the CIF files separately.