cif::pdb::reconstruct_pdbx is very slow

Open Augustin-Zidek opened this issue 1 year ago • 3 comments

Hello, many thanks for the development and maintenance of libcifpp!

I've noticed that cif::pdb::reconstruct_pdbx is very slow. For example, the 7soy mmCIF file from the PDB takes < 0.2 seconds to parse, but running cif::pdb::reconstruct_pdbx on it takes roughly 4.5 seconds, i.e. a more than 20x slow-down if one wants to perform the correctness check/autofix.
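A comparison like this can be reproduced with a small wall-clock timing helper. The sketch below is library-agnostic and not taken from libcifpp: `time_seconds` and `slowdown` are hypothetical names introduced here, and the work to be timed (parsing the file, then calling cif::pdb::reconstruct_pdbx) would be passed in as a lambda.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `fn` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Fn>
double time_seconds(Fn &&fn)
{
    auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: ratio of two measurements, e.g. the time spent
// in the reconstruct step divided by the time spent parsing.
inline double slowdown(double baseline_seconds, double candidate_seconds)
{
    return candidate_seconds / baseline_seconds;
}
```

With the figures quoted above, `slowdown(0.2, 4.5)` gives 22.5; since parsing took less than 0.2 seconds, the real ratio is somewhat higher.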

The vast majority of the time is spent in cif::compound_factory::create:

image

Could that time be reduced? Also, cif::compound_factory::create seems to be called from multiple places. Would it make sense to cache that load?
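To make the caching idea concrete, here is a minimal memoization sketch. It is not libcifpp's implementation and says nothing about what cif::compound_factory does internally: `compound_data`, `compound_cache` and the loader callback are hypothetical stand-ins for whatever the expensive per-compound load produces.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the result of an expensive per-compound load.
struct compound_data
{
    std::string id;
};

// Hypothetical cache: invokes `loader` at most once per compound id and
// returns the stored result on every later request for the same id.
class compound_cache
{
  public:
    using loader_fn = std::function<std::shared_ptr<compound_data>(const std::string &)>;

    explicit compound_cache(loader_fn loader)
        : m_loader(std::move(loader))
    {
    }

    std::shared_ptr<compound_data> get(const std::string &id)
    {
        auto it = m_cache.find(id);
        if (it != m_cache.end())
            return it->second; // cache hit: no load needed

        auto result = m_loader(id); // cache miss: do the expensive load once
        m_cache.emplace(id, result);
        return result;
    }

  private:
    loader_fn m_loader;
    std::map<std::string, std::shared_ptr<compound_data>> m_cache;
};
```

This sketch is not thread-safe; concurrent callers would need a mutex around `get`.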

I think this could also be sped up if the CCD were compressed with zstd instead of gzip, as zstd decompresses much faster.

Augustin-Zidek, Sep 16 '24 09:09

Could it be that your components.cif file is compressed? What happens if you extract that file (the one in /var/cache/libcifpp)? Does that help?
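One quick way to check this is to look at the first two bytes of the file: gzip streams always begin with the magic bytes 0x1f 0x8b. The sketch below uses only the standard library; `looks_gzip_compressed` is a hypothetical helper name, not part of libcifpp.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file at `path` starts with
// the gzip magic bytes 0x1f 0x8b. Returns false for unreadable files
// and for files shorter than two bytes.
bool looks_gzip_compressed(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    unsigned char magic[2] = {0, 0};
    in.read(reinterpret_cast<char *>(magic), 2);
    return in.gcount() == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

It would be called with the path of the cached dictionary, for example a components.cif file under /var/cache/libcifpp; the exact file name on a given system is an assumption.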

mhekkel, Sep 16 '24 10:09

You mentioned using zstd. That's a good suggestion, but the point is that when you use the bundled script to update components.cif, it writes the file out uncompressed, removing the need for decompression entirely.

mhekkel, Sep 23 '24 10:09

As a reference, cif-validate on 7soy takes 0.2 seconds on my laptop:

$ time build/cif-validate /tmp/7soy.cif.gz

real	0m0,246s
user	0m0,239s
sys	0m0,007s

mhekkel, Sep 23 '24 11:09

Due to lack of response, I'm closing this issue.

mhekkel, Feb 05 '25 15:02

I am sorry, I was too busy with other things. It is very likely that the slowdown was due to compression. We worked around it by removing the validation, since we validate the CIF files separately.

Augustin-Zidek, Feb 05 '25 15:02