openff-toolkit icon indicating copy to clipboard operation
openff-toolkit copied to clipboard

Replace use of tar file with zip file

Open adalke opened this issue 4 years ago • 1 comments

The test suite uses the tar.gz files AlkEthOH_tripos.tar.gz and FreeSolv.tar.gz .

Test configuration requires reading those files to figure out what names are present. Test runs require unpacking the contents to a temporary file. Together this add over a second to each test run, simply during import.

This can be reduced by replacing those tar.gz files with a zip file. The zip file includes an index, so filename discovery is fast, and the entries can be extracted directly, without going through a temporary unpack.

The test discovery process replaces:

    with tarfile.open(alkethoh_tar_file_path, "r:gz") as tar:
        # Collect all the files discarding the duplicates in the test_filt1 folder.
        slow_test_cases = {
            extract_id(m.name)
            for m in tar.getmembers()
            if "crd" in m.name and "test_filt1" not in m.name
        }

with something like:

    with zipfile.ZipFile(alkethoh_zip_file_path) as z:
        # Collect all the files discarding the duplicates in the test_filt1 folder.
        slow_test_cases = {
            extract_id(m)
            for m in z.namelist()
            if "crd" in m and "test_filt1" not in m
        }

There is some space cost to having a zip file; the files are larger:

-rw-r--r--  1 dalke  admin  5042284 May 26 16:00 AlkEthOH_tripos.tar.gz
-rw-r--r--  1 dalke  admin  8474005 May 27 14:58 AlkEthOH_tripos.zip
-rw-r--r--  1 dalke  admin  1484592 May 26 16:01 FreeSolv.tar.gz
-rw-r--r--  1 dalke  admin  2406597 May 27 15:21 FreeSolv.zip

The tests.utils functions like get_alkethoh_file_path() will also need to be updated to use the zip file.

If accepted, this should reduce test time, and simplify test configuration.

adalke avatar May 27 '21 15:05 adalke

I like these runtime improvements, but the our git file history has become embarrassingly large, and making this change would make it even larger. This could be a good improvement to tackle alongside #43.

j-wags avatar May 28 '21 02:05 j-wags