Avoid nested bags by default

Open theatischbein opened this issue 7 months ago • 0 comments

In #186 I described how nested bags get created unintentionally: at present nothing signals that a nested bag is being created. This PR closes that issue.

Because nested bags may still be wanted in some cases, I added a flag that allows creating them; by default, a RuntimeError is raised instead.

Changes

  • add a function is_bag(bag_dir), which uses the Bag constructor to test whether a directory is already a bag
  • add the flag allow_nested_bag=False to the function make_bag
  • add logic to make_bag that uses is_bag to raise a RuntimeError if the given bag_dir is already a bag
  • add test cases for the functions is_bag and make_bag
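
The intended behaviour can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from this PR: here is_bag only checks for a bagit.txt file as a stand-in for calling the Bag constructor, and guard_against_nested_bag is a hypothetical helper name for the check that the PR places inside make_bag.

```python
import os
import tempfile


def is_bag(bag_dir):
    # Assumption: stand-in check. The PR tests this via the Bag constructor;
    # this sketch only looks for the bagit.txt declaration file.
    return os.path.isfile(os.path.join(bag_dir, "bagit.txt"))


def guard_against_nested_bag(bag_dir, allow_nested_bag=False):
    # Hypothetical helper: in the PR this logic lives inside make_bag.
    if is_bag(bag_dir) and not allow_nested_bag:
        raise RuntimeError(
            "%s is already a bag; pass allow_nested_bag=True to nest it" % bag_dir
        )


with tempfile.TemporaryDirectory() as tmp:
    guard_against_nested_bag(tmp)  # plain directory: nothing is raised

    # Simulate an existing bag by creating the declaration file.
    open(os.path.join(tmp, "bagit.txt"), "w").close()

    try:
        guard_against_nested_bag(tmp)  # default: refuse to nest
    except RuntimeError as exc:
        print("refused:", exc)

    guard_against_nested_bag(tmp, allow_nested_bag=True)  # explicit opt-in
```

Keeping the default at allow_nested_bag=False means existing callers that bag the same directory twice now get an explicit error, while callers who really want a nested bag opt in explicitly.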

Tests

All tests pass with my changes. See the log for more information.

Output of test.py

❯ python test.py
/home/thea/git/bagit-python/bagit.py:1451: DeprecationWarning: 'count' is passed as positional argument
  s = re.sub(r"%0D", "\r", s, re.IGNORECASE)
/home/thea/git/bagit-python/bagit.py:1452: DeprecationWarning: 'count' is passed as positional argument
  s = re.sub(r"%0A", "\n", s, re.IGNORECASE)
.........../home/thea/git/bagit-python/bagit.py:165: DeprecationWarning: The `checksum` argument for `make_bag` should be replaced with `checksums`
  warnings.warn(
...Disabling requested hash algorithm not-really-a-name: hashlib does not support it
An error occurred creating a bag in /tmp/tmp8450qsbp
Traceback (most recent call last):
  File "/home/thea/git/bagit-python/bagit.py", line 260, in make_bag
    total_bytes, total_files = make_manifests(
                               ~~~~~~~~~~~~~~^
        "data", processes, algorithms=checksums, encoding=encoding
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/thea/git/bagit-python/bagit.py", line 1275, in make_manifests
    checksums = [manifest_line_generator(i) for i in _walk(data_dir)]
                 ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/home/thea/git/bagit-python/bagit.py", line 1418, in generate_manifest_lines
    hashers = get_hashers(algorithms)
  File "/home/thea/git/bagit-python/bagit.py", line 1136, in get_hashers
    raise ValueError(
    ...<3 lines>...
    )
ValueError: Unable to continue: hashlib does not support any of the requested algorithms!
.Bag directory /home/thea/git/bagit-python/this-directory-does-not-exist does not exist
.....The following files do not have read permissions:
('/tmp/tmpnompusg1/loc/2478433644_2839c5e8b8_o_d.jpg',)
An error occurred creating a bag in /tmp/tmpnompusg1
Traceback (most recent call last):
  File "/home/thea/git/bagit-python/bagit.py", line 229, in make_bag
    raise BagError(
        _("Read permissions are required to calculate file fixities")
    )
bagit.BagError: Read permissions are required to calculate file fixities
.Unable to write to the following directories and files:
['/tmp/tmpgl1u4_go']
An error occurred creating a bag in /tmp/tmpgl1u4_go
Traceback (most recent call last):
  File "/home/thea/git/bagit-python/bagit.py", line 213, in make_bag
    raise BagError(
        _("Missing permissions to move all files and directories"))
bagit.BagError: Missing permissions to move all files and directories
.The following directories do not have read permissions:
('/tmp/tmp4qzlyr7a/loc',)
An error occurred creating a bag in /tmp/tmp4qzlyr7a
Traceback (most recent call last):
  File "/home/thea/git/bagit-python/bagit.py", line 229, in make_bag
    raise BagError(
        _("Read permissions are required to calculate file fixities")
    )
bagit.BagError: Read permissions are required to calculate file fixities
.Unable to write to the following directories and files:
['/tmp/tmp6t3bs2_m', '/tmp/tmp6t3bs2_m/loc']
An error occurred creating a bag in /tmp/tmp6t3bs2_m
Traceback (most recent call last):
  File "/home/thea/git/bagit-python/bagit.py", line 213, in make_bag
    raise BagError(
        _("Missing permissions to move all files and directories"))
bagit.BagError: Missing permissions to move all files and directories
..........The following files do not have read permissions:
('/tmp/tmpcutz17p7/bag-info.txt',)
..........Creating bag for directory /tmp/tmp65leir02
Creating data directory
Moving si to /tmp/tmp65leir02/tmpjdlclbxc/si
Moving loc to /tmp/tmp65leir02/tmpjdlclbxc/loc
Moving README to /tmp/tmp65leir02/tmpjdlclbxc/README
Moving /tmp/tmp65leir02/tmpjdlclbxc to data
Using 1 processes to generate manifests: sha256, sha512
Generating manifest lines for file data/README
Generating manifest lines for file data/loc/2478433644_2839c5e8b8_o_d.jpg
Generating manifest lines for file data/loc/3314493806_6f1db86d66_o_d.jpg
Generating manifest lines for file data/si/2584174182_ffd5c24905_b_d.jpg
Generating manifest lines for file data/si/4011399822_65987a4806_b_d.jpg
Creating bagit.txt
Creating bag-info.txt
Creating /tmp/tmp65leir02/tagmanifest-sha256.txt
Creating /tmp/tmp65leir02/tagmanifest-sha512.txt
..............................bag-info.txt defines multiple Payload-Oxum values!
...data/README exists in manifest but was not found on filesystem
data/extra_file exists on filesystem but is not in the manifest
...data/README sha256 validation failed: expected="9006a02daf291a3ce8eebbb094ed3d17fcb0177b8e8d3421fbb8a080a2be48bf" found="d54d79889e20997c4b265488131fb593580f1885b3a5d75df49fe7f6604b66d0"
data/README sha512 validation failed: expected="06f3dedbd5c7796b75a7d5021aaf54559e0679c27b37d355f65ea64e31fd29a70b6e06e5c0b73fad809c579fb0f6fb7076ceec055c17a173e49007955c9f5820" found="c758e703c015e05a7e0631cb4f15ed5397c318e8ad56e1227ad2ce974d00c33642ec413172414545102708cb326176935e30e41c1f72733c894c2fb031477145"
..tmpk6fiecpp/tagfile md5 validation failed: expected="8e2af7a0143c7b8f4de0b3fc90f27354" found="098f6bcd4621d373cade4e832627b4f6"
tmpk6fiecpp/tagfile exists in manifest but was not found on filesystem
.tmp79jtp40e/tagfolder/tagfile md5 validation failed: expected="8e2af7a0143c7b8f4de0b3fc90f27354" found="098f6bcd4621d373cade4e832627b4f6"
tmp79jtp40e/tagfolder/tagfile exists in manifest but was not found on filesystem
.Unable to calculate file hashes for /tmp/tmprxq331w5
Traceback (most recent call last):
  File "/home/thea/git/bagit-python/bagit.py", line 916, in _validate_entries
    pool = multiprocessing.Pool(
        processes if processes else None, initializer=worker_init
    )
  File "/usr/lib/python3.13/unittest/mock.py", line 1169, in __call__
    return self._mock_call(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/unittest/mock.py", line 1173, in _mock_call
    return self._execute_mock_call(*args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/unittest/mock.py", line 1228, in _execute_mock_call
    raise effect
RuntimeError
.bag-info.txt exists in manifest but was not found on filesystem
data/extra_file exists on filesystem but is not in the manifest
.data/loc/2478433644_2839c5e8b8_o_d.jpg md5 validation failed: expected="9a2b89e9940fea6ac3a0cc71b0a933a0" found="Could not read /tmp/tmprxwtxkyt/data/loc/2478433644_2839c5e8b8_o_d.jpg: [Errno 13] Permission denied: '/tmp/tmprxwtxkyt/data/loc/2478433644_2839c5e8b8_o_d.jpg'"
.bag-info.txt exists in manifest but was not found on filesystem
data/README exists in manifest but was not found on filesystem
data/extra exists on filesystem but is not in the manifest
.data/README md5 validation failed: expected="8e2af7a0143c7b8f4de0b3fc90f27354" found="fd41543285d17e7c29cd953f5cf5b955"
................bag-info.txt defines multiple Payload-Oxum values!
...data/README exists in manifest but was not found on filesystem
data/extra_file exists on filesystem but is not in the manifest
...data/README sha256 validation failed: expected="9006a02daf291a3ce8eebbb094ed3d17fcb0177b8e8d3421fbb8a080a2be48bf" found="d54d79889e20997c4b265488131fb593580f1885b3a5d75df49fe7f6604b66d0"
data/README sha512 validation failed: expected="06f3dedbd5c7796b75a7d5021aaf54559e0679c27b37d355f65ea64e31fd29a70b6e06e5c0b73fad809c579fb0f6fb7076ceec055c17a173e49007955c9f5820" found="c758e703c015e05a7e0631cb4f15ed5397c318e8ad56e1227ad2ce974d00c33642ec413172414545102708cb326176935e30e41c1f72733c894c2fb031477145"
..tmp9s2ei8kh/tagfile md5 validation failed: expected="8e2af7a0143c7b8f4de0b3fc90f27354" found="098f6bcd4621d373cade4e832627b4f6"
tmp9s2ei8kh/tagfile exists in manifest but was not found on filesystem
.tmp5na6jn06/tagfolder/tagfile md5 validation failed: expected="8e2af7a0143c7b8f4de0b3fc90f27354" found="098f6bcd4621d373cade4e832627b4f6"
tmp5na6jn06/tagfolder/tagfile exists in manifest but was not found on filesystem
.bag-info.txt exists in manifest but was not found on filesystem
data/extra_file exists on filesystem but is not in the manifest
.data/loc/2478433644_2839c5e8b8_o_d.jpg md5 validation failed: expected="9a2b89e9940fea6ac3a0cc71b0a933a0" found="Could not read /tmp/tmpcmz8z7bq/data/loc/2478433644_2839c5e8b8_o_d.jpg: [Errno 13] Permission denied: '/tmp/tmpcmz8z7bq/data/loc/2478433644_2839c5e8b8_o_d.jpg'"
.bag-info.txt exists in manifest but was not found on filesystem
data/README exists in manifest but was not found on filesystem
data/extra exists on filesystem but is not in the manifest
.data/README md5 validation failed: expected="8e2af7a0143c7b8f4de0b3fc90f27354" found="fd41543285d17e7c29cd953f5cf5b955"
.
----------------------------------------------------------------------
Ran 117 tests in 1.151s

OK

theatischbein, Jun 03 '25 08:06