
Filenames with spaces at the end cause bag incompleteness errors.

Open nkrabben opened this issue 5 years ago • 3 comments

Because bagit-python strips whitespace from the end of each manifest line, a filename like "path_with_space_at_the_end.txt " is held in memory as "path_with_space_at_the_end.txt", which no longer matches the file on disk and causes the completeness check to fail.

https://github.com/LibraryOfCongress/bagit-python/blob/4b76c143e61d815043f1e8bdfbb159ce98f7d978/bagit.py#L669

I think the likely reason is that L669 is being used to remove the newline characters, but is overly aggressive. I can write a test for this, but I'd like some feedback on potential solutions before trying to code one up.
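To make the failure concrete, here is a minimal sketch of the behaviour described above. It does not reproduce the bagit-python source: the helper `parse_manifest_line` and its `strip_all` flag are hypothetical, and it assumes the linked line does something equivalent to calling `strip()` on each manifest line before splitting it into digest and path.

```python
def parse_manifest_line(line, strip_all=True):
    """Split one manifest line into (digest, path).

    Hypothetical helper for illustration only, not bagit-python's code.
    strip_all=True  mimics an aggressive strip() of the whole line.
    strip_all=False removes only the trailing line terminator.
    """
    cleaned = line.strip() if strip_all else line.rstrip("\r\n")
    # Split on the first run of whitespace; the remainder is the path,
    # and any trailing whitespace still present stays part of it.
    digest, path = cleaned.split(None, 1)
    return digest, path


raw = "d41d8cd98f00b204e9800998ecf8427e  data/path_with_space_at_the_end.txt \n"

aggressive = parse_manifest_line(raw, strip_all=True)
careful = parse_manifest_line(raw, strip_all=False)

print(repr(aggressive[1]))  # trailing space lost
print(repr(careful[1]))     # trailing space kept
```

With the aggressive strip, the parsed path no longer matches the name on disk, which would be enough to trigger the completeness error. Stripping only `\r\n` is one possible fix, not necessarily the right one for the project.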

nkrabben avatar Feb 11 '20 00:02 nkrabben

This isn't a blocker or anything, but trailing spaces in filenames can cause terrible issues on Windows (maybe NTFS in general?). Windows prevents you from creating such names, but I've seen drives where the trailing space was created in another operating system. The folder or file often doesn't appear to the user, and the disk space it occupies is designated free space ready for overwriting. See cause 6 here: https://support.microsoft.com/en-ie/help/320081/you-cannot-delete-a-file-or-a-folder-on-an-ntfs-file-system-volume

kieranjol avatar Feb 11 '20 07:02 kieranjol

Also, I doubt that the line of code was written with this issue in mind; it was more likely just about cleaning up trailing whitespace.

kieranjol avatar Feb 11 '20 09:02 kieranjol

Doesn’t the RFC specify one or more spaces as the separator? If I remember correctly (on phone, sorry), that would require using the URL-encoded form.
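As a rough illustration of what the URL-encoded form could look like, the sketch below percent-encodes a trailing space with the standard library's `urllib.parse`. It only shows the encoding round trip; whether the RFC requires it for spaces, and whether bagit-python decodes `%20` when reading a manifest, are not established in this thread and are assumptions here.

```python
from urllib.parse import quote, unquote

name = "data/path_with_space_at_the_end.txt "

# Percent-encode everything except unreserved characters and "/".
encoded = quote(name, safe="/")

# An encoded name has no literal trailing whitespace left to strip.
survives_strip = encoded.strip() == encoded

# Decoding restores the original name, trailing space included.
decoded = unquote(encoded)

print(encoded)
print(survives_strip)
print(repr(decoded))
```

Written this way, the manifest entry would survive whitespace stripping, at the cost of requiring every reader of the manifest to decode the path.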

acdha avatar Feb 11 '20 22:02 acdha