Known file corruption risk when copying between s3-compatible and s3-incompatible filesystems
Background
The file/object concepts of s3-compatible services and other filesystems differ, so their file hierarchies (layouts) do not have a one-to-one correspondence.
For example[^bonus], in s3 both ke/y and ke//y are valid, distinct object keys. However, in many
filesystems (especially on Unix-like OSs) the slash is the path separator and consecutive slashes are treated as one, so both ke/y and ke//y
map to ke/y, that is, file y in directory ke. So in sync and cp operations
from s3 to such a filesystem, both objects will be written to the same file.
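To make the collision easy to spot, here is a minimal Python sketch (not s5cmd code; the helper names `local_path` and `find_collisions` are hypothetical). It assumes the only transformation the filesystem applies is collapsing repeated slashes, and groups keys by the local path they would land on.

```python
import re
from collections import defaultdict


def local_path(key: str) -> str:
    """Map an s3 key to the path a POSIX filesystem would resolve.

    Simplification: only runs of slashes are collapsed, so "ke//y" -> "ke/y".
    """
    return re.sub(r"/+", "/", key)


def find_collisions(keys):
    """Return {local_path: [keys...]} for paths claimed by two or more keys."""
    groups = defaultdict(list)
    for key in keys:
        groups[local_path(key)].append(key)
    return {path: ks for path, ks in groups.items() if len(ks) > 1}


print(find_collisions(["ke/y", "ke//y", "other"]))
# {'ke/y': ['ke/y', 'ke//y']}
```

Both keys end up in the same bucket of the grouping, which is exactly the "two objects, one file" situation described above.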
What happens
s5cmd downloads different files, and different parts of the same
file, concurrently. So the ke/y and ke//y objects may be downloaded
at the same time. If both download operations have the file open during
overlapping time intervals (which nothing prevents), both will write to the same file and the result
will be corrupted: the contents of ke/y and ke//y will
be arbitrarily interleaved, each overwriting parts of the other.
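Here is a small deterministic simulation of the failure in Python. It is not s5cmd's actual I/O code: it assumes two equal-length objects split into tiny, hypothetical 4-byte parts, with each part written by opening the shared destination and seeking to its offset, and it fixes one possible scheduling order so the outcome is reproducible.

```python
import os
import tempfile

PART_SIZE = 4  # hypothetical, tiny part size so the effect is easy to see


def parts(body: bytes):
    """Split an object body into (offset, chunk) pairs, as a multipart download would."""
    return [(off, body[off:off + PART_SIZE]) for off in range(0, len(body), PART_SIZE)]


def write_part(path: str, offset: int, chunk: bytes) -> None:
    # Each part writer opens the shared destination and writes at its own offset.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(chunk)


def simulate(body_a: bytes, body_b: bytes) -> bytes:
    """Download two objects that map to the same local path, interleaving their parts."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        for i, (part_a, part_b) in enumerate(zip(parts(body_a), parts(body_b))):
            # One possible schedule: on even parts B writes last, on odd parts A does.
            first, second = (part_a, part_b) if i % 2 == 0 else (part_b, part_a)
            write_part(path, *first)
            write_part(path, *second)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

`simulate(b"AAAAAAAA", b"BBBBBBBB")` returns `b"BBBBAAAA"`: the first half comes from one object and the second half from the other, so the file matches neither ke/y nor ke//y.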
Conclusion
This problem arises from a fundamental incompatibility between s3 and other filesystems, so in a sense it has no general solution.
As a mitigation, one could block concurrent writes to the same file,
so that the final file is an intact copy of either ke/y or ke//y instead of a
corrupted mix. However, this would also block the valid concurrent download of different parts of a single file,
so it should not be done.
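For reference, a minimal Python sketch of that rejected mitigation (the `PathLocks` registry and `download_object` function are hypothetical, not part of s5cmd). The lock is held for a whole object, so a second object mapped to the same path cannot interleave with it; in this sketch that also means the object's own parts are written sequentially, which is the cost described above.

```python
import os
import tempfile
import threading
from collections import defaultdict


class PathLocks:
    """Hypothetical registry that hands out one lock per destination path."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, path: str) -> threading.Lock:
        # The guard makes "look up or create" atomic across worker threads.
        with self._guard:
            return self._locks[path]


def download_object(locks: PathLocks, path: str, body: bytes, part_size: int = 4) -> None:
    # Holding the lock for the whole object stops another object that maps to
    # the same path from interleaving with it. The price: this object's own
    # parts are written one after another instead of concurrently.
    with locks.lock_for(path):
        with open(path, "wb") as f:
            for off in range(0, len(body), part_size):
                f.seek(off)
                f.write(body[off:off + part_size])


def demo() -> bytes:
    locks = PathLocks()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Two objects (think ke/y and ke//y) racing for the same local file.
        threads = [
            threading.Thread(target=download_object, args=(locks, path, body))
            for body in (b"A" * 8, b"B" * 8)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

`demo()` always returns an intact `b"AAAAAAAA"` or `b"BBBBBBBB"`, but which one depends on thread scheduling, and the other object's data is silently lost.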
Nevertheless, a warning that emphasizes these limitations could be added to the README.
P.S. @seruman and I noticed this corruption problem while discussing how "/" and "\" should be handled given the incompatibilities between s3, Unix-like systems and Windows.
[^bonus]: There is another, similar incompatibility: in s3 the keys ke and ke/y can coexist, but in Unix-like filesystems ke/y implies the existence of a directory named ke, which conflicts with a file named ke.
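This file/directory conflict can be detected up front. A minimal Python sketch, assuming keys contain no repeated slashes; the function name `file_dir_conflicts` is hypothetical. A key conflicts when it is also a parent path of another key.

```python
def file_dir_conflicts(keys):
    """Return (sorted) keys that would have to be both a file and a directory locally."""
    keyset = set(keys)
    conflicts = set()
    for key in keys:
        components = key.split("/")
        # Every proper prefix of the components becomes a directory on disk.
        for i in range(1, len(components)):
            parent = "/".join(components[:i])
            if parent in keyset:
                conflicts.add(parent)
    return sorted(conflicts)


print(file_dir_conflicts(["ke", "ke/y", "a/b"]))
# ['ke']
```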
One idea for a workaround: the user could provide an option with a regex (or another mapping) that converts file paths when syncing between the filesystem and S3. This would be very useful for uploading static sites. For instance, a statically built Next.js site with the routes "/path" and "/path/subpath" contains the files "path.html" and "path/subpath.html" and the folder "path", while in the bucket the objects should be named "path" and "path/subpath"; the mapping could specify stripping the .html extension on upload.
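A rough Python sketch of what such a mapping could do. The option itself is only a proposal (it is not an existing s5cmd flag), and the `RULES` list and `to_key` function are hypothetical: each rule is a regex and a replacement applied in order to the local path before it becomes an object key.

```python
import re

# Hypothetical user-supplied rules: strip a trailing ".html" from local paths.
RULES = [(re.compile(r"\.html$"), "")]


def to_key(local_path: str, rules=RULES) -> str:
    """Apply each (pattern, replacement) rule in order to get the object key."""
    key = local_path
    for pattern, repl in rules:
        key = pattern.sub(repl, key)
    return key


for p in ["path.html", "path/subpath.html", "path/style.css"]:
    print(p, "->", to_key(p))
# path.html -> path
# path/subpath.html -> path/subpath
# path/style.css -> path/style.css
```

Note that the resulting keys "path" and "path/subpath" are exactly the file/directory pair that cannot coexist on the local side, which is why a rename step at upload time is needed.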