If file is updated during the list operation of a download operation, it is not downloaded/synced

Open Chris-Dee opened this issue 3 years ago • 0 comments

I've tried this on v1.4.0 and v2.0.0.

I have a workload that downloads (cp -u -s or sync) a large number of objects onto a local filesystem, which means that the list operation on the target bucket can take a long time. Several of these objects are updated very frequently (sometimes multiple times per minute, or even more often). Non-deterministically, several of these files are not synced. This happens much more frequently with files in large directories, or in directories/subdirectories that come later in alphabetical order (as S3 gets to them last), and it is more likely with a lower num-workers.

This looks to be an issue with the checks at https://github.com/peak/s5cmd/blob/master/storage/s3.go#L212, as files updated after the list operation starts are passed over (and not downloaded at all, even if the old versions are retained).

Removing these checks makes this work for my use case (though it looks like they were added for some recursion issue, so removing them would probably break things for others?).

Chris-Dee, Aug 19 '22 21:08