
S3 Bucket - Sync with Versioning

p-obrien opened this issue 5 years ago · 3 comments

I have a number of EBS Snapshots of a large EBS volume and I want to use s3cmd to store the files in S3 using versioning so I can effectively de-duplicate the files and delete the EBS Snapshots.

I restored the original volume and ran the following command: `python c:\s3cmd\s3cmd sync . s3://bucket name/test-folder/`

All the files were loaded into the S3 bucket which already had versioning enabled.

I then removed the snapshot, mounted the next one in the sequence, and ran the same command: `python c:\s3cmd\s3cmd sync . s3://bucket name/test-folder`

However, the S3 bucket shows that every time I run the command a new version is created, even if the file has not been modified in any way.

I expected the sync command to upload content only if the checksums differ. Am I missing something?

p-obrien avatar Apr 29 '21 01:04 p-obrien

With debug output enabled I can see that s3cmd thinks the MD5 of the file doesn't match between the local volume and S3.

The interesting thing is that if I run the command multiple times, the source MD5 remains the same but the destination MD5 is different each time. Checking the `x-amz-meta-s3cmd-attrs` attribute in S3, I can see it matches the source MD5.

```
DEBUG: Applying --exclude/--include
DEBUG: CHECK: test-folder/test-file.txt
DEBUG: PASS: 'test-folder/test-file.txt'
DEBUG: CHECK: test-file.txt
DEBUG: PASS: 'test-file.txt'
INFO: Found 1 local files, 2 remote files
INFO: Verifying attributes...
DEBUG: Comparing filelists (direction: local -> remote)
DEBUG: CHECK: test-file.txt
DEBUG: XFER: test-file.txt (md5 mismatch: src=55e69a552b710d7343dbd7e5f6058dc8 dst=386fb30a06552c5f14c4517011e8da45)
INFO: Summary: 1 local files to upload, 0 files to remote copy, 1 remote files to delete
DEBUG: attr_header: {'x-amz-meta-s3cmd-attrs': 'md5:55e69a552b710d7343dbd7e5f6058dc8'}
DEBUG: DeUnicodising '.\test-file.txt' using cp1252
DEBUG: DeUnicodising '.\test-file.txt' using cp1252
DEBUG: DeUnicodising '.\test-file.txt' using cp1252
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
DEBUG: CreateRequest: resource[uri]=/test-folder/test-file.txt
upload: '.\test-file.txt' -> 's3:///test-folder/test-file.txt' [1 of 1]
DEBUG: DeUnicodising '.\test-file.txt' using cp1252
DEBUG: Using signature v4
DEBUG: get_hostname(): .s3.amazonaws.com
DEBUG: canonical_headers = content-length:131
content-type:text/plain
host:
x-amz-content-sha256:7457bb2d01cb7bed77f2517b048753a571618a3576eeb7830c81e5410aa8f29e
x-amz-date:20210429T015222Z
x-amz-meta-s3cmd-attrs:md5:55e69a552b710d7343dbd7e5f6058dc8
x-amz-storage-class:STANDARD
```
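For context, the sync check in the log above boils down to comparing the local file's MD5 digest against the remote ETag. A rough Python sketch of that comparison (the function names are mine, not s3cmd's):

```python
import hashlib


def local_md5(path, chunk_size=8192):
    """Compute a file's MD5 in chunks, as sync tools do for large files."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_upload(local_path, remote_etag):
    # Upload when the local MD5 differs from the remote ETag (quotes
    # stripped). If the object was stored with SSE-KMS/SSE-C or via a
    # multipart upload, the ETag is NOT a content MD5, so this check
    # reports a mismatch on every run and sync re-uploads the file.
    return local_md5(local_path) != remote_etag.strip('"')
```

This is why the destination "MD5" in the log changes on every upload: the value s3cmd reads back is the ETag, which for this bucket is not the object's MD5.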

p-obrien avatar Apr 29 '21 02:04 p-obrien

I'm encountering this issue as well. Using the `--no-check-md5` flag seems to be a workaround, but it isn't viable for us because we have files that are modified without necessarily changing in size.

dcoobs avatar Jul 12 '21 16:07 dcoobs

Disabling server-side encryption (SSE) seems to have fixed this issue for me. With SSE-KMS or SSE-C, the ETag is not the object's MD5 checksum (see the S3 API documentation on object ETags). After disabling SSE on the bucket, uploading the file once more stores the correct MD5, and subsequent sync runs compare properly.
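One quick diagnostic is the shape of the ETag itself: a single-part object without KMS encryption has a bare 32-character hex ETag, while multipart uploads produce an `md5-partcount` ETag that can never be a content MD5. A small heuristic sketch (the function name is mine; note that SSE-KMS ETags can still look like plain hex, so this only rules out the multipart case):

```python
import re


def etag_is_plain_md5(etag):
    """Heuristic: True when the ETag has the shape of a bare MD5 digest.
    Multipart-upload ETags look like '<md5>-<partcount>' and are never
    content MD5s. SSE-KMS/SSE-C ETags may still be 32 hex characters,
    so True does not guarantee the ETag equals the file's MD5."""
    return re.fullmatch(r"[0-9a-fA-F]{32}", etag.strip('"')) is not None
```

If this returns False for your objects, an MD5-based sync comparison against the ETag cannot work regardless of s3cmd settings.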

dcoobs avatar Jul 12 '21 16:07 dcoobs