S3 Bucket - Sync with Versioning
I have a number of EBS snapshots of a large EBS volume, and I want to use s3cmd to store the files in a versioned S3 bucket so I can effectively de-duplicate the files and then delete the snapshots.
I restored the original volume and ran the following command: python c:\s3cmd\s3cmd sync . s3://bucket-name/test-folder/
All the files were uploaded into the S3 bucket, which already had versioning enabled.
I then removed that snapshot, mounted the next one in the sequence, and ran the same command: python c:\s3cmd\s3cmd sync . s3://bucket-name/test-folder
However, the S3 bucket shows that a new version is created every time I run the command, even if the file has not been modified in any way.
I expected the sync command to upload content only when the checksums differ. Am I missing something?
Running with the debug flag, I can see that s3cmd thinks the MD5 of the file doesn't match between the local volume and S3.
The interesting thing is that if I run the command multiple times, the source MD5 stays the same but the destination MD5 is different each time. Checking the x-amz-meta-s3cmd-attrs attribute in S3, I can see it matches the source MD5. (A script that reproduces this comparison outside s3cmd follows the log.)
DEBUG: Applying --exclude/--include
DEBUG: CHECK: test-folder/test-file.txt
DEBUG: PASS: 'test-folder/test-file.txt'
DEBUG: CHECK: test-file.txt
DEBUG: PASS: 'test-file.txt'
INFO: Found 1 local files, 2 remote files
INFO: Verifying attributes...
DEBUG: Comparing filelists (direction: local -> remote)
DEBUG: CHECK: test-file.txt
DEBUG: XFER: test-file.txt (md5 mismatch: src=55e69a552b710d7343dbd7e5f6058dc8 dst=386fb30a06552c5f14c4517011e8da45)
INFO: Summary: 1 local files to upload, 0 files to remote copy, 1 remote files to delete
DEBUG: attr_header: {'x-amz-meta-s3cmd-attrs': 'md5:55e69a552b710d7343dbd7e5f6058dc8'}
DEBUG: DeUnicodising '.\test-file.txt' using cp1252
DEBUG: DeUnicodising '.\test-file.txt' using cp1252
DEBUG: DeUnicodising '.\test-file.txt' using cp1252
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
DEBUG: CreateRequest: resource[uri]=/test-folder/test-file.txt
upload: '.\test-file.txt' -> 's3://
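To take s3cmd out of the picture, the script below reproduces the comparison. It's a minimal sketch assuming boto3 is installed and default credentials are configured; the bucket name is a placeholder, and the key matches my test object above.

import hashlib
import boto3

BUCKET = "bucket-name"                 # placeholder, not my real bucket
KEY = "test-folder/test-file.txt"
LOCAL = r".\test-file.txt"

# MD5 of the local file, the value s3cmd reports as src.
md5 = hashlib.md5()
with open(LOCAL, "rb") as f:
    for chunk in iter(lambda: f.read(8192), b""):
        md5.update(chunk)
print("local md5:  ", md5.hexdigest())

# What S3 reports for the uploaded object. s3cmd compares against the
# ETag; the s3cmd-attrs metadata records the MD5 computed at upload time.
head = boto3.client("s3").head_object(Bucket=BUCKET, Key=KEY)
print("remote ETag:", head["ETag"].strip('"'))
print("s3cmd-attrs:", head["Metadata"].get("s3cmd-attrs", "<not set>"))

In my case the ETag differs from both MD5 values on every upload, which lines up with the changing dst value in the debug output above.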
I'm encountering this issue as well. Using the --no-check-md5 flag seems to be a workaround, but it isn't viable for us, as we have files that are modified without necessarily changing in size.
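For anyone who can tolerate size-only comparison, the workaround invocation (reusing the paths from the question) would be:

python c:\s3cmd\s3cmd sync --no-check-md5 . s3://bucket-name/test-folder/

With that flag, s3cmd skips the checksum check and decides based on file size alone, which is why files that change in place without growing or shrinking would be missed.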
Disabling server-side encryption (SSE) fixed this issue for me. With SSE-KMS (or SSE-C) enabled, the ETag S3 returns is not the MD5 digest of the object data (see the ETag notes in the S3 API documentation), so s3cmd's checksum comparison can never match. (Plain SSE-S3/AES256 objects keep an MD5 ETag for single-part uploads, so the culprit here is most likely a KMS default on the bucket.) After disabling SSE on the bucket and uploading the file once more, the object gets the correct MD5 as its ETag, and subsequent sync runs compare properly.
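If it helps anyone verify this, here is a minimal boto3 sketch (bucket name is a placeholder) that reports a bucket's default encryption rule and, commented out, removes it; the calls are get_bucket_encryption and delete_bucket_encryption:

import boto3
from botocore.exceptions import ClientError

BUCKET = "bucket-name"  # placeholder
s3 = boto3.client("s3")

# Report the bucket's default encryption configuration, if any.
try:
    cfg = s3.get_bucket_encryption(Bucket=BUCKET)
    for rule in cfg["ServerSideEncryptionConfiguration"]["Rules"]:
        algo = rule["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"]
        print("Default SSE algorithm:", algo)  # e.g. 'aws:kms' or 'AES256'
except ClientError as e:
    if e.response["Error"]["Code"] == "ServerSideEncryptionConfigurationNotFoundError":
        print("No default encryption configured.")
    else:
        raise

# Remove the default encryption rule, then re-upload the affected files
# once so they get plain MD5 ETags:
# s3.delete_bucket_encryption(Bucket=BUCKET)

Existing objects keep whatever encryption they were written with, which is why the file has to be uploaded once more after the change before sync behaves.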