torchsnapshot

s3 storage plugin upload concurrency

Open · GeorgeShao opened this issue 1 year ago • 1 comment

📚 Question

From what I see, this library's s3 storage plugin uses the aiobotocore library's put_object function, which sends a single PutObject request, rather than using multipart uploads for greater upload concurrency.

I created a patch of torchsnapshot that uses the aioboto3 library's upload_fileobj function. In my tests, it significantly increased write performance for large files.
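A minimal sketch of what such a change could look like (not the actual patch). The helper names `plan_parts` and `upload_multipart`, the 64 MiB part size, and the concurrency of 8 are illustrative assumptions. `upload_fileobj` and `TransferConfig` are the aioboto3/boto3 calls referred to above; how strictly `max_concurrency` is honored may depend on the aioboto3 version.

```python
import io
import math

MIB = 1024 * 1024
S3_MIN_PART_SIZE = 5 * MIB  # S3 minimum size for every part except the last
S3_MAX_PARTS = 10_000       # S3 maximum number of parts in one multipart upload


def plan_parts(total_size: int, part_size: int = 64 * MIB) -> list[tuple[int, int]]:
    """Return the (offset, length) ranges a multipart upload would send.

    Hypothetical helper, only meant to show how one large object is
    split into independently uploadable parts.
    """
    if part_size < S3_MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    if math.ceil(total_size / part_size) > S3_MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return [
        (offset, min(part_size, total_size - offset))
        for offset in range(0, total_size, part_size)
    ]


async def upload_multipart(
    buf: bytes,
    bucket: str,
    key: str,
    part_size: int = 64 * MIB,
    concurrency: int = 8,
) -> None:
    """Upload `buf` with aioboto3's managed transfer instead of one PutObject."""
    # Imported lazily so plan_parts() can be used without these packages.
    import aioboto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=part_size,  # switch to multipart at this size
        multipart_chunksize=part_size,  # size of each uploaded part
        max_concurrency=concurrency,    # assumed cap on in-flight parts
    )
    session = aioboto3.Session()
    async with session.client("s3") as s3:
        await s3.upload_fileobj(io.BytesIO(buf), bucket, key, Config=config)


if __name__ == "__main__":
    parts = plan_parts(200 * MIB)
    print(len(parts), parts[-1])
```

With AWS credentials configured, `asyncio.run(upload_multipart(data, "my-bucket", "path/to/key"))` would perform the upload; the bucket and key here are placeholders.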

Is there a reason torchsnapshot's s3 storage plugin doesn't support multipart uploads?

I see that DCP has superseded torchsnapshot. Is there an easy migration path for codebases still using torchsnapshot?

GeorgeShao · Feb 12 '25 00:02

I found that https://github.com/awslabs/s3-connector-for-pytorch handles S3 uploads for DCP. Maybe multipart parallel uploads should be checked with them?

vadimkantorov · Jun 17 '25 07:06