
[FEATURE] Skip already transferred files when restarting with hub.copy

Open JossWhittle opened this issue 3 years ago • 2 comments

🚨🚨 Feature Request

  • Related to #1609 (closed)

Is your feature request related to a problem?

When hub.copy throws an exception, the transfer is terminated. Rerunning the call prompts you to pass overwrite=True, which forces a re-download of all the data and ignores the chunks that have already been transferred.

# Fails for some reason (QuotaExceeded / dropped connection / etc.)
hub.copy('hub://activeloop/imagenet-val', 's3://foobar/imagenet-val', dest_creds={ ... }) 

# Restart prompts to run with overwrite=True
hub.copy('hub://activeloop/imagenet-val', 's3://foobar/imagenet-val', dest_creds={ ... }, overwrite=True)

Description of the possible solution

When a transfer is restarted, validate existing chunks that have been transferred and skip them if they are complete.
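
To make the proposal concrete, here is a minimal sketch of the skip-if-complete logic. It is not hub's implementation and does not use any hub API: `copy_missing_chunks` is a hypothetical helper, and the source and destination are modelled as plain dict-like stores mapping chunk keys to bytes. A chunk is assumed to be "complete" when its size and SHA-256 digest match the source. A real object store would more likely compare sizes or stored checksums from a listing, so that skipped chunks are never downloaded at all.

```python
import hashlib
from typing import List, Mapping, MutableMapping, Tuple


def _digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a chunk."""
    return hashlib.sha256(data).hexdigest()


def copy_missing_chunks(
    src: Mapping[str, bytes],
    dst: MutableMapping[str, bytes],
) -> Tuple[List[str], List[str]]:
    """Copy chunks from src to dst, skipping ones that are already complete.

    Hypothetical helper: stores are modelled as key -> bytes mappings.
    Returns (copied_keys, skipped_keys), each in sorted key order.
    """
    copied: List[str] = []
    skipped: List[str] = []
    for key in sorted(src):
        data = src[key]
        existing = dst.get(key)
        # A chunk counts as complete only if it exists and matches the
        # source in both size and content digest.
        if (
            existing is not None
            and len(existing) == len(data)
            and _digest(existing) == _digest(data)
        ):
            skipped.append(key)
            continue
        # Missing or partially written chunk: (re)transfer it.
        dst[key] = data
        copied.append(key)
    return copied, skipped


# Simulate a restart: "a" finished, "b" was cut off mid-write, "c" never started.
source = {"a": b"1111", "b": b"2222", "c": b"3333"}
destination = {"a": b"1111", "b": b"22"}
copied, skipped = copy_missing_chunks(source, destination)
```

In this simulated restart only the truncated and the missing chunk are transferred again, and the destination ends up identical to the source.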

JossWhittle avatar Apr 21 '22 18:04 JossWhittle

Really appreciate this, @JossWhittle. Let us know if you have any other feature suggestions or feedback in the meantime!

mikayelh avatar Apr 21 '22 18:04 mikayelh

No worries, sorry I missed a title.

JossWhittle avatar Apr 21 '22 18:04 JossWhittle