b2-sdk-java

Starting a multi-input-stream large file upload, and adding the input streams later one-by-one.

Open bemehiser opened this issue 6 years ago • 8 comments

I'm trying to

  1. start a large file upload, then
  2. add parts later as I receive them.

I get the parts as input streams, and one would assume from the REST API that I should be able to upload them one by one. However, it looks like I either need to have all input streams when I start the upload, or copy the data from each input stream to another input stream which the upload is using, and let it manage the parts.

Is it possible to use generic part upload functionality like the REST API b2_upload_part with each input stream as I acquire it, or can that functionality be added or exposed?

I'm currently using revision 3dfc97 (2019-10-08)

bemehiser avatar Dec 04 '19 16:12 bemehiser

Hi Bruce --

Great question. As you point out, there's definitely an API for uploading parts and we definitely have code in the SDK that does that. It could be exposed. Before we dive into that, we should talk more about your use case to make sure that whatever we decide upon will actually help you.

Because the SDK may need to retry uploads, our ContentSource objects must be able to create an InputStream on demand that starts at the beginning of your content. When you say you already have an input stream, are you able to create it on demand, or is someone posting it to you outside your control?

thanks, ab

certainmagic avatar Dec 04 '19 17:12 certainmagic
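
The requirement described above (a fresh stream from the first byte on every attempt) can be illustrated with a minimal sketch. It assumes a part small enough to buffer in memory. `RestartableSource` and `BufferedPartSource` are hypothetical names made up for this example; they are not SDK classes, and the SDK's own B2ContentSource may differ in its details.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical interface for illustration only; it is NOT the SDK's B2ContentSource.
interface RestartableSource {
    // Every call must return a fresh stream positioned at the first byte.
    InputStream createInputStream() throws IOException;

    long getContentLength();
}

// Copies a one-shot stream into memory so that it can be re-read for each attempt.
final class BufferedPartSource implements RestartableSource {
    private final byte[] bytes;

    private BufferedPartSource(byte[] bytes) {
        this.bytes = bytes;
    }

    // Drains the incoming stream once; the caller can then release it.
    static BufferedPartSource drain(InputStream oneShot) throws IOException {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = oneShot.read(chunk)) != -1) {
            copy.write(chunk, 0, n);
        }
        return new BufferedPartSource(copy.toByteArray());
    }

    @Override
    public InputStream createInputStream() {
        return new ByteArrayInputStream(bytes);
    }

    @Override
    public long getContentLength() {
        return bytes.length;
    }
}

final class RestartableSourceDemo {
    public static void main(String[] args) throws IOException {
        InputStream incoming =
                new ByteArrayInputStream("part-1 data".getBytes(StandardCharsets.UTF_8));
        RestartableSource source = BufferedPartSource.drain(incoming);
        // Two simulated attempts: each stream starts again at the first byte.
        int firstAttempt = source.createInputStream().read();
        int secondAttempt = source.createInputStream().read();
        System.out.println(firstAttempt == secondAttempt);
        System.out.println(source.getContentLength());
    }
}
```

The trade-off is memory: each part is held in full until its upload finishes, which only suits parts that fit comfortably in the heap.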

Thanks for the prompt response!

The part input stream source is mostly out of my control, coming from a different library which allows for generic implementation of cloud storages, but it assumes we manage the parts and retry functionality.

I appreciate the ease of use which the wrapper allows, but I'd also like to be able to access the low level implementation, maybe as a "not recommended, but here you go anyway - use this like you would the REST API" type of thing. I'd be fine implementing retry myself.

Many thanks!

bemehiser avatar Dec 04 '19 17:12 bemehiser

Hi Bruce --

Ok. You said the magic words -- you're willing to sign up for doing the retry logic. :)

Take a look at B2StorageClientWebifierImpl. It's the class responsible for turning an abstract desire to upload a part into the WebApiClient call that does the work. I'm not sure how you are creating your B2StorageClient, but you definitely have a webifier, even if it's the default one built by some higher-level code.

Webifier methods are usually called from inside an invocation of retryer.doRetry(). See B2LargeFileUploader.uploadOnePart() to see how the SDK calls the webifier's uploadPart. A lot of the code in there is related to setting up progress notifications and making a substream from the large file. Do notice that we're using a B2UploadPartUrlCache to reuse upload URLs and authorizations.

I'm not sure if B2Retryer and/or B2DefaultRetryPolicy will be useful for you, but you might want to look at them to understand the different cases we handle today. In particular, some errors need to be retried and some are unretryable.

I'd be happy to review your approach with you before you get too far and happy to look at any PR.

thanks, ab

certainmagic avatar Dec 04 '19 22:12 certainmagic
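
For readers who take on the retry logic themselves, the distinction between retryable and unretryable errors mentioned above might look roughly like the following sketch. Everything in it is hypothetical and written for this example: `PartUploadException`, `PartAttempt`, and `SimpleRetry` are not SDK types, and the SDK's B2Retryer and B2DefaultRetryPolicy handle more cases than this does.

```java
// Hypothetical exception type for illustration; the SDK has its own exception hierarchy.
final class PartUploadException extends Exception {
    final boolean retryable;

    PartUploadException(String message, boolean retryable) {
        super(message);
        this.retryable = retryable;
    }
}

// Hypothetical stand-in for whatever code performs a single part upload.
interface PartAttempt<T> {
    T run() throws PartUploadException;
}

final class SimpleRetry {
    // Runs the attempt up to maxAttempts times; unretryable errors are rethrown at once.
    static <T> T withRetries(int maxAttempts, long backoffMillis, PartAttempt<T> attempt)
            throws PartUploadException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        PartUploadException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.run();
            } catch (PartUploadException e) {
                if (!e.retryable) {
                    throw e;
                }
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(backoffMillis * i); // simple linear backoff
                }
            }
        }
        throw last; // every attempt failed with a retryable error
    }
}

final class SimpleRetryDemo {
    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated upload: fails twice with a retryable error, then succeeds.
        String result = SimpleRetry.withRetries(5, 10L, () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new PartUploadException("service busy", true);
            }
            return "part uploaded";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A real implementation would probably also need to decide, per error, whether to request a new upload URL before the next attempt; that decision is left out here.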

@certainmagic, I got the functionality I wanted to work by

  • adding a method to the B2StorageClient which allows me to upload a single part.
  • duplicating B2LargeFileStorer as B2LargeFilePartStorer and modifying the copy so that I manage the part size myself, instead of having it break an input stream into multiple parts.

I can mark the input stream for a given part and reset it if the B2ContentSource requests a new input stream, so letting the client manage the retry for any given part is fine. I also tried wrapping all the incoming part input streams in a single input stream that the b2 client's large file upload could read to upload all parts. That didn't work: finished part input streams weren't released, so I couldn't reclaim memory (the application stores incoming parts in memory), and very large files couldn't be uploaded.

I'm sure it's not the prettiest way, but it seems to work fine for my use case, and it would be nice functionality to have in the SDK (assuming it doesn't exist already, and I've managed to miss it.)

https://github.com/bemehiser/b2-sdk-java/commit/22be8f629dc750f90463b1ec8e9cff5c791f6a80

bemehiser avatar Dec 10 '19 13:12 bemehiser
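
The mark-and-reset approach described above can be sketched with plain `java.io` classes. This is not the code from the linked commit; `RewindablePart` is a hypothetical helper written for this example, and it assumes the part size fits in an `int` and that nothing closes the stream between attempts.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;

// Hypothetical helper: hands out one part's stream, rewound before every new attempt.
final class RewindablePart {
    private final InputStream in;
    private boolean handedOut = false;

    RewindablePart(InputStream raw, int partSize) {
        // BufferedInputStream supplies mark/reset when the raw stream lacks it.
        this.in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        // +1 so that probing for end-of-stream does not exceed the read limit.
        this.in.mark(partSize + 1);
    }

    // Returns the stream positioned at the first byte of the part.
    InputStream streamFromStart() throws IOException {
        if (handedOut) {
            in.reset();
        }
        handedOut = true;
        return in;
    }

    // Reads until end-of-stream without closing the stream.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}

final class RewindablePartDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = {10, 20, 30, 40, 50};
        // SequenceInputStream does not support mark/reset, so the helper has to add it.
        InputStream raw = new SequenceInputStream(
                new ByteArrayInputStream(data), new ByteArrayInputStream(new byte[0]));
        RewindablePart part = new RewindablePart(raw, data.length);
        byte[] firstAttempt = RewindablePart.readAll(part.streamFromStart());
        byte[] secondAttempt = RewindablePart.readAll(part.streamFromStart());
        System.out.println(Arrays.equals(firstAttempt, secondAttempt));
    }
}
```

Because the mark buffer retains everything read since `mark()`, the whole part stays in memory until the helper is discarded, which matches the in-memory parts described in the comment above.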

Thanks, Bruce. I need to involve the developer who worked on the large file stuff most recently. We'll take a look.

ttfn, ab

certainmagic avatar Dec 10 '19 18:12 certainmagic

Hi Bruce --

Thanks for making that version.

I'd like to be able to figure out the right way to add this feature and my first stab or two got a little more complicated than I was expecting.

Now that you have your version, are you unblocked? Or are you stuck waiting for official support for this feature?

thanks, ab

certainmagic avatar Dec 11 '19 18:12 certainmagic

@certainmagic,

I'm unblocked. I'd ideally like this feature to be officially supported, but official support is not required at the moment.

Many thanks.

bemehiser avatar Dec 11 '19 18:12 bemehiser

hi --

btw, i haven't forgotten about this. i took a few stabs at making a clean version in mid-December, but haven't gotten anything good yet. it could be a while before i get back to it.

ttfn, ab

certainmagic avatar Jan 15 '20 18:01 certainmagic