google-api-ruby-client

Streaming Google storage upload

Open Fonsan opened this issue 7 years ago • 18 comments

I am trying to upload files in a streaming fashion, without keeping the full contents in memory.

I have previously implemented the same feature in https://github.com/aws/aws-sdk-ruby/pull/1711

As identified in https://github.com/googleapis/google-cloud-ruby/issues/1997, there are some limitations in google-api-client, which hard-codes the use of the Content-Length header.

I have written a minimal wrapper library that monkey-patches it, which should serve as a good source of inspiration: https://github.com/Fonsan/gcs_stream_upload

require 'gcs_stream_upload'

storage = Google::Cloud::Storage.new(timeout: 5 * 60)
bucket = storage.bucket('bucket')
gcs_stream_upload = GCSStreamUpload.new(bucket)

gcs_stream_upload.upload('object') do |io|
  IO.copy_stream(IO.popen('yes | head -n 10'), io)
end
# => Written "y\n" * 10


gcs_stream_upload.upload('object') do |io|
  io << 'data'
  io << 'dat2'
end
# => Written "datadat2"

Fonsan avatar Nov 21 '18 14:11 Fonsan
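
The block-based interface above can be illustrated with a pipe: the caller's block writes into one end while a reader consumes the other end in fixed-size chunks, so the full payload is never held in memory. This is only a minimal sketch using the Ruby standard library, not the actual gcs_stream_upload implementation; the helper name stream_in_chunks and its parameters are hypothetical.

```ruby
# Hypothetical helper (not part of gcs_stream_upload or google-api-client).
# Runs `producer` against the write end of a pipe and yields the data
# to the given block in chunks of at most `chunk_size` bytes.
def stream_in_chunks(producer, chunk_size: 256 * 1024)
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  # The producer runs in its own thread. Closing the write end when it
  # finishes is what signals EOF to the reader.
  thread = Thread.new do
    begin
      producer.call(writer)
    ensure
      writer.close
    end
  end

  # IO#read(n) blocks until n bytes are available or EOF is reached,
  # and returns nil once nothing is left.
  while (chunk = reader.read(chunk_size))
    yield chunk # a real uploader would send this chunk over the network
  end

  thread.value # re-raises any exception raised by the producer
ensure
  reader&.close
  writer&.close
end

received = []
producer = ->(io) { io << 'data'; io << 'dat2' }
stream_in_chunks(producer, chunk_size: 3) { |chunk| received << chunk }
puts received.inspect
```

Only one chunk is buffered at a time, and the pipe's own capacity applies back-pressure to the producer when the consumer is slow.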

@Fonsan Thank you for your patience with this request! I finally got the chance today to run https://github.com/Fonsan/gcs_stream_upload and the examples work for me.

quartzmo avatar Jan 17 '19 19:01 quartzmo

@Fonsan Can you see a backward-compatible way to add this functionality to googleapis/google-api-ruby-client? I think that might be the first step toward incorporating it into google-cloud-storage. Even then, however, I'm not sure if other considerations might prevent adding this feature, for example the use of #rewind as mentioned here by @blowmage.

quartzmo avatar Jan 17 '19 21:01 quartzmo

@quartzmo Given the way request_header is built in #send_start_command, the upload fails for objects that do not respond to #size. Even if we stubbed #size on such objects and then stubbed #to_s, we would still end up with UPLOAD_CONTENT_LENGTH => nil. That still sets the header, and the GCS backend service currently fails in that scenario.

What I am describing above is a bit of a hack; I would much rather see the change made in https://github.com/googleapis/google-api-ruby-client/blob/711dfb83b33c03535076917726956584d5c8bf9a/lib/google/apis/core/upload.rb#L173

Is there a specific reason why google-api-ruby-client could not be extended to allow for IO objects that do not respond to size? @blowmage

BTW, at https://www.kaiko.com/ we are running the gem above in production until a cleaner alternative comes around.

Fonsan avatar Jan 18 '19 07:01 Fonsan
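
One way to picture the change asked for above is to set the upload length header only when the IO object can report a size, and to leave it out otherwise. The sketch below is hypothetical: start_headers is not a library method, the header name is assumed to match the constant used in upload.rb, and whether the GCS backend accepts a start request without that header is not verified here.

```ruby
require 'stringio'

# Assumed to match the header name used by google-api-ruby-client's
# resumable upload code; check upload.rb for the actual constant.
UPLOAD_CONTENT_LENGTH = 'X-Upload-Content-Length'

# Hypothetical helper, not part of the library: builds the headers for
# the start request and omits the length when it cannot be known.
def start_headers(upload_io, base = {})
  headers = base.dup
  if upload_io.respond_to?(:size)
    headers[UPLOAD_CONTENT_LENGTH] = upload_io.size.to_s
  end
  headers
end

sized = StringIO.new('datadat2')  # StringIO responds to #size
unsized, writer = IO.pipe         # a pipe's read end does not

puts start_headers(sized).inspect
puts start_headers(unsized).key?(UPLOAD_CONTENT_LENGTH)

writer.close
unsized.close
```

Omitting the key avoids sending a header with a nil or empty value, which is the failure case described in the comment above.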

Is there a specific reason why google-api-ruby-client could not be extended to allow for IO objects that do not respond to size?

Should this conversation be moved to a new issue on googleapis/google-api-ruby-client ?

quartzmo avatar Jan 18 '19 21:01 quartzmo

@Fonsan If it's OK with you, I will transfer this issue to googleapis/google-api-ruby-client for further discussion.

quartzmo avatar Jan 18 '19 22:01 quartzmo

@quartzmo Any effort to move this forward would be excellent :)

Fonsan avatar Jan 19 '19 10:01 Fonsan

@quartzmo I'll let you investigate the feasibility of this.

dazuma avatar Jul 25 '19 19:07 dazuma

@blowmage Any comment on this?

Is there a specific reason why google-api-ruby-client could not be extended to allow for IO objects that do not respond to size?

quartzmo avatar Jul 30 '19 20:07 quartzmo

I believe HttpClient is checking for size. But yes, google-api-client could be rewritten to use something other than HttpClient and not check File#size.

blowmage avatar Jul 31 '19 14:07 blowmage

Judging from the documentation, the Content-Length header is not required, so this may be a straightforward fix. @quartzmo do you still plan on transferring the issue?

ianks avatar Oct 19 '19 02:10 ianks
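
If the total length really can be left unspecified, a chunked upload could describe each chunk with a Content-Range value whose total is `*` until the final chunk. The following is a rough sketch under that assumption, which should be checked against the GCS resumable upload documentation; each_chunk_with_range is a hypothetical helper and does not exist in google-api-client.

```ruby
require 'stringio'

# Hypothetical helper: yields [chunk, content_range] pairs for an IO
# whose total size is unknown until EOF. It reads one chunk ahead so it
# can tell when the current chunk is the last one.
def each_chunk_with_range(io, chunk_size)
  offset = 0
  current = io.read(chunk_size)
  while current
    following = io.read(chunk_size) # nil once the stream is exhausted
    last_byte = offset + current.bytesize - 1
    # The total is only known when there is nothing left to read.
    total = following ? '*' : (offset + current.bytesize).to_s
    yield current, "bytes #{offset}-#{last_byte}/#{total}"
    offset += current.bytesize
    current = following
  end
end

each_chunk_with_range(StringIO.new('datadat2'), 4) do |chunk, range|
  puts "#{range} (#{chunk.bytesize} bytes)"
end
```

The read-ahead means at most two chunks are held in memory at once, which is the price of knowing the final total without calling #size.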

quartzmo transferred this issue from googleapis/google-cloud-ruby on Jan 29, 2019

quartzmo avatar Dec 07 '20 16:12 quartzmo

Hi @quartzmo, are you actively working on streaming support for this? I noticed this issue has no one assigned to it right now.

tinco avatar Feb 16 '21 14:02 tinco

It's not being actively worked on right at this point. Likely this will require first completing https://github.com/googleapis/google-api-ruby-client/issues/2348 (which is on my plate but I won't get to it for a few more weeks).

dazuma avatar Feb 16 '21 17:02 dazuma

Note that this is related as well: https://github.com/googleapis/google-cloud-ruby/pull/8235. Any news? Would a PR for #2348 be welcome?

simi avatar Mar 10 '21 23:03 simi

@dazuma Is there anything I can help with? Would help pushing #2348 forward be welcome?

simi avatar Apr 13 '21 09:04 simi

:'( no progress? Anything I can help with here?

simi avatar Jul 28 '21 18:07 simi

@quartzmo @frankyn @dazuma It is 2023 and, if I understand correctly, this is still not part of the official library. :cry:

simi avatar May 29 '23 12:05 simi