Streaming Google storage upload
I am trying to upload files in a streaming fashion, without keeping the full contents in memory.
I have previously implemented the same feature in https://github.com/aws/aws-sdk-ruby/pull/1711
As identified in https://github.com/googleapis/google-cloud-ruby/issues/1997, there are some limitations in google-api-client, which hard-codes the use of the Content-Length header.
I have written a minimal wrapper library that monkey-patches it, which should serve as a good source of inspiration: https://github.com/Fonsan/gcs_stream_upload
require 'gcs_stream_upload'
storage = Google::Cloud::Storage.new(timeout: 5 * 60)
bucket = storage.bucket('bucket')
gcs_stream_upload = GCSStreamUpload.new(bucket)
gcs_stream_upload.upload('object') do |io|
  IO.copy_stream(IO.popen('yes | head -n 10'), io)
end
# => Written "y\n" * 10
gcs_stream_upload.upload('object') do |io|
  io << 'data'
  io << 'dat2'
end
# => Written "datadat2"
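For readers wondering how a block-based interface like the one above can avoid buffering the whole payload, here is a minimal sketch using only Ruby's standard `IO.pipe` and a thread. It is not the code from gcs_stream_upload; `stream_through_pipe` and its `sink` argument are hypothetical names chosen for illustration, and the sink stands in for whatever would actually send each chunk over the network.

```ruby
# Hypothetical helper, not part of gcs_stream_upload or google-cloud-storage.
# The caller's block writes into one end of a pipe while this method reads
# fixed-size chunks from the other end and hands each one to `sink`.
def stream_through_pipe(sink, chunk_size: 16 * 1024)
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  producer = Thread.new do
    begin
      yield writer   # the caller writes data here
    ensure
      writer.close   # closing the write end signals EOF to the reader
    end
  end

  total = 0
  # IO#read(length) returns nil once EOF is reached.
  while (chunk = reader.read(chunk_size))
    sink.call(chunk) # a real uploader would send the chunk here
    total += chunk.bytesize
  end

  producer.join      # re-raises any exception raised in the caller's block
  total
ensure
  reader.close if reader && !reader.closed?
end

received = []
bytes = stream_through_pipe(->(chunk) { received << chunk }, chunk_size: 4) do |io|
  io << 'data'
  io << 'dat2'
end
```

At most one chunk plus the pipe's kernel buffer is held in memory at a time, which is the property the original request is after.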
@Fonsan Thank you for your patience with this request! I finally got the chance today to run https://github.com/Fonsan/gcs_stream_upload and the examples work for me.
@Fonsan Can you see a backward-compatible way to add this functionality to googleapis/google-api-ruby-client? I think that might be the first step toward incorporating it into google-cloud-storage. Even then, however, I'm not sure if other considerations might prevent adding this feature, for example the use of #rewind as mentioned here by @blowmage.
@quartzmo The problem is the way request_header is built in #send_start_command. Even if we stubbed #size on objects that do not respond to it, and then stubbed #to_s, we would still end up with UPLOAD_CONTENT_LENGTH => nil. The header is still set in that case, and the GCS backend service currently fails on it.
What I am describing above is a bit of a hack. I would much rather see the change made in https://github.com/googleapis/google-api-ruby-client/blob/711dfb83b33c03535076917726956584d5c8bf9a/lib/google/apis/core/upload.rb#L173
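To make the suggested change concrete, here is a rough sketch of the idea: only set the length header when the IO object can actually report a size. This is not the library's code. `start_headers_for` is a hypothetical helper, and the header name assigned to the constant is an assumption for illustration; check upload.rb for the real value.

```ruby
require 'stringio'

# Assumed header name, for illustration only.
UPLOAD_CONTENT_LENGTH = 'X-Goog-Upload-Header-Content-Length'

# Hypothetical helper: builds the start-request headers and omits the
# length header entirely for IO objects that cannot report a size.
def start_headers_for(upload_io, base_headers = {})
  headers = base_headers.dup
  if upload_io.respond_to?(:size)
    headers[UPLOAD_CONTENT_LENGTH] = upload_io.size.to_s
  end
  headers
end

sized_headers = start_headers_for(StringIO.new('data'))

pipe_reader, pipe_writer = IO.pipe
unsized_headers = start_headers_for(pipe_reader)
```

A `StringIO` or `File` responds to `size`, so the header is set as before. The read end of a pipe does not, so the header is left out rather than sent with an empty value.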
Is there a specific reason why google-api-ruby-client could not be extended to allow for IO objects that do not respond to size? @blowmage
BTW @ https://www.kaiko.com/ we are running the gem above in production until a cleaner alternative comes around.
> Is there a specific reason why google-api-ruby-client could not be extended to allow for IO objects that do not respond to size?
Should this conversation be moved to a new issue on googleapis/google-api-ruby-client ?
@Fonsan If it's OK with you, I will transfer this issue to googleapis/google-api-ruby-client for further discussion.
@quartzmo any efforts to further the progression would be excellent :)
@quartzmo I'll let you investigate the feasibility of this.
@blowmage Any comment on this?
> Is there a specific reason why google-api-ruby-client could not be extended to allow for IO objects that do not respond to size?
I believe HttpClient is checking for size. But yes, google-api-client could be rewritten to use something other than HttpClient and not check File#size.
Judging from the documentation, the Content-Length header is not required. It seems like it may be a straightforward fix. @quartzmo do you still plan on transferring the issue?
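As a general illustration of sending a request body without a Content-Length, here is a small sketch using Ruby's standard Net::HTTP rather than the HttpClient library discussed above. Net::HTTP accepts a streamed body when either Content-Length or `Transfer-Encoding: chunked` is set. The URL and the `chunked_put_request` helper are hypothetical, and whether the GCS upload endpoint accepts this exact request shape is not verified here.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a PUT request whose body
# is streamed from `io` using chunked transfer encoding, so no
# Content-Length has to be known up front.
def chunked_put_request(uri, io)
  request = Net::HTTP::Put.new(uri)
  request['Transfer-Encoding'] = 'chunked'
  request.body_stream = io
  request
end

body_reader, body_writer = IO.pipe
upload_uri = URI.parse('https://example.invalid/upload') # placeholder URL
put_request = chunked_put_request(upload_uri, body_reader)
```

The request would be sent with `Net::HTTP#request`, which reads from the stream as it writes to the socket.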
quartzmo transferred this issue from googleapis/google-cloud-ruby on Jan 29, 2019
Hi @quartzmo, are you actively working on streaming support for this? I noticed this issue has no one assigned to it right now.
It's not being actively worked on right at this point. Likely this will require first completing https://github.com/googleapis/google-api-ruby-client/issues/2348 (which is on my plate but I won't get to it for a few more weeks).
Note that this is related as well: https://github.com/googleapis/google-cloud-ruby/pull/8235. Any news? Would a PR for #2348 be welcome?
@dazuma is there anything I can help with? Would it be welcome if I pushed #2348 forward?
:'( no progress? Anything I can help with here?
@quartzmo @frankyn @dazuma It is 2023 and this is still not part of the official library, if I understand correctly. :cry: