hip icon indicating copy to clipboard operation
hip copied to clipboard

Figure out what interface the Response object should expose for streaming data

Open njsmith opened this issue 6 years ago • 1 comments

In urllib3, HTTPResponse inherits from io.IOBase and implements the standard Python file interface. This is an interesting issue that discusses a bug in their implementation, and also why people value this feature: https://github.com/urllib3/urllib3/issues/1305

We need to decide what interface our version of HTTPResponse should expose. Constraints:

  • The async version can't inherit from IOBase, because IOBase is a sync interface, and for async we probably want to expose something compatible with trio's ReceiveStream API
  • But our sync version will need to provide some way to expose an IOBase, so people can keeping feeding it into the csv module etc., like they're doing in the issue I linked above
  • We generally try to make sync and async interfaces identical, modulo async markers

You'll notice that these constraints are contradictory :-). Especially since the ReceiveStream and IOBase interfaces actually conflict with each other: they both support iteration, but for ReceiveStream it gives arbitrary chunks of bytes, while for IOBase it gives lines.

Some options:

  • We could have HTTPResponse implement the IOBase API in sync mode, and the ReceiveStream API in async mode.
  • We could have it implement IOBase in sync mode, and an async version of IOBase in async mode, and also provide an as_stream() method
  • We could have it implement ReceiveStream in async mode, a sync-ified version of ReceiveStream in sync mode, and also provide an as_file() method.
    • Further options: in async mode, the as_file method could either be not defined at all, or it could still exist and return an async version of the file interface
  • We could have it implement neither interface directly, but provide both as_stream and as_file methods (and then there's the same question about which methods are available in which modes)

njsmith avatar Aug 31 '19 07:08 njsmith

Had more discussions about this, here's the results of that discussion:

  • Having a Response.as_file() method which exposes a file-like object that acts exactly as a usual open() file is desirable for csv readers, data-frames, etc
  • Something which has caused problems for urllib3 in the past is users seeing the interface of 'HTTPResponse' and noticing it inherits IOBase but doesn't exactly behave like a file. Let's not make that mistake again.
  • The as_file() interface shouldn't close once all data is read just like a real file but should close the contained Response object after all data is read. This prevents Responses from being left un-closed but also gives an interface that places nice with interfaces expecting files.

Also discussed the usage of .text()/.data()/.json() as methods that load all data onto the Response. There were questions about whether you should be able to call .text() and then maybe .text() again.

If we are one-shot only then you'd only be able to call a single "body" function on a response. However @pquentin brought up that this would make things annoying for debugging as you'd have to reissue requests to get another body.

Another thought was that if a response body is completely loaded into memory we could hold onto that body internally on the request to allow calling .text(), .json(), and .data() again if needed.

sethmlarson avatar Nov 28 '19 00:11 sethmlarson