Support streaming bodies in the client
Moved from here:
https://github.com/fruux/sabre-dav/issues/321
Hello :-),
The Sabre\HTTP\Client::parseCurlResult method builds an array that contains a response key. This key holds a Sabre\HTTP\Response object, which extends Sabre\HTTP\Message and therefore provides the getBody, getBodyAsStream and getBodyAsString methods.
So, because the response built by Sabre\HTTP\Client::parseCurlResult is what the doRequest method returns, this feature is basically already supported.
QED ■.
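For instance, something along these lines already works (a minimal sketch; the request details are placeholders):

$client = new Sabre\HTTP\Client();
$request = new Sabre\HTTP\Request('GET', 'http://example.org/file.bin');
$response = $client->send($request);

// The body is exposed as a stream resource.
$stream = $response->getBodyAsStream();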
The point of this ticket is that someone may be using the library to download very large files.
In those cases, we want to ensure that the entire file is accessed as a stream, and never placed into memory.
So while it's possible right now to convert the string into a stream, the goal is to change the client a bit so it uses a stream under the hood as well.
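To illustrate, the conversion that is possible today looks roughly like this, which is why it doesn't help with memory (a sketch, not library code):

// The entire body already sits in memory as a string...
$body = $response->getBodyAsString();

// ...so wrapping it in a stream afterwards saves nothing.
$stream = fopen('php://temp', 'r+');
fwrite($stream, $body);
rewind($stream);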
@evert But the result is already in memory, because cURL gives you all the “stuff”: https://github.com/fruux/sabre-http/blob/c55cbc1daa91293cda92ea4b90de79c743c4a149/lib/Client.php#L483. I will check whether cURL can give us the data piece by piece, so that we can create a stream.
An interesting link from a friend of mine, @pmartin: http://stackoverflow.com/questions/1342583/manipulate-a-string-that-is-30-million-characters-long/1342760#1342760. However, I don't know how that would work if we don't want to load the response while we receive it, but only later, when reading the stream.
If the user has a writable stream ready, they can give this stream to the HTTP client and the response will be copied into it. That answers one use-case.
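With raw cURL, that use-case can be sketched like this (the URL and target path are placeholders):

// Let cURL write the response body directly into a user-supplied stream.
$target = fopen('/tmp/large-download', 'w');
$ch = curl_init('http://example.org/large-file');
curl_setopt($ch, CURLOPT_FILE, $target);
curl_exec($ch);
curl_close($ch);
fclose($target);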
Streams in request bodies already work 100%; this is about turning an HTTP response into a stream.
@evert How does that work? I missed it in the source code.
https://github.com/fruux/sabre-http/blob/master/lib/Client.php#L405
@evert Yup, that's for sending a request. What you would like to do is the same for receiving a response, right?
Indeed, yes!
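For reference, streaming a request body today can look like this (a minimal sketch; Message::setBody in sabre/http accepts a stream resource):

// Upload a large file without loading it into memory.
$request = new Sabre\HTTP\Request('PUT', 'http://example.org/destination');
$request->setBody(fopen('/path/to/large-file', 'r'));
$response = $client->send($request);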
Any news on this? I am very interested in using streams, as I'm going to download/upload large files (>2GB) with the WebDAV client.
@h44z Not from me yet.
This is still something that is interesting to us, but we haven't had time to implement it yet.
So the problem here is that curl actually doesn't have an easy way for us to just access the stream resource, as far as I can tell.
The only way we can progressively get access to the stream is by using the CURLOPT_WRITEFUNCTION option, but that only gives us 'bits of the string' as opposed to a full-on stream.
With that function, we could send everything to a temporary stream (php://temp), which would cache the result in memory, but that only solves part of the problem.
Ideally we'd want the response to return as soon as it starts coming in and not after all the bytes have arrived, and ideally we would want to not have to cache/buffer it anywhere.
I don't see an easy way to do that.
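For the record, the CURLOPT_WRITEFUNCTION approach would look roughly like this (a sketch; the URL is a placeholder):

$sink = fopen('php://temp', 'r+');
$ch = curl_init('http://example.org/large-file');

// cURL invokes this callback repeatedly with each downloaded chunk;
// it must return the number of bytes it handled.
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) use ($sink) {
    return fwrite($sink, $chunk);
});

curl_exec($ch);
curl_close($ch);
rewind($sink);
// $sink now holds the full body, but only after the download completed.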
CURLOPT_WRITEFUNCTION seems to be invoked whenever a certain-sized chunk has been downloaded by cURL. So it looks more or less like a "real" stream.
Which parts of the problem doesn't this solve for you? Could you elaborate?
Well, it would be really nice if we could do something like:
$request = "...";
$response = $client->send($request);
stream_copy_to_stream('php://output', $response->getBodyAsStream());
This should:
- Use real stream resources internally
- Never buffer anything, anywhere.
Absolutely. So we need to change from using a string to using e.g. a PHP temp stream in https://github.com/fruux/sabre-http/blob/master/lib/Client.php#L496.
This will not be "real" streaming, but it will be way better than what we have ATM. What's the actual show stopper? Did I miss something?
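A minimal sketch of what that change might look like (hypothetical; the variable names are made up, but Message::setBody does accept a stream resource):

// Inside parseCurlResult: put the body into a temp stream instead of
// keeping it as a string.
$bodyStream = fopen('php://temp', 'r+');
fwrite($bodyStream, substr($curlResult, $curlInfo['header_size']));
rewind($bodyStream);
$response->setBody($bodyStream);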
The problem with using the temporary stream is that it only partially solves the goals:
- It does not avoid a cache/buffer. The entire response will be stored in memory or on disk.
- We can't start reading until the entire response is in, because we can't write and read the string at the same time.
It's better than nothing though.
> Avoid a cache/buffer. The entire response will be stored in memory or on disk.
right, but that's how streaming works nevertheless... php://memory seems like a good fit for that.
> We can't start reading until the entire response is in, because we can't write and read the string at the same time.
hm IIRC we could do this using a non-blocking read/write stream, couldn't we?
maybe it would also be a good occasion to use a different HTTP client, e.g. https://github.com/amphp/artax
(so we don't need to work around curl limitations)
> right, but that's how streaming works nevertheless... php://memory seems like a good fit for that.
That's not really true... If I didn't use curl and used PHP's built-in HTTP stream wrappers, there would be no buffer.
Here's an example (and note that I did stream_copy_to_stream() wrong in my previous code snippet, sorry about that):
stream_copy_to_stream(
    fopen('http://example.org/', 'r'),
    fopen('php://output', 'w')
);
If I do the same with a temporary stream, it would look more like this:
$tmp = fopen('php://temp', 'r+');
stream_copy_to_stream(
    fopen('http://example.org/', 'r'),
    $tmp
);
rewind($tmp);
stream_copy_to_stream(
    $tmp,
    fopen('php://output', 'w')
);
This last example has two passes and requires a buffer (disk or memory, depending on the size), and this is exactly how the curl example with WRITEFUNCTION would work as well. This is far from ideal. The use-case I want to solve is indeed the '2GB download' use-case, and if I force people to store the entire thing on disk first, that would be sub-optimal.
> We can't start reading until the entire response is in, because we can't write and read the string at the same time.
> hm IIRC we could do this using a non-blocking read/write stream, couldn't we?
Not at the same time, and not without a buffer. Perhaps with stream_select() and mkfifo() :)
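A very rough sketch of that idea, assuming the posix and pcntl extensions are available (the URL and FIFO path are placeholders):

$fifo = sys_get_temp_dir() . '/http-body.fifo';
posix_mkfifo($fifo, 0600);

if (pcntl_fork() === 0) {
    // Child process: let cURL write the body straight into the FIFO.
    $ch = curl_init('http://example.org/large-file');
    curl_setopt($ch, CURLOPT_FILE, fopen($fifo, 'w'));
    curl_exec($ch);
    exit(0);
}

// Parent process: read the body as a real stream while it arrives,
// without ever buffering the whole response.
stream_copy_to_stream(fopen($fifo, 'r'), fopen('php://output', 'w'));
unlink($fifo);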
> maybe it would also be a good occasion to use a different HTTP client, e.g. https://github.com/amphp/artax
Not a bad idea =)
I think the only really good solution will be a different HTTP client. With artax you can even use single-threaded concurrency in case some requests can be made in parallel, which could be another perf win.
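Reading a response progressively with artax might look something like this (a sketch based on its async API; exact class and method names may differ between versions):

Amp\Loop::run(function () {
    $client = new Amp\Artax\DefaultClient;
    $response = yield $client->request('http://example.org/large-file');

    // The body is an async stream; read() resolves with chunks as they arrive.
    $body = $response->getBody();
    while (null !== $chunk = yield $body->read()) {
        echo $chunk;
    }
});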
I'll definitely look into it. My preference would go for something lightweight, so maybe artax is that =)
In the future I want to kick off sabre/davclient again, so that will be good timing to dig into that.