`_single_line_reply()` is truncating large strings
While I was retrieving some JSON (rather huge sets) I noticed that _single_line_reply() was returning only a (large) random-sized chunk of what it should. The chunk was usually 8192 bytes, but sometimes 16k. The same thing happened with base64-encoded data larger than ~32k.
The behavior is identical when the data is merely serialized before setting. Yet a GET from the redis console produces the entire value perfectly.
My understanding of fgets() is the same as in the documentation:
Reading ends when length - 1 bytes have been read, on a newline (which is included in the return value), or on EOF (whichever comes first). If no length is specified, it will keep reading from the stream until it reaches the end of the line.
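For reference, that documented behaviour is easy to sanity-check against an in-memory stream (nothing Redis-specific involved; this is my own illustration, not code from the library):

```php
<?php
// Demonstrate fgets() length semantics on an in-memory stream.
$fp = fopen('php://memory', 'r+');
fwrite($fp, "abcdefghij\nsecond line\n");
rewind($fp);

// With a length of 6, fgets() returns at most 5 bytes (length - 1).
$chunk = fgets($fp, 6);
assert($chunk === 'abcde');

// Without a length, it reads up to and including the newline.
rewind($fp);
$line = fgets($fp);
assert($line === "abcdefghij\n");

fclose($fp);
echo "ok\n";
```

So on a plain stream fgets() does exactly what the manual says; the question is why a socket-backed stream behaves differently.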
Try it with any large serialized array (> 8k), e.g.
$redis->set('foo', serialize($big_array));
Then observe what you see via $redis->get('foo') vs. what GET foo shows in the console. I checked the points at which the data was chopped and there was no newline (escaped or otherwise) at those points. It was always at 8k or 16k.
Can anyone reproduce this?
To narrow it down a bit more, any string over 20k seems to be doing this.
Here is a gist with a test case. Once I get past ~16k, _single_line_reply() (or rather, fgets()) starts truncating the result, and I can't figure out why. Note that in the gist the class is named zredis; it's the library I use while I phase out the old version (so lots of places don't have to deal with spaces anymore).
Weird stuff, got the same thing here:
Read payload of 15872 bytes
Writing payload of 16384 bytes
Read payload of 16376 bytes
Did not read the expected amount of bytes
fsockopen('unix:///tmp/redis.sock')
doesn't do the trick either.
Found some info:
I'm extremely happy to say I've now got it working, and it's everything I dreamed it could be. The major bug stopping me was truncation of values more than 16k in size, and that turns out to be a bug in the Net_TokyoTyrant PHP wrapper (and arguably a bug in PHP's libraries). The wrapper was using a single fread() call to get values, but this has size limits, and so it needs to be called repeatedly in a loop to get the full result. Jeremy Hinegardner got me attacking this again after he confirmed he was using Tokyo successfully through Ruby, and after some debugging made me suspicious of fread()'s reliability, Blagovest Buyukliev's post confirmed it was the cause and gave me a drop-in fix.
http://petewarden.typepad.com/searchbrowser/2009/06/how-to-get-tokyo-tyrant-working-in-php.html
What version of PHP did you run that under? I'm using 5.4.5, about to try it on 5.3
PHP 5.3.15 with Suhosin-Patch (cli) (built: Aug 24 2012 17:45:44)
Well, this is definitely not a bug in the library. Tomorrow I'm going to rule out the loopback device (127.0.0.1) by binding a few new servers to a public IP. I am completely baffled, and so are a lot of other people (the PHP folks in SO chat who helped try to narrow this down). It's just so odd that 16k is a 'breaking point' in a few places.
Great. Keep us posted.
No, it's the same over all interfaces. I'm out of ideas here. I have tried every possible way of reading from a descriptor opened with fsockopen(): fgets() truncates, fread() blocks indefinitely (strace shows poll() not firing while there is still ready data on the wire), and the stream_* family exhibits the same behavior as fgets(). I even tried setting the socket to nonblocking and reading in very small chunks, a do { } while loop that breaks if the read length is zero, and even some magic pixie dust. I even tried fscanf(); it dies right around 16k.
I'm going to see if I can reproduce this with Predis and if not, examine how they get around it. So odd that I can send, but not receive 16k over the wire. So very very odd.
I played around with stream_set_chunk_size() and stream_set_read_buffer(). Tweaking these, I was able to get (reliably) up to 40kb read before that quirk kicked in. My _single_line_reply() function now looks like this:
private function _single_line_reply()
{
    stream_set_read_buffer($this->_connection, 0);
    stream_set_chunk_size($this->_connection, 131072);
    return trim(fgets($this->_connection));
}
Note, that is an ugly hack: I'm setting the chunk size to 128 KB and turning off buffered reads.
Predis uses sockets, not streams and doesn't suffer from this problem (at least for a single connection, I don't know what it does for sharding). It seems like we have two options here:
- Change the code to use sockets
- Deal with a 16k barrier
I thought there was a way to tell a stream to give up its underlying socket, so that the socket family of functions could be used just in _single_line_reply() (the only place where we DON'T know how much we have to read before reading), but there seems to be no such way. We have to use either sockets or stream resources.
The tweaks I made to the chunk and buffer size eliminate my immediate problem. Do we want to just use sockets and not deal with the quirks associated with streams? That's a change I'm willing to write over a weekend. I don't see myself dealing with data over 40k serialized, so the immediate pressure is off.
Also, worth writing a (proper) test for this (up to 512k) and just let it fail for now?
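For reference, here's a self-contained sketch of what such a test could exercise, using stream_socket_pair() as a stand-in for a live Redis server (my own harness, not the library's test); sizes here step up to 256k, and 512k works the same way:

```php
<?php
// Self-contained sketch of a chunked-read test: payloads of increasing
// size go through a local socket pair and are reassembled by a read
// loop. A real test would talk to Redis; the failure mode is the same.
list($w, $r) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_blocking($w, false); // keep large writes from wedging the demo

foreach (array(1024, 16384, 65536, 262144) as $size)
{
    $payload = str_repeat('x', $size);
    $sent = 0;
    $got  = '';

    while (strlen($got) < $size)
    {
        if ($sent < $size)
        {
            // Feed the socket in 8k slices, roughly as a server would.
            $sent += (int) fwrite($w, substr($payload, $sent, 8192));
        }

        // A single fread() is not guaranteed to return everything asked
        // for on a socket; looping until full is the entire point.
        $chunk = fread($r, $size - strlen($got));

        if ($chunk !== false)
        {
            $got .= $chunk;
        }
    }

    assert($got === $payload);
}

echo "ok\n";
```

A library-level test would do the same stepping through $redis->set()/$redis->get() and assert the returned length matches.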
Let me know one way or another.
Well, that kludge did not work out so well. I'm going to add a sockets version of the library to my fork and get it working, then send a pull request for testing (Sockredis). I had to wrap Predis to actually get this done, and that's just so much code that I don't need. In fact, I'm not even committing it. Predis is great, but I don't need that much code.
This falls right on PHP's doorstep. I'm going to be raising a bit of a stink about it soon. They need to either fix or document this quirk. In the mean time, if you do socket programming, use sockets, not streams if you like your ass un-bitten :)
Edit
There is a way to get to the underlying socket for that single method. Trying it tomorrow.
Ha! Thanks for your thorough investigation Tim. Looking forward to your fork and pull.
O.k., socket_import_stream() is probably not going to work for us, because we have to read from the stream prior to importing it, which causes the import to error out due to bytes lost during conversion. E.g.:
socket_import_stream(): 6 bytes of buffered data lost during stream conversion!
I tried setting the read buffer to 0 (none) with stream_set_read_buffer() prior to the initial read, which makes the conversion work, but then socket_read() blocks waiting for data, despite my setting the read buffer back to what it was immediately after conversion (I tried it directly on the socket too, via socket_set_option()). Sockets in PHP are a mess.
I'm going to try a few other things. Here's also a proper test that can be used to reproduce the problem conventionally.
What I have so far is this (assume $this->sock is now a class variable)
First, we have to turn off buffering while we read the first bytes to determine the reply type:
private function _read_request()
{
    if ($this->sock === NULL)
    {
        stream_set_read_buffer($this->_connection, 0);
    }

    $type = fgetc($this->_connection);

    if ($this->debug === TRUE)
    {
        log_message('debug', 'Redis response type: ' . $type);
    }

    switch ($type)
    {
        case '+':
            return $this->_single_line_reply();
        case '-':
            return $this->_error_reply();
        case ':':
            return $this->_integer_reply();
        case '$':
            return $this->_bulk_reply();
        case '*':
            return $this->_multi_bulk_reply();
        default:
            return FALSE;
    }
}
Then, we actually get a reference to the socket in _single_line_reply(), and set the buffer back:
private function _single_line_reply()
{
    if ($this->sock === NULL)
    {
        $this->sock = socket_import_stream($this->_connection);
        stream_set_read_buffer($this->_connection, 4096);
    }

    if ( ! $this->sock)
    {
        log_message('error', 'Unable to obtain socket from connection stream in single line reply');
        return NULL;
    }

    $value = '';

    while (($buff = socket_read($this->sock, 1024)) != "")
    {
        $value .= $buff;
    }

    return trim($value);
}
I have tried that with and without socket_set_option() doing the same thing, and it just blocks forever. I have also tried various settings for the read buffer and the chunk size we read; even reading one byte at a time blocks.
Hopefully I'm able to get it working. Setting the socket to nonblock just results in a ton of -EAGAIN from recvfrom (as shown via strace), and it stops reading prematurely.
Note, I began with just getting the socket in the constructor, and tried the current (mess) after experiencing the same issue.
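For what it's worth, the EAGAIN storm is what you get when a nonblocking read is retried in a tight loop; the usual cure is to wait on socket_select() before each read. A minimal sketch of that pattern, using a local socket pair instead of Redis (all names and the "+OK" payload are illustrative):

```php
<?php
// Instead of spinning on a nonblocking read (the EAGAIN storm strace
// shows), wait for readability with socket_select() before each read.
socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair);
list($writer, $reader) = $pair;
socket_set_nonblock($reader);

socket_write($writer, "+OK\r\n");

$data = '';

while (strpos($data, "\r\n") === false)
{
    $read = array($reader);
    $write = $except = NULL;

    // Block here until the kernel says there is data, instead of
    // hammering recvfrom() and collecting EAGAIN.
    if (socket_select($read, $write, $except, 1) < 1)
    {
        break; // timed out: nothing more is coming in this demo
    }

    $chunk = socket_read($reader, 1024);

    if ($chunk === false || $chunk === '')
    {
        break;
    }

    $data .= $chunk;
}

assert(rtrim($data) === '+OK');
echo rtrim($data), "\n";
```

This doesn't explain the 16k truncation, but it does stop the premature exit on -EAGAIN.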
I solved the blocking issue. Actually, I have it just about working. A normal socket read in PHP uses the flag PHP_BINARY_READ, which takes its cues on where to stop from the system's recv() function. There is also PHP_NORMAL_READ, which is defined as:
PHP_NORMAL_READ - reading stops at \n or \r.
But it doesn't, and there are open bugs going back to 2009 about that. Using strpos() to find if \r is in the chunk we just read works, and we only examine 1k at a time.
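A minimal sketch of that workaround, again against a local socket pair (the payload is illustrative; the 1k chunk size mirrors the description above):

```php
<?php
// PHP_NORMAL_READ is documented to stop at \n or \r, but per the old
// open bugs it can't be relied on. Workaround: binary-read in 1k
// chunks and scan each chunk for \r ourselves with strpos().
socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair);
list($writer, $reader) = $pair;

socket_write($writer, "12345\r\ntrailing");

$value = '';

do
{
    $buff = socket_read($reader, 1024, PHP_BINARY_READ);
    $value .= $buff;
}
while ($buff !== '' && $buff !== false && strpos($buff, "\r") === false);

// Everything before the \r is the reply line.
$line = substr($value, 0, strpos($value, "\r"));
assert($line === '12345');
echo $line, "\n";
```

Note that any bytes after the \r in the final chunk ("trailing" here) would need to be kept in a buffer for the next read; a real implementation has to deal with that.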
However, I did get one random failure. I'm going to clean it up a bit and try running 100 tests to see what happens. I moved getting the reference to the socket and associated sanity checking back into the constructor, so the only places with modifications are the constructor and _single_line_reply() functions. I'll have a pull request out in a while.
It's... still breaking at around 16k, randomly. I was close to happy: I ran the tests (repeating 100x each) half a dozen times, and then I got:
Failed asserting that 16384 matches expected 25600.
... just before I was about to commit.
Something about a socket created as a stream does something odd, I even tried tweaking the receive buffer higher. I'll work on an alternate universe version of the library just using sockets in the development branch of my fork. Do you want a pull request for the test? It's just going to fail repeatedly until this is fixed.
Edit
I just don't get it, I ran tests 30 more times, all pass with flying colors. Ten minutes later I ran them again, failed on the third attempt. So I went sadistic on it:
phpunit --repeat 1000 --stop-on-failure
PHPUnit 3.6.10 by Sebastian Bergmann.
Configuration read from /home/.../codeigniter-redis/phpunit.xml
................................................................... 67 / 4 (1675%)
................................................................... 134 / 4 (3350%)
................................................................... 201 / 4 (5025%)
................................................................... 268 / 4 (6700%)
................................................................... 335 / 4 (8375%)
................................................................... 402 / 4 (10050%)
................................................................... 469 / 4 (11725%)
................................................................... 536 / 4 (13400%)
................................................................... 603 / 4 (15075%)
................................................................... 670 / 4 (16750%)
................................................................... 737 / 4 (18425%)
................................................................... 804 / 4 (20100%)
................................................................... 871 / 4 (21775%)
................................................................... 938 / 4 (23450%)
................................................................... 1005 / 4 (25125%)
................................................................... 1072 / 4 (26800%)
................................................................... 1139 / 4 (28475%)
................................................................... 1206 / 4 (30150%)
................................................................... 1273 / 4 (31825%)
................................................................... 1340 / 4 (33500%)
................................................................... 1407 / 4 (35175%)
....F
Time: 02:24, Memory: 3.50Mb
There was 1 failure:
1) RedisTest::test_chunk_reads
Failed asserting that 16000 matches expected 30208.
1, 10, 100 or 1000 times, it just randomly fails. It's not redis, the data is there. I'll just work on converting to sockets.
Why does PHP hate 16k so much?
If it's not too much work that would be great. Hang in there :)!
It should not be too difficult to just use sockets. I've been meaning to send a change to (optionally) use a domain socket anyway.
This is just beyond nuts. I'll try to get it done over the coming weekend.
Less-hacky fix until a "real" fix gets put in place (this is a git diff, for anyone who wants to apply it):
@@ -242,13 +242,25 @@ class Redis {
 		$value_length = (int) fgets($this->_connection);
 		if ($value_length <= 0) return NULL;
 
-		$response = rtrim(fread($this->_connection, $value_length + 1));
+		$response = $this->_read_line($value_length);
+		fgets($this->_connection); // Get rid of the \n\r
 
 		return isset($response) ? $response : FALSE;
 	}
 
+	private function _read_line($value_length)
+	{
+		$response = '';
+		$remaining_length = $value_length;
+		while (strlen($response) < $value_length)
+		{
+			$line = fread($this->_connection, $remaining_length);
+			$remaining_length -= strlen($line);
+			$response .= $line;
+		}
+		return $response;
+	}
+
 	/**
 	 * Multi bulk reply
 	 *
(github is editing my response and turning the + into a dot)
Thanks Dave!
A slightly more elegant version:
private function _read_line($length)
{
    $data = '';
    while ($length > 0)
    {
        $buffer = fread($this->_connection, $length);
        $length -= strlen($buffer);
        $data .= $buffer;
    }
    return $data;
}
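To make the failure mode concrete: on a socket stream a single fread() may return only part of what was requested (often one 8k chunk), which is exactly why the loop exists. A self-contained check of the loop's behavior (my own sketch; it assumes the kernel socket buffer absorbs a 30k write, which holds with typical Linux defaults, and I've added a guard against a closed peer that the version above lacks):

```php
<?php
// Why _read_line() must loop: on a socket, one fread() call is only
// guaranteed to return *some* bytes, not all the bytes you asked for.
function _read_line($fp, $length)
{
    $data = '';
    while ($length > 0)
    {
        $buffer = fread($fp, $length);
        if ($buffer === false || $buffer === '')
        {
            break; // peer closed; don't spin forever
        }
        $length -= strlen($buffer);
        $data .= $buffer;
    }
    return $data;
}

list($a, $b) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$payload = str_repeat('x', 30000); // comfortably past the 16k mark
fwrite($a, $payload);

$got = _read_line($b, 30000);
assert(strlen($got) === 30000);
echo strlen($got), "\n";
```

Without the loop, a single fread($b, 30000) here would typically come back thousands of bytes short.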
Does this need to be closed? I think we figured this problem out with the multi_reply section. It would probably be easy to move that fix over to here if this is still an actual issue.
Eric, my gut says yes, but this issue has been so random that I'd prefer some confirmation from others first.
Is there anyone still running into this problem using the latest HEAD?
This thing should be dead. I have tested the hell out of it on my instance with the code I committed.
Great, so you're talking about code that is already in the develop branch or referring to #48?
The stuff in #48. :-(
Thanks, I might have a go myself and fix the style issues in #48.
I really will eventually fix this. I just got done with about 1,000 lines this weekend, I have my day job in the morning, and my Google Glass is arriving. Did I mention I have a wife and kids?
I need to learn the style guide. I will eventually get there, if you don't want to waste your time.
I didn't mean it like that Eric, sorry if I came across that way.
This library is provided free of charge and I appreciate every single effort in order to make it better. Just wanted to say that I might have a stab at fixing the code style myself, because you already took care of the hard part – actually fixing the bug.
Thanks again for your input.