Fixing error handling for socket timeouts that occur due to a race-like condition
@bpot So, bear with me here...
During testing, we found that if the connection sits idle for a long period, you can run into an exception that is not handled correctly. It appears that in the window between IO.select and @socket.write (and I'm assuming read as well), the socket actually times out. This causes an Errno::ETIMEDOUT to be raised but never rescued.
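A minimal sketch of the failure mode and the kind of handling this change is after. It is not the actual code in connection.rb: `write_with_timeout`, `ConnectionFailedError` and `FlakySocket` are hypothetical names used only for illustration. The point is that IO.select only reports that the socket was writable at that instant, so the write itself still needs to rescue Errno::ETIMEDOUT (and, as an assumption here, the related ECONNRESET/EPIPE errors) and turn it into the client's own error.

```ruby
# Hypothetical stand-in for the client's own connection error class.
class ConnectionFailedError < StandardError; end

# Waits up to timeout_s seconds for the socket to become writable, then writes.
# IO.select only reports readiness at that moment; the connection can still
# time out before the write runs, so errors raised by the write itself are
# rescued and re-raised as ConnectionFailedError instead of escaping.
def write_with_timeout(socket, data, timeout_s)
  unless IO.select(nil, [socket], nil, timeout_s)
    raise ConnectionFailedError, "timed out waiting for socket to become writable"
  end
  socket.write(data)
rescue Errno::ETIMEDOUT, Errno::ECONNRESET, Errno::EPIPE => e
  raise ConnectionFailedError, "write failed: #{e.class}"
end

# Simulates the race: select sees a writable descriptor, but the write
# then fails with ETIMEDOUT, as observed after long idle periods.
class FlakySocket
  def initialize(io)
    @io = io
  end

  # IO.select accepts any object that responds to #to_io.
  def to_io
    @io
  end

  def write(_data)
    raise Errno::ETIMEDOUT
  end
end

_reader, writer = IO.pipe
begin
  write_with_timeout(FlakySocket.new(writer), "payload", 1.0)
rescue ConnectionFailedError => e
  puts e.message # prints "write failed: Errno::ETIMEDOUT"
end
```

Without the `rescue` clause, the Errno::ETIMEDOUT from the simulated write would propagate straight to the caller, which matches the unhandled stack trace described here. A read path built the same way would presumably need the same treatment.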
I tried to run your integration specs, but kept getting a file-not-found error after supplying the path to my Kafka installation.
I have a really dumb-looking test script that reliably reproduces the issue:
https://gist.github.com/StabbyCutyou/e0050d3b8b12c7c42736
I seem to hit it about every third run of the loop, but your mileage may vary. You'll know it has happened when an Errno::ETIMEDOUT stack trace shows up. I also have another branch with extra logging that dumps some info from the connection.rb class on each publish attempt; I used it to verify what was happening, and I can link you to it.
Again, it's a super weird case, but one that I'm able to reproduce.
EDIT
The script switches from sending one message every twenty minutes to sending 100 messages, one every 100 ms. I did that to try to reproduce a behavior others had seen, where the connection remained in a bad state for several writes, but I couldn't reproduce it. The script is definitely the product of some fairly random testing approaches.