okhttp Feature Request: Offer More Flexible Websocket Pong Failure Policy

Right now, OkHttp's websocket pong rule is: it must receive a pong for every ping, and that pong must be received before the time comes for the next ping (see #3227 and this code). I'm sure that works in many cases, but it is a fairly rigid policy. For example, right now I am dealing with an IoT device where I have to ping frequently, but I might not get pongs all the time.

At the moment, I am trying to work around this by:

Identify a websocket failure due to a missed pong, by the icky approach of examining the exception handed to onFailure() and seeing if it is a SocketTimeoutException and has didn't receive pong in the message
Open a fresh websocket and hope the IoT device can deal (spoiler alert: it can't deal all that well...)

The current implementation is fine as a default, but it would be useful if we could have an asymmetric policy, with the ping interval at least somewhat independent of the pong timeout period. In my case, it would be something like:

Send pings every 2 seconds
Fail only if we do not receive a pong within 40 seconds

(or something like that)

Right now, we are using another websocket library. We are able to implement what we need there because all the guts are exposed, so we can send extra ping frames. We have configured the library's automatic ping logic to have a long period (e.g., 40 seconds) to prevent an early websocket failure due to a missed pong, with the extra ping frames satisfying the IoT device's need for more frequent pings. I am trying to switch over to OkHttp, because we're going to need some of what it offers (particularly in the SSL/TLS area) eventually.

Mar 06 '20 13:03 commonsguy

Hm. My aspirations here are to make the pings + pongs dynamic and based on the other activity on the web socket.

Perhaps something like “send a ping after 10 seconds of inactivity” instead of “send a ping every 10 seconds”, and backing off from 10 seconds to 30 to 60 to 120, so that a web socket that doesn’t get much use also doesn’t need many pings. In such cases we could probably adjust the pong timeout to be based on a fixed deadline (say 30 seconds in high-latency environments, and 2 seconds in low-latency ones) rather than on the transmission time of the next ping.

Mar 07 '20 04:03 swankjesse

Perhaps the ping delivery/pong requirement rules could be driven by a pluggable strategy implementation. The current strategy (pingInterval + pong received by the time of the next ping) would be one supplied configurable implementation. What you describe could be another. The project that I'm helping could develop a custom strategy that works with this IoT device. And so on.

Mar 07 '20 12:03 commonsguy

@swankjesse on Android we could benefit from being able to control this. e.g. in the background we probably favour losing the connection instead of trying (and depending on Android version failing) to keep a connection and radio alive in the background.

Mar 07 '20 14:03 yschimke

In addition to the original request, if there's an incoming traffic then we might be interested in considering the connection as not stale even besides no pong is received on time.

Jan 07 '21 17:01 frank-leap

@swankjesse, re adding ping max failures. I think what we ultimately need here is a way of adding custom logic for the ping/pong handling failure scenarios. What is currently implemented is too limiting.

Feb 23 '21 09:02 m4ce

I would highly suggest to expose lower level ping/pong functions to the user:

Add onPing and onPong callbacks that are triggered when receiving a ping or pong message. Those callbacks should also provide the ping/pong data if any.
Add a sendPing and sendPong methods to the WebSocket class to allow the user to manually send a ping request or pong reply, along with some custom data if desired.
Add an autoPongReply option that indicates if the library should automatically reply to ping requests received or not, in case the user wants to handle the replies himself in the onPing callback (ex: with custom data). Another way without adding an option is to assume the user wants to handle the pong replies if he overrides the onPing function.

Having access to those methods and callbacks would allow the user full flexibility to implement any desired ping-pong logic at the application level if the default behavior isn't enough.

Nov 19 '21 18:11 Totalus

Browsers don't expose pings and pongs and I'd prefer we not either.

I suspect there's a good general purpose strategy we could pack in rather than requiring each user to learn the API and configure. (We might need very high level tuning parameters, so you could tell OkHttp if the device is a phone or a server inside a data center).

Nov 21 '21 16:11 swankjesse

Browsers don't expose pings and pongs and I'd prefer we not either.

Why not ? Websocket libraries often do expose them.

I suspect there's a good general purpose strategy we could pack in rather than requiring each user to learn the API and configure.

Yes, you're right. A good general purpose strategy would answer most use cases and should be implemented. But one does not prevent the other. I think you should do both: expose the functions and provide a general purpose strategy you can turn on or off.

Nov 22 '21 14:11 Totalus

Why not ?

It's just our approach to building an API. There's lots of low-level libraries for users who want exact control over every byte. OkHttp works closer to the application layer: you tell OkHttp what request to make and it mostly just figures out the rest. We similarly don't offer knobs to select how our HTTP/2 flow control works, or how much data we buffer when streaming a request body.

You might argue that this is inefficient, but I think in aggregate it’s quite efficient. We can automatically make good decisions even as the context changes. For example, when we support web sockets over HTTP/2 it'll make very little sense to do pings the websocket layer and not the connection layer.

So if you’re with me, even just a bit, I'd love your best ideas on what our built-in ping policy should be.

Nov 24 '21 05:11 swankjesse

Closing as low priority. OkHttp avoids knobs, there are other frameworks that allow infinite configurability, but it's not OkHttp.

May 21 '23 08:05 yschimke