Exceptions when benchmarking: ERROR NIO1SocketServerGroup
Hi,
While benchmarking an http4s service implementation I ran into some issues. Occasionally the service failed, producing the following kinds of exceptions in the logs.
First kind
After this exception the service continued to respond, but the number of errors kept ramping up.
ERROR NIO1SocketServerGroup - Error handling client channel. Closing.
java.util.concurrent.RejectedExecutionException: This SelectorLoop is closed.
at org.http4s.blaze.channel.nio1.SelectorLoop.enqueueTask(SelectorLoop.scala:118)
at org.http4s.blaze.channel.nio1.SelectorLoop.initChannel(SelectorLoop.scala:139)
at org.http4s.blaze.channel.nio1.NIO1SocketServerGroup.org$http4s$blaze$channel$nio1$NIO1SocketServerGroup$$handleClientChannel(NIO1SocketServerGroup.scala:290)
at org.http4s.blaze.channel.nio1.NIO1SocketServerGroup$SocketAcceptor.acceptNewConnections(NIO1SocketServerGroup.scala:148)
at org.http4s.blaze.channel.nio1.NIO1SocketServerGroup$SocketAcceptor.opsReady(NIO1SocketServerGroup.scala:119)
at org.http4s.blaze.channel.nio1.SelectorLoop.processKeys(SelectorLoop.scala:200)
at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:171)
at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
at java.base/java.lang.Thread.run(Thread.java:834)
Second kind
After this exception the service stopped responding.
ERROR SelectorLoop - Unhandled exception in selector loop
java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.close0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.close(SocketDispatcher.java:55)
at java.base/sun.nio.ch.SocketChannelImpl.kill(SocketChannelImpl.java:907)
at java.base/sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:267)
at java.base/sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:116)
at java.base/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
at java.base/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:141)
at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:163)
at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
at java.base/java.lang.Thread.run(Thread.java:834)
ERROR NIO1SocketServerGroup - Listening socket(/0.0.0.0:53248) closed forcibly.
java.nio.channels.ShutdownChannelGroupException: null
at org.http4s.blaze.channel.nio1.SelectorLoop.killSelector(SelectorLoop.scala:225)
at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:186)
at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
at java.base/java.lang.Thread.run(Thread.java:834)
ERROR NIO1HeadStage - Abnormal NIO1HeadStage termination
java.nio.channels.ShutdownChannelGroupException: null
at org.http4s.blaze.channel.nio1.SelectorLoop.killSelector(SelectorLoop.scala:225)
at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:186)
at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
at java.base/java.lang.Thread.run(Thread.java:834)
As mentioned, these are hard to reproduce. In general, I observed that once a service started producing errors, it kept producing them for as long as the JVM was left running.
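The first trace suggests that new connections were being handed to a selector loop that had already shut down, which would explain why errors persist for the lifetime of the JVM. The following is only an analogy built on a plain `java.util.concurrent` executor, not blaze's actual `SelectorLoop` code; `ClosedLoopDemo` and `isRejected` are hypothetical names chosen for illustration. It shows that once such a loop is closed, every later submission is rejected:

```scala
import java.util.concurrent.{ExecutorService, Executors, RejectedExecutionException}

// Analogy only: a single-threaded executor stands in for a selector loop.
// This is NOT blaze's implementation.
object ClosedLoopDemo {

  /** Returns true if the executor refuses to accept a new task. */
  def isRejected(loop: ExecutorService): Boolean =
    try {
      loop.execute(() => ()) // submit a no-op task
      false
    } catch {
      case _: RejectedExecutionException => true
    }

  def main(args: Array[String]): Unit = {
    val loop = Executors.newSingleThreadExecutor()

    println(s"rejected while running: ${isRejected(loop)}")

    // Once the loop is shut down it never accepts work again,
    // so every later caller sees the same rejection.
    loop.shutdown()
    println(s"rejected after shutdown: ${isRejected(loop)}")
  }
}
```

If the real selector loop behaves similarly after being killed, only restarting the server (or the JVM) would clear the condition, which matches what I observed.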
System environment
The code for the service can be found in the following repository: https://github.com/jan0sch/pfhais
It is located within the pure folder. The configuration files for the jmeter benchmarks can be found in the jmeter folder.
Service workstation
- CPU : Core i5-9600K, 6 Cores, 3.7 GHz
- RAM : 32 GB
- HDD : 2x Samsung SSD 860 PRO 512GB, SATA
- OS : FreeBSD 12 (HT disabled)
- JDK : 11.0.4+11-2
- DB : PostgreSQL 11.3
Client workstation
- CPU : AMD Ryzen Threadripper 2950X
- RAM : 32 GB
- HDD : 2x Samsung SSD 970 PRO 512GB, M.2
- OS : FreeBSD 12 (HT disabled)
- JDK : 11.0.4+11-2
Apache JMeter 5.1.1 was used to run the benchmark.
@jan0sch @rossabaker I also have this issue with the exact same error message.
The error only occurs when using WebSockets with NIO1. Using NIO2 resolves the issue.
Unfortunately, the NIO2 server is deprecated in blaze. Performance was worse by all measures, and we didn't backport the CVE fix.
I guess there is no fix for this issue as of 0.23.10. Is that right? What is the preferred workaround? Would you recommend switching to a different server implementation?
The ember server has been the default for some time now. I have not tried to reproduce the bug with ember, but so far I have had no trouble with it (I use it in production, although only under light to moderate load).
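For anyone considering the switch, here is a minimal sketch of an ember-based server, assuming http4s 0.23.x with the `http4s-ember-server` and `http4s-dsl` modules and cats-effect 3 on the classpath. The `EmberExample` object and the `/ping` route are hypothetical placeholders, and the port is simply the one from the log above; replace them with your own service.

```scala
import cats.effect.{IO, IOApp}
import com.comcast.ip4s._
import org.http4s.{HttpApp, HttpRoutes}
import org.http4s.dsl.io._
import org.http4s.ember.server.EmberServerBuilder
import org.http4s.implicits._

// Hypothetical example service; not taken from the repository in this issue.
object EmberExample extends IOApp.Simple {

  // Placeholder route: responds to GET /ping with "pong".
  val routes: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root / "ping" => Ok("pong")
  }

  // Unmatched requests get a 404 response.
  val httpApp: HttpApp[IO] = routes.orNotFound

  // Builds the server as a Resource and keeps it running until cancelled.
  val run: IO[Unit] =
    EmberServerBuilder
      .default[IO]
      .withHost(ipv4"0.0.0.0")
      .withPort(port"53248")
      .withHttpApp(httpApp)
      .build
      .useForever
}
```

Existing `HttpRoutes` should carry over unchanged, since only the server builder differs. I have not verified whether WebSocket routes need further changes.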