blaze Exceptions when benchmarking: ERROR NIO1SocketServerGroup

Hi,

While benchmarking an http4s service implementation I ran into some issues. Occasionally the service failed, producing the following kinds of exceptions in the logs.

First kind

After this exception the service continued to reply, but the number of errors kept ramping up.

ERROR NIO1SocketServerGroup - Error handling client channel. Closing.
java.util.concurrent.RejectedExecutionException: This SelectorLoop is closed.
        at org.http4s.blaze.channel.nio1.SelectorLoop.enqueueTask(SelectorLoop.scala:118)
        at org.http4s.blaze.channel.nio1.SelectorLoop.initChannel(SelectorLoop.scala:139)
        at org.http4s.blaze.channel.nio1.NIO1SocketServerGroup.org$http4s$blaze$channel$nio1$NIO1SocketServerGroup$$handleClientChannel(NIO1SocketServerGroup.scala:290)
        at org.http4s.blaze.channel.nio1.NIO1SocketServerGroup$SocketAcceptor.acceptNewConnections(NIO1SocketServerGroup.scala:148)
        at org.http4s.blaze.channel.nio1.NIO1SocketServerGroup$SocketAcceptor.opsReady(NIO1SocketServerGroup.scala:119)
        at org.http4s.blaze.channel.nio1.SelectorLoop.processKeys(SelectorLoop.scala:200)
        at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:171)
        at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
        at java.base/java.lang.Thread.run(Thread.java:834)

Second kind

After this exception the service stopped responding.

ERROR SelectorLoop - Unhandled exception in selector loop
java.io.IOException: Connection reset by peer
        at java.base/sun.nio.ch.FileDispatcherImpl.close0(Native Method)
        at java.base/sun.nio.ch.SocketDispatcher.close(SocketDispatcher.java:55)
        at java.base/sun.nio.ch.SocketChannelImpl.kill(SocketChannelImpl.java:907)
        at java.base/sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:267)
        at java.base/sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:116)
        at java.base/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
        at java.base/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:141)
        at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:163)
        at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
        at java.base/java.lang.Thread.run(Thread.java:834)
ERROR NIO1SocketServerGroup - Listening socket(/0.0.0.0:53248) closed forcibly.
java.nio.channels.ShutdownChannelGroupException: null
        at org.http4s.blaze.channel.nio1.SelectorLoop.killSelector(SelectorLoop.scala:225)
        at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:186)
        at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
        at java.base/java.lang.Thread.run(Thread.java:834)
ERROR NIO1HeadStage - Abnormal NIO1HeadStage termination
java.nio.channels.ShutdownChannelGroupException: null
        at org.http4s.blaze.channel.nio1.SelectorLoop.killSelector(SelectorLoop.scala:225)
        at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:186)
        at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
        at java.base/java.lang.Thread.run(Thread.java:834)

As mentioned, these errors occur only occasionally and are hard to reproduce. In general I observed that once a service had started erroring, it kept producing errors for as long as the JVM was left running.

System environment

The code for the service can be found in the following repository: https://github.com/jan0sch/pfhais

The service is located in the pure folder. The configuration files for the JMeter benchmarks can be found in the jmeter folder.

Service workstation

CPU : Core i5-9600K, 6 Cores, 3.7 GHz
RAM : 32 GB
HDD : 2x Samsung SSD 860 PRO 512GB, SATA
OS : FreeBSD 12 (HT disabled)
JDK : 11.0.4+11-2
DB : PostgreSQL 11.3

Client workstation

CPU : AMD Ryzen Threadripper 2950X
RAM : 32 GB
HDD : 2x Samsung SSD 970 PRO 512GB, M.2
OS : FreeBSD 12 (HT disabled)
JDK : 11.0.4+11-2

Apache JMeter 5.1.1 was used to run the benchmark.

Sep 10 '19 14:09 jan0sch

@jan0sch @rossabaker I also have this issue with the exact same error message.

The error only occurs when using WebSockets with NIO1. Using NIO2 resolves the issue.

May 04 '21 01:05 CharlesAHunt

Unfortunately, the NIO2 server is deprecated in blaze. Performance was worse by all measures, and we didn't backport the CVE fix.

May 06 '21 15:05 rossabaker

As of 0.23.10 I guess there is no fix for this issue, is that right? What is the preferred workaround? Would you recommend switching to a different server impl?

Feb 14 '22 11:02 sergiojoker11

The ember server has been the new default for some time now. I have not tried to reproduce the bug with ember, but so far I have had no trouble with it (I use it in production, although only under light to moderate load).
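
For anyone wanting to try ember as a workaround, here is a minimal sketch of what the server setup might look like. It assumes http4s 0.23.x on cats-effect 3 with the http4s-ember-server and http4s-dsl modules on the classpath. The object name, the /ping route and port 8080 are placeholders for illustration only and are not taken from the pfhais repository.

```scala
import cats.effect.{IO, IOApp}
import com.comcast.ip4s._
import org.http4s.HttpRoutes
import org.http4s.dsl.io._
import org.http4s.ember.server.EmberServerBuilder
import org.http4s.implicits._

// Hypothetical entry point; replace the routes with those of your own service.
object EmberExample extends IOApp.Simple {

  // Placeholder route used only to show the wiring.
  val routes: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root / "ping" => Ok("pong")
  }

  // Build the ember server as a Resource and keep it running until the
  // application is interrupted.
  def run: IO[Unit] =
    EmberServerBuilder
      .default[IO]
      .withHost(ipv4"0.0.0.0") // assumed bind address
      .withPort(port"8080")    // assumed port
      .withHttpApp(routes.orNotFound)
      .build
      .useForever
}
```

Since the routes are plain HttpRoutes values, an existing blaze-based service should mostly only need its server builder swapped. I have not verified that this avoids the exceptions above under the same JMeter load.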

Feb 14 '22 12:02 jan0sch