cloudflare-elastic

Lambda: Failure opening selector

Open davispalomino opened this issue 5 years ago • 2 comments

Hi, the Lambda fails during execution with the error below. It may be caused by the function being overloaded while reading a large number of records.

Error:

org.apache.http.nio.reactor.IOReactorException: Failure opening selector
java.lang.IllegalStateException: org.apache.http.nio.reactor.IOReactorException: Failure opening selector

Metric lambda: https://ibb.co/tZpZmYt

Thank you for your reply.

Regards

davispalomino, Sep 22 '20 17:09

We were just hitting the same issue:

Oct 07, 2020 12:41:25 PM org.apache.http.impl.nio.client.InternalHttpAsyncClient run
SEVERE: I/O reactor terminated abnormally 
org.apache.http.nio.reactor.IOReactorException: Failure opening selector
at org.apache.http.impl.nio.reactor.AbstractIOReactor.<init>(AbstractIOReactor.java:103)
at org.apache.http.impl.nio.reactor.BaseIOReactor.<init>(BaseIOReactor.java:85) 
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:321)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Too many open files 
at java.base/sun.nio.ch.EPoll.create(Native Method)
at java.base/sun.nio.ch.EPollSelectorImpl.<init>(Unknown Source)
at java.base/sun.nio.ch.EPollSelectorProvider.openSelector(Unknown Source)
at java.base/java.nio.channels.Selector.open(Unknown Source)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.<init>(AbstractIOReactor.java:101)
... 5 more
Finished processing; flushing any remaining logs...
Elasticsearch processor shut down

It seems that it "crashed" while the cluster was under heavy pressure and was not able to recover. It only started working again after I changed the Lambda, which implicitly restarted it.

The Lambda had been running fine for months, so I think "Too many open files" is a symptom or side effect rather than the actual issue. I don't think a Lambda can run long enough to accumulate this many open files.
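A warm Lambda execution environment can be reused across invocations, so one way to test this assumption is to log the JVM's open file descriptor count at the start of each invocation and watch whether it grows. Below is a minimal sketch, assuming a Unix-like runtime where the JVM exposes `com.sun.management.UnixOperatingSystemMXBean`. The class name `FdProbe` and the method `fdUsage` are hypothetical and not part of cloudflare-elastic.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical diagnostic helper; not part of cloudflare-elastic.
public class FdProbe {

    /**
     * Returns {open, max} file descriptor counts for this JVM process,
     * or {-1, -1} when the platform bean does not expose them.
     */
    public static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        // Non-Unix platform or a JVM without the com.sun.management extension.
        return new long[] { -1L, -1L };
    }

    public static void main(String[] args) {
        long[] usage = fdUsage();
        System.out.println("open file descriptors: " + usage[0] + " / limit: " + usage[1]);
    }
}
```

If the count climbs from one invocation to the next, something (for example an HTTP client that is never closed) is probably leaking descriptors. If it stays flat, the limit is more likely being hit within a single invocation.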

lobeck, Oct 07 '20 13:10

Hi, I applied your solution from PR #22 and I still see these errors. I tried adding more workers, but it didn't help. Do you know what else it could be? Thanks.

netanel246, Oct 13 '20 15:10