Java errors on the WebDAV door when upgrading to java-17 on ALMA9
We're running:
- dcache 9.2.25
- ALMA9
- java-11
When we tried to move to java-17, the WebDAV door started correctly, but on some requests the trace below appeared:
21 Jan 2025 10:12:24 (WebDAV-ATLAST1-door01) [door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg] exception sending redirect
org.eclipse.jetty.io.EofException: null
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:280)
at org.eclipse.jetty.io.ssl.SslConnection.networkFlush(SslConnection.java:489)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.flush(SslConnection.java:1112)
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:828)
at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:248)
at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:229)
at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:555)
at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:1009)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:1086)
at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285)
at org.eclipse.jetty.server.HttpOutput.close(HttpOutput.java:638)
at org.eclipse.jetty.server.Response.closeOutput(Response.java:909)
at org.eclipse.jetty.server.Response.sendRedirect(Response.java:574)
at org.eclipse.jetty.server.Response.sendRedirect(Response.java:505)
at org.eclipse.jetty.server.Response.sendRedirect(Response.java:580)
at javax.servlet.http.HttpServletResponseWrapper.sendRedirect(HttpServletResponseWrapper.java:176)
at io.milton.servlet.ServletResponse.sendRedirect(ServletResponse.java:139)
at io.milton.http.http11.DefaultHttp11ResponseHandler.respondRedirect(DefaultHttp11ResponseHandler.java:180)
at io.milton.http.webdav.DefaultWebDavResponseHandler.respondRedirect(DefaultWebDavResponseHandler.java:137)
at io.milton.http.AbstractWrappingResponseHandler.respondRedirect(AbstractWrappingResponseHandler.java:145)
at org.dcache.webdav.AcceptAwareResponseHandler.respondRedirect(AcceptAwareResponseHandler.java:187)
at io.milton.http.AbstractWrappingResponseHandler.respondRedirect(AbstractWrappingResponseHandler.java:145)
at io.milton.http.AbstractWrappingResponseHandler.respondRedirect(AbstractWrappingResponseHandler.java:145)
at io.milton.http.AbstractWrappingResponseHandler.respondRedirect(AbstractWrappingResponseHandler.java:145)
at io.milton.http.HandlerHelper.doCheckRedirect(HandlerHelper.java:161)
at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:161)
at io.milton.http.http11.GetHandler.processResource(GetHandler.java:66)
at io.milton.http.ResourceHandlerHelper.process(ResourceHandlerHelper.java:127)
at org.dcache.webdav.DcacheResourceHandlerHelper.process(DcacheResourceHandlerHelper.java:42)
at io.milton.http.http11.GetHandler.process(GetHandler.java:60)
at org.dcache.webdav.DcacheStandardFilter.process(DcacheStandardFilter.java:50)
at io.milton.http.FilterChain.process(FilterChain.java:46)
at org.dcache.webdav.transfer.CopyFilter.process(CopyFilter.java:276)
at io.milton.http.FilterChain.process(FilterChain.java:46)
at io.milton.http.HttpManager.process(HttpManager.java:158)
at org.dcache.webdav.MiltonHandler.handle(MiltonHandler.java:77)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.dcache.http.AuthenticationHandler.access$001(AuthenticationHandler.java:55)
at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:157)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
at java.base/javax.security.auth.Subject.doAs(Subject.java:376)
at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:154)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:110)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.io.IOException: Broken pipe
at java.base/sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:66)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:217)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:153)
at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:563)
at java.base/java.nio.channels.SocketChannel.write(SocketChannel.java:642)
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:274)
... 72 common frames omitted
The corresponding request in the access log is:
level=INFO ts=2025-01-21T10:12:24.973+0100 event=org.dcache.webdav.request request.method=GET request.url=davs://webdav-at1.pic.es:8446/atlasdatadisk/rucio/mc23_13p6TeV/eb/e3/EVNT.42157338._000211.pool.root.1 response.code=302 location=http://[2001:67c:1148:200:0:0:0:48]:22605/atlasdatadisk/rucio/mc23_13p6TeV/eb/e3/EVNT.42157338._000211.pool.root.1?dcache-http-uuid=b25a58c7-58b8-4d57-bf08-8a9c466404f8&dcache-http-ref=davs%3A%2F%2Fwebdav-at1.pic.es%3A8446 socket.remote=[2001:67c:1148:500::73]:54854 user-agent=ARC user.dn="CN=1737450670,CN=1747088637,CN=834825844,CN=Robot: ATLAS aCT 1,CN=727357,CN=atlact1,OU=Users,OU=Organic Units,DC=cern,DC=ch" user.mapped=42001:1307 transaction=door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg:1737450744961000 duration=17
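For anyone comparing many such entries, the access log line above can be split into its key=value fields with a few lines of Python. This is only a sketch: the regular expression and the function name `parse_access_log` are illustrative assumptions, not part of dCache, and the sample line keeps only a subset of the fields from the entry above.

```python
import re
from urllib.parse import urlsplit

# key=value pairs; a value is either double-quoted or a run of non-space characters.
_FIELD = re.compile(r'(?<!\S)(?P<key>[\w.-]+)=(?P<value>"[^"]*"|\S*)')


def parse_access_log(line):
    """Return the key=value fields of one access log line as a dict of strings."""
    fields = {}
    for match in _FIELD.finditer(line):
        value = match.group("value")
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]  # strip the surrounding quotes
        fields[match.group("key")] = value
    return fields


# Subset of the fields from the access log entry quoted in this issue.
sample = (
    'level=INFO ts=2025-01-21T10:12:24.973+0100 request.method=GET response.code=302 '
    'location=http://[2001:67c:1148:200:0:0:0:48]:22605/atlasdatadisk/rucio/mc23_13p6TeV/'
    'eb/e3/EVNT.42157338._000211.pool.root.1?dcache-http-uuid=b25a58c7-58b8-4d57-bf08-'
    '8a9c466404f8&dcache-http-ref=davs%3A%2F%2Fwebdav-at1.pic.es%3A8446 '
    'socket.remote=[2001:67c:1148:500::73]:54854 user-agent=ARC '
    'user.dn="CN=1737450670,CN=1747088637,CN=834825844,CN=Robot: ATLAS aCT 1,'
    'CN=727357,CN=atlact1,OU=Users,OU=Organic Units,DC=cern,DC=ch" duration=17'
)

fields = parse_access_log(sample)
target = urlsplit(fields["location"])
print(fields["response.code"], target.scheme, target.hostname, target.port)
```

The parsed `location` makes it easy to see where the door tried to send the client: a plain http URL on the pool's IPv6 address, port 22605.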
The door redirects the transfer to the pool and the mover is created, but no connection between the pool and the client is ever established, and the pool gives up:
21 Jan 2025 10:17:24 (dc048_2) [door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg WebDAV-ATLAST1-door01 PoolDeliverFile 0000D8DAB4AE844640D181178DC471293EC0] Transfer failed: No connection from client after 300 seconds. Giving up.
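The door and pool entries can be tied together through the session string they share inside the square brackets. A minimal sketch of that correlation follows; the regular expression, the field names and `parse_cell_log` are assumptions made for illustration, based only on the two log lines quoted in this issue.

```python
import re
from datetime import datetime

# Matches e.g. "21 Jan 2025 10:12:24 (cell) [door:Name@Domain:Id ...] message".
_LINE = re.compile(
    r'^(?P<ts>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) '
    r'\((?P<cell>[^)]+)\) '
    r'\[(?P<session>door:[^\s\]]+)[^\]]*\] '
    r'(?P<message>.*)$'
)


def parse_cell_log(line):
    """Split one log line into time, cell, session string and message (None if no match)."""
    match = _LINE.match(line)
    if match is None:
        return None
    return {
        # %b expects English month abbreviations, which holds in the default C locale.
        "time": datetime.strptime(match.group("ts"), "%d %b %Y %H:%M:%S"),
        "cell": match.group("cell"),
        "session": match.group("session"),
        "message": match.group("message"),
    }


door = parse_cell_log(
    "21 Jan 2025 10:12:24 (WebDAV-ATLAST1-door01) "
    "[door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg] "
    "exception sending redirect"
)
pool = parse_cell_log(
    "21 Jan 2025 10:17:24 (dc048_2) "
    "[door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg "
    "WebDAV-ATLAST1-door01 PoolDeliverFile 0000D8DAB4AE844640D181178DC471293EC0] "
    "Transfer failed: No connection from client after 300 seconds. Giving up."
)

same_session = door["session"] == pool["session"]
gap_seconds = (pool["time"] - door["time"]).total_seconds()
print(same_session, gap_seconds)
```

For the two lines in this report, the session strings match and the pool entry comes 300 seconds after the door entry, which fits the pool's timeout message.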
On the door, we've set:
update-crypto-policies DEFAULT:SHA1
To work around the problem, we've downgraded to java-11, and now everything is running correctly. We plan to upgrade to java-17 at the same time as the upgrade to dcache 10.2. This issue is only meant to provide information about the problem, in case someone else faces the same error.
Cheers, Elena
NDGF is now suffering from this in production with dCache 10.2. I don't think we have the possibility of downgrading Java.
Ref: RT ticket [www.dcache.org #10690]
@elenamplanas Do you run this through an HAProxy or is this a direct connection?
@elenamplanas And did you find an easy way to trigger the issue? All my test transfers work, so it's a bit hard to diagnose.
We're using a proxy for http and https connections. We've run tests and they worked correctly. What I haven't tested are third-party copies. I'll do more tests to try to reproduce the problem.
If you can reproduce the problem yourself, check which OS you use. One big problem site here upgraded from Rocky 8 to Rocky 9 and the problem went away. But there could be a million other reasons too, I guess, since we have no idea what triggers this.
So, our prime sources of broken transfers upgraded their operating systems. They went from Rocky 8 to Rocky 9 and this completely stopped the Java tracebacks we saw. The day before the upgrade we had 274354 tracebacks on ONE of the two active headnodes. That day and since then: 0.
This is on the client side, which uses ARC. I have no information on whether they upgraded ARC too (apart from the build platform version). I have yet to reproduce the problem, but I am now looking for a Rocky 8 machine. Unfortunately, those are very rare around here.
It's also possible that an older Ubuntu version caused issues like this. Those machines were also upgraded at the same time. No version info here.
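Per-day traceback counts like the ones quoted above can be obtained by counting the "exception sending redirect" lines in the door's log. The following is a rough sketch, assuming the log lines start with a "21 Jan 2025"-style date as in the trace at the top of this issue; the function names are illustrative and the path passed to `count_in_file` is whatever file the door's domain logs to.

```python
from collections import Counter

MARKER = "exception sending redirect"


def count_per_day(lines, marker=MARKER):
    """Count lines containing `marker`, keyed by the leading 'DD Mon YYYY' date."""
    counts = Counter()
    for line in lines:
        if marker in line:
            day = " ".join(line.split(maxsplit=3)[:3])
            counts[day] += 1
    return counts


def count_in_file(path, marker=MARKER):
    """Same as count_per_day, reading the lines from a log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_per_day(handle, marker)


# Small inline demo with made-up lines in the same layout as the door log.
demo = [
    "21 Jan 2025 10:12:24 (door) [door:x] exception sending redirect",
    "21 Jan 2025 10:12:25 (door) [door:y] exception sending redirect",
    "22 Jan 2025 08:00:00 (door) [door:z] some other message",
]
print(count_per_day(demo))
```

Running this on the log before and after a client-side upgrade gives a quick way to confirm whether the tracebacks really stopped.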
Hi @nsc-jens @paurkedal, the client must also update its security policy with update-crypto-policies DEFAULT:SHA1
See: https://twiki.cern.ch/twiki/bin/view/LCG/EL9vsSHA1CAs
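To check which policy a host (door or client) is actually running, the policy string can be inspected. A minimal sketch, assuming an EL9 host where `update-crypto-policies --show` prints the active policy (and `update-crypto-policies --set DEFAULT:SHA1` is the full form of the command that changes it); the helper names are illustrative.

```python
import subprocess


def parse_policy(policy):
    """Split a policy string such as 'DEFAULT:SHA1' into (base, [subpolicies])."""
    base, *modules = policy.strip().split(":")
    return base, modules


def has_sha1_subpolicy(policy):
    """True if the SHA1 subpolicy is part of the given policy string."""
    _, modules = parse_policy(policy)
    return "SHA1" in modules


def current_policy():
    """Ask the system for its active policy; only works where the tool is installed."""
    result = subprocess.run(
        ["update-crypto-policies", "--show"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


print(parse_policy("DEFAULT:SHA1"), has_sha1_subpolicy("DEFAULT"))
```

On an affected client, `has_sha1_subpolicy(current_policy())` returning False would point to the policy still needing the change described above.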