Java errors on webdav-door when upgrading to java-17 on ALMA9

Open elenamplanas opened this issue 11 months ago • 7 comments

We're running:

  • dcache 9.2.25
  • ALMA9
  • java-11

When we tried to move to java-17, the webdav door started correctly, but on some requests the trace below appeared:

21 Jan 2025 10:12:24 (WebDAV-ATLAST1-door01) [door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg] exception sending redirect
org.eclipse.jetty.io.EofException: null
	at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:280)
	at org.eclipse.jetty.io.ssl.SslConnection.networkFlush(SslConnection.java:489)
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.flush(SslConnection.java:1112)
	at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
	at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
	at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:828)
	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:248)
	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:229)
	at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:555)
	at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:1009)
	at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:1086)
	at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285)
	at org.eclipse.jetty.server.HttpOutput.close(HttpOutput.java:638)
	at org.eclipse.jetty.server.Response.closeOutput(Response.java:909)
	at org.eclipse.jetty.server.Response.sendRedirect(Response.java:574)
	at org.eclipse.jetty.server.Response.sendRedirect(Response.java:505)
	at org.eclipse.jetty.server.Response.sendRedirect(Response.java:580)
	at javax.servlet.http.HttpServletResponseWrapper.sendRedirect(HttpServletResponseWrapper.java:176)
	at io.milton.servlet.ServletResponse.sendRedirect(ServletResponse.java:139)
	at io.milton.http.http11.DefaultHttp11ResponseHandler.respondRedirect(DefaultHttp11ResponseHandler.java:180)
	at io.milton.http.webdav.DefaultWebDavResponseHandler.respondRedirect(DefaultWebDavResponseHandler.java:137)
	at io.milton.http.AbstractWrappingResponseHandler.respondRedirect(AbstractWrappingResponseHandler.java:145)
	at org.dcache.webdav.AcceptAwareResponseHandler.respondRedirect(AcceptAwareResponseHandler.java:187)
	at io.milton.http.AbstractWrappingResponseHandler.respondRedirect(AbstractWrappingResponseHandler.java:145)
	at io.milton.http.AbstractWrappingResponseHandler.respondRedirect(AbstractWrappingResponseHandler.java:145)
	at io.milton.http.AbstractWrappingResponseHandler.respondRedirect(AbstractWrappingResponseHandler.java:145)
	at io.milton.http.HandlerHelper.doCheckRedirect(HandlerHelper.java:161)
	at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:161)
	at io.milton.http.http11.GetHandler.processResource(GetHandler.java:66)
	at io.milton.http.ResourceHandlerHelper.process(ResourceHandlerHelper.java:127)
	at org.dcache.webdav.DcacheResourceHandlerHelper.process(DcacheResourceHandlerHelper.java:42)
	at io.milton.http.http11.GetHandler.process(GetHandler.java:60)
	at org.dcache.webdav.DcacheStandardFilter.process(DcacheStandardFilter.java:50)
	at io.milton.http.FilterChain.process(FilterChain.java:46)
	at org.dcache.webdav.transfer.CopyFilter.process(CopyFilter.java:276)
	at io.milton.http.FilterChain.process(FilterChain.java:46)
	at io.milton.http.HttpManager.process(HttpManager.java:158)
	at org.dcache.webdav.MiltonHandler.handle(MiltonHandler.java:77)
	at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.dcache.http.AuthenticationHandler.access$001(AuthenticationHandler.java:55)
	at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:157)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:376)
	at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:154)
	at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:110)
	at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.io.IOException: Broken pipe
	at java.base/sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:66)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:217)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:153)
	at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:563)
	at java.base/java.nio.channels.SocketChannel.write(SocketChannel.java:642)
	at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:274)
	... 72 common frames omitted

The corresponding request in the access log is:

level=INFO ts=2025-01-21T10:12:24.973+0100 event=org.dcache.webdav.request request.method=GET request.url=davs://webdav-at1.pic.es:8446/atlasdatadisk/rucio/mc23_13p6TeV/eb/e3/EVNT.42157338._000211.pool.root.1 response.code=302 location=http://[2001:67c:1148:200:0:0:0:48]:22605/atlasdatadisk/rucio/mc23_13p6TeV/eb/e3/EVNT.42157338._000211.pool.root.1?dcache-http-uuid=b25a58c7-58b8-4d57-bf08-8a9c466404f8&dcache-http-ref=davs%3A%2F%2Fwebdav-at1.pic.es%3A8446 socket.remote=[2001:67c:1148:500::73]:54854 user-agent=ARC user.dn="CN=1737450670,CN=1747088637,CN=834825844,CN=Robot: ATLAS aCT 1,CN=727357,CN=atlact1,OU=Users,OU=Organic Units,DC=cern,DC=ch" user.mapped=42001:1307 transaction=door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg:1737450744961000 duration=17
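
For anyone comparing many such entries, a minimal sketch of splitting an access log line into fields is shown here. It assumes the format is space-separated key=value pairs with optional double quotes, as in the entry above; the helper name `parse_access_log` is hypothetical, and the sample line is a shortened copy of that entry (path and DN abbreviated).

```python
import shlex


def parse_access_log(line: str) -> dict[str, str]:
    """Split a key=value access log line into a dict.

    shlex.split keeps quoted values (such as user.dn) together as one token.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that carry no '='
            fields[key] = value
    return fields


# Shortened sample based on the entry above (path and DN abbreviated).
sample = (
    'level=INFO ts=2025-01-21T10:12:24.973+0100 event=org.dcache.webdav.request '
    'request.method=GET response.code=302 '
    'location=http://[2001:67c:1148:200:0:0:0:48]:22605/atlasdatadisk/file?dcache-http-uuid=b25a58c7 '
    'socket.remote=[2001:67c:1148:500::73]:54854 user-agent=ARC '
    'user.dn="CN=Robot: ATLAS aCT 1,OU=Users,DC=cern,DC=ch" duration=17'
)

entry = parse_access_log(sample)
print(entry["response.code"], entry["user-agent"], entry["socket.remote"])
```

Grouping the parsed entries by `user-agent` or `socket.remote` may help to see whether the failing redirects come from a small set of clients.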

The door redirects the transfer to the pool and the mover is created, but no connection between the pool and the client is established, and the pool eventually gives up:

21 Jan 2025 10:17:24 (dc048_2) [door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg WebDAV-ATLAST1-door01 PoolDeliverFile 0000D8DAB4AE844640D181178DC471293EC0] Transfer failed: No connection from client after 300 seconds. Giving up.
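
The door and pool messages can be tied together through the session string in square brackets. The sketch below is only an illustration, assuming both logs use the layout of the two lines quoted in this report (timestamp, cell name in parentheses, then a bracketed context starting with the door session); the regular expression and the `parse` helper are hypothetical. For these two lines it shows the same session and a gap of exactly 300 seconds, which matches the pool timeout and suggests the client never followed the redirect.

```python
import re
from datetime import datetime

# Layout assumed from the two log lines quoted in this report.
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) "
    r"\((?P<cell>[^)]+)\) "
    r"\[(?P<session>door:[^\s\]]+)"
)


def parse(line):
    """Return (timestamp, cell, session) or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d %b %Y %H:%M:%S")
    return ts, m["cell"], m["session"]


door_line = (
    "21 Jan 2025 10:12:24 (WebDAV-ATLAST1-door01) "
    "[door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg] "
    "exception sending redirect"
)
pool_line = (
    "21 Jan 2025 10:17:24 (dc048_2) "
    "[door:WebDAV-ATLAST1-door01@webdav-at1-https-door01Domain:AAYsM8Z3wHg "
    "WebDAV-ATLAST1-door01 PoolDeliverFile 0000D8DAB4AE844640D181178DC471293EC0] "
    "Transfer failed: No connection from client after 300 seconds. Giving up."
)

door_ts, door_cell, door_session = parse(door_line)
pool_ts, pool_cell, pool_session = parse(pool_line)

same_session = door_session == pool_session
elapsed = (pool_ts - door_ts).total_seconds()
print(same_session, elapsed)  # True 300.0
```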

On the door, we've set:

update-crypto-policies --set DEFAULT:SHA1

To work around the problem, we've downgraded to java-11 and now everything is running correctly. We plan to upgrade to java-17 at the same time as the upgrade to dcache 10.2. This issue is only meant to provide information about the problem, in case someone else faces the same error.

Cheers, Elena

elenamplanas avatar Feb 05 '25 16:02 elenamplanas

NDGF now suffers from this in production with dCache 10.2. I don't think we have the possibility to downgrade Java.

Ref: RT ticket [www.dcache.org #10690]

nsc-jens avatar Feb 12 '25 12:02 nsc-jens

@elenamplanas Do you run this through an HAProxy or is this a direct connection?

nsc-jens avatar Feb 13 '25 13:02 nsc-jens

@elenamplanas And did you find an easy way to trigger the issue? All my test transfers work, so it's a bit hard to diagnose.

nsc-jens avatar Feb 13 '25 15:02 nsc-jens

We're using a proxy for http and https connections. We've done tests and they worked correctly. What I've not tested are third-party copies. I'll do more tests to try to reproduce the problem.

elenamplanas avatar Feb 13 '25 20:02 elenamplanas

If you can reproduce the problem yourself, check what OS you use. One big issue site here upgraded from Rocky 8 to Rocky 9 and the problem went away. But there could be a million other reasons too, I guess, since we have no idea what triggers this.

nsc-jens avatar Feb 14 '25 12:02 nsc-jens

So, our prime sources of broken transfers upgraded their operating systems. They went from Rocky 8 to Rocky 9 and this completely stopped the Java tracebacks we saw. The day before the upgrade we had 274354 tracebacks on ONE of the two active headnodes. That day and since then: 0.
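
For sites that want to track the same figure, a rough sketch of counting these tracebacks per day is shown here. It assumes the domain log uses the timestamp layout seen in the original report and that every traceback starts with a line containing "exception sending redirect"; `count_per_day` is a hypothetical helper, and the second and third sample lines are synthetic, made up only to exercise the code.

```python
import re
from collections import Counter

# Timestamp layout assumed from the log line in the original report.
DATE_RE = re.compile(r"^(\d{1,2} \w{3} \d{4}) \d{2}:\d{2}:\d{2} ")
MARKER = "exception sending redirect"


def count_per_day(lines):
    """Count log lines containing MARKER, keyed by their date string."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        m = DATE_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


sample_lines = [
    # First line is taken from the report; the rest are synthetic.
    "21 Jan 2025 10:12:24 (WebDAV-ATLAST1-door01) [door:...] exception sending redirect",
    "21 Jan 2025 10:12:25 (WebDAV-ATLAST1-door01) [door:...] exception sending redirect",
    "22 Jan 2025 08:00:00 (WebDAV-ATLAST1-door01) [door:...] some other message",
    "\tat org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:280)",
]

counts = count_per_day(sample_lines)
for day, n in counts.items():
    print(day, n)
```

Because the function accepts any iterable of lines, an open log file object can be passed to it directly.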

This is on the client using ARC. I have no information about whether they upgraded ARC too (except for the build platform version). I have yet to reproduce the problem, but I am now looking for a Rocky 8 machine. Those are very rare around here, unfortunately.

It's also possible that an older Ubuntu version caused issues like this. These were also upgraded at the same time. No version info here.

nsc-jens avatar Feb 18 '25 15:02 nsc-jens

Hi @nsc-jens @paurkedal, the client must also update its security policy with update-crypto-policies --set DEFAULT:SHA1

See: https://twiki.cern.ch/twiki/bin/view/LCG/EL9vsSHA1CAs
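
To check a host before and after changing the policy, something like the sketch below could be used. It assumes an EL9-style host where `update-crypto-policies --show` prints the active policy as a colon-separated string such as DEFAULT:SHA1; the helper names are hypothetical, and the check only looks for the SHA1 subpolicy without judging other policies such as LEGACY.

```python
import shutil
import subprocess


def allows_sha1(policy: str) -> bool:
    """Return True if the policy string carries the SHA1 subpolicy.

    A policy looks like BASE[:SUBPOLICY...], for example "DEFAULT:SHA1".
    """
    parts = policy.strip().upper().split(":")
    return "SHA1" in parts[1:]


def current_policy():
    """Return the active policy string, or None if it cannot be determined."""
    exe = shutil.which("update-crypto-policies")
    if exe is None:  # not an EL-style host, or the tool is not installed
        return None
    result = subprocess.run(
        [exe, "--show"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


policy = current_policy()
if policy is None:
    print("could not determine the crypto policy on this host")
else:
    print(policy, "SHA1 subpolicy enabled:", allows_sha1(policy))
```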

kofemann avatar Feb 25 '25 13:02 kofemann