FATE-Serving

unable to create new native thread

Open yuanbw opened this issue 2 years ago • 0 comments

I used kubefate to deploy two simulated online serving clusters:

guest
kubectl get pods -n fate-serving-10005
NAME                              READY   STATUS    RESTARTS   AGE
serving-admin-744f988bc-2mh2l     1/1     Running   0          16h
serving-proxy-59957b497d-vztml    1/1     Running   0          16h
serving-redis-7fbb959b6c-bxcqt    1/1     Running   0          16h
serving-server-65bccf659b-bqd6t   1/1     Running   0          16h
serving-zookeeper-0               1/1     Running   0          16h

host
kubectl get pods -n fate-serving-10006
NAME                              READY   STATUS    RESTARTS   AGE
serving-admin-69975d8d54-qhf7t    1/1     Running   0          16h
serving-proxy-59bbb6b4fb-49xrw    1/1     Running   0          16h
serving-redis-6894c69dfc-b4cbf    1/1     Running   0          16h
serving-server-56dc9dd5b8-qpwkr   1/1     Running   0          16h
serving-zookeeper-0               1/1     Running   0          16h

Functional tests pass, but during performance testing at 100-200 QPS, the serving-proxy on both guest and host reports "unable to create new native thread":

guest:
2023-12-06 01:39:29,930 [ERROR] c.w.a.f.s.c.b.GrpcConnectionPool(GrpcConnectionPool.java:103) - grpc channel 10.73.99.153:30106 status is TRANSIENT_FAILURE
2023-12-06 01:39:39,930 [ERROR] c.w.a.f.s.c.b.GrpcConnectionPool(GrpcConnectionPool.java:103) - grpc channel 10.73.99.153:30106 status is TRANSIENT_FAILURE
2023-12-06 01:39:49,930 [ERROR] c.w.a.f.s.c.b.GrpcConnectionPool(GrpcConnectionPool.java:103) - grpc channel 10.73.99.153:30106 status is TRANSIENT_FAILURE
2023-12-06 01:39:58,790 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[?:1.8.0_192]
    at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) ~[?:1.8.0_192]
    at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:336) ~[spring-context-5.3.20.jar:5.3.20]
    at io.grpc.internal.SerializingExecutor.schedule(SerializingExecutor.java:102) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.SerializingExecutor.execute(SerializingExecutor.java:95) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreatedInternal(ServerImpl.java:643) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreated(ServerImpl.java:465) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.onHeadersRead(NettyServerHandler.java:476) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.access$1000(NettyServerHandler.java:106) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler$FrameListener.onHeadersRead(NettyServerHandler.java:856) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:65) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$1.processFragment(DefaultHttp2FrameReader.java:450) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:457) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]

host:
2023-12-05 15:06:57,352 [INFO ] c.w.a.f.s.p.r.r.BaseServingRouter(BaseServingRouter.java:69) - caseid 1701788817352 get route info 10.42.5.237:8000
2023-12-05 15:06:57,354 [INFO ] c.w.a.f.s.p.u.FederatedModelUtils(FederatedModelUtils.java:59) - get model route key by version: 216 namespace: host#10006#guest-10005#host-10006#model tablename: 202312050912098047800, key : 202312050912098047800&host#10006#guest-10005#host-10006#model
2023-12-05 15:06:57,354 [INFO ] c.w.a.f.s.p.r.r.ZkServingRouter(ZkServingRouter.java:64) - try to find zk ,serving:ab548b8776d2bbb24dc3cfb3a901e255:unaryCall, result [grpc://10.42.5.237:8000/serving/ab548b8776d2bbb24dc3cfb3a901e255/unaryCall?router_mode=ALL_ALLOWED&timestamp=1701767649580&version=216]
2023-12-05 15:06:57,354 [INFO ] c.w.a.f.s.p.r.r.BaseServingRouter(BaseServingRouter.java:69) - caseid 1701788817353 get route info 10.42.5.237:8000
2023-12-05 18:28:33,102 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception: Unexpected HTTP/1.x request: GET /
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:108) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.readClientPrefaceString(Http2ConnectionHandler.java:302) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:239) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
2023-12-05 18:28:33,191 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception: Unexpected HTTP/1.x request: GET /
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:108) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.readClientPrefaceString(Http2ConnectionHandler.java:302) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:239) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
2023-12-05 18:30:03,152 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[?:1.8.0_192]
    at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) ~[?:1.8.0_192]
    at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:336) ~[spring-context-5.3.20.jar:5.3.20]
    at io.grpc.internal.SerializingExecutor.schedule(SerializingExecutor.java:102) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.SerializingExecutor.execute(SerializingExecutor.java:95) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreatedInternal(ServerImpl.java:643) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreated(ServerImpl.java:465) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.onHeadersRead(NettyServerHandler.java:476) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.access$1000(NettyServerHandler.java:106) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler$FrameListener.onHeadersRead(NettyServerHandler.java:856) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:65) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$1.processFragment(DefaultHttp2FrameReader.java:450) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:457) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
2023-12-05 18:30:33,130 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[?:1.8.0_192]
    at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) ~[?:1.8.0_192]
    at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:336) ~[spring-context-5.3.20.jar:5.3.20]
    at io.grpc.internal.SerializingExecutor.schedule(SerializingExecutor.java:102) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.SerializingExecutor.execute(SerializingExecutor.java:95) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreatedInternal(ServerImpl.java:643) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreated(ServerImpl.java:465) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.onHeadersRead(NettyServerHandler.java:476) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.access$1000(NettyServerHandler.java:106) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler$FrameListener.onHeadersRead(NettyServerHandler.java:856) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:65) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$1.processFragment(DefaultHttp2FrameReader.java:450) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:457) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
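
On HotSpot, `OutOfMemoryError: unable to create new native thread` is raised when the operating system refuses to start another thread, which usually points at a process/thread-count limit or at native memory rather than at the Java heap. One way to check this from inside the serving-proxy container is to compare the JVM's thread count with the limits that apply to it. The sketch below is not part of FATE-Serving; the class name `ThreadLimitCheck` and its helper methods are hypothetical, and it assumes a Linux container where `/proc` and the cgroup `pids.max` file are readable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical diagnostic helper, not FATE-Serving code.
class ThreadLimitCheck {

    /** Returns the soft "Max processes" limit from /proc/<pid>/limits, or empty if unlimited or absent. */
    static OptionalLong softMaxProcesses(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith("Max processes")) {
                String[] fields = line.substring("Max processes".length()).trim().split("\\s+");
                if (fields.length > 0 && !fields[0].equals("unlimited")) {
                    return OptionalLong.of(Long.parseLong(fields[0]));
                }
            }
        }
        return OptionalLong.empty();
    }

    /** Returns the "Threads:" value from /proc/<pid>/status, or empty if the line is absent. */
    static OptionalLong threadCount(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("Threads:")) {
                return OptionalLong.of(Long.parseLong(line.substring("Threads:".length()).trim()));
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) throws Exception {
        // Pass the PID of the proxy JVM; "self" inspects this diagnostic JVM instead.
        String pid = args.length > 0 ? args[0] : "self";
        System.out.println("threads:       " + threadCount(Files.readAllLines(Paths.get("/proc", pid, "status"))));
        System.out.println("max processes: " + softMaxProcesses(Files.readAllLines(Paths.get("/proc", pid, "limits"))));
        // cgroup v2 and cgroup v1 locations of the per-container task limit.
        for (String candidate : new String[] {"/sys/fs/cgroup/pids.max", "/sys/fs/cgroup/pids/pids.max"}) {
            Path path = Paths.get(candidate);
            if (Files.isReadable(path)) {
                System.out.println(candidate + ": " + new String(Files.readAllBytes(path)).trim());
            }
        }
    }
}
```

If the thread count is close to either limit when the error appears, the limit is the likely cause; if it is far below both, native memory available to the pod is the next thing to look at.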

The proxy route tables of the two parties are:
`[fate@yp-tgppc-ppc01 ~]$ kubectl get cm serving-proxy-config -n fate-serving-10005 -o yaml
apiVersion: v1
data:
  application.properties: |
#
# Copyright 2019 The FATE Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# coordinator same as Party ID
coordinator=10005
server.port=8059
#inference.service.name=serving
#random, consistent
#routeType=random
#route.table=/data/projects/fate-serving/serving-proxy/conf/route_table.json
#auth.file=/data/projects/fate-serving/serving-proxy/conf/auth_config.json
# zk router
#useZkRouter=true
zk.url=serving-zookeeper:2181
useZkRouter=true
# zk acl
#acl.enable=false
#acl.username=
#acl.password=
# intra-partyid port
#proxy.grpc.intra.port=8879
# inter-partyid port
#proxy.grpc.inter.port=8869

# grpc
# only support PLAINTEXT, TLS(we use Mutual TLS here), if use TSL authentication
#proxy.grpc.inter.negotiationType=PLAINTEXT
# only needs to be set when negotiationType is TLS
#proxy.grpc.inter.CA.file=/data/projects/fate-serving/serving-proxy/conf/ssl/ca.crt
# negotiated client side certificates
#proxy.grpc.inter.client.certChain.file=/data/projects/fate-serving/serving-proxy/conf/ssl/client.crt
#proxy.grpc.inter.client.privateKey.file=/data/projects/fate-serving/serving-proxy/conf/ssl/client.pem
# negotiated server side certificates
#proxy.grpc.inter.server.certChain.file=/data/projects/fate-serving/serving-proxy/conf/ssl/server.crt
#proxy.grpc.inter.server.privateKey.file=/data/projects/fate-serving/serving-proxy/conf/ssl/server.pem

#proxy.grpc.inference.timeout=3000
#proxy.grpc.inference.async.timeout=1000
#proxy.grpc.unaryCall.timeout=3000
proxy.grpc.threadpool.coresize=5000
proxy.grpc.threadpool.maxsize=10000
proxy.grpc.threadpool.queuesize=1000
#proxy.async.timeout=5000
proxy.async.coresize=1000
proxy.async.maxsize=10000
#proxy.grpc.batch.inference.timeout=10000

  route_table.json: |
{
  "route_table": {
    "default": {
      "default": [
        { "ip": "serving-proxy", "port": 8869 }
      ]
    },
    "10006": {
      "default": [
        { "ip": "10.73.99.153", "port": "30106" }
      ]
    },
    "10005": {
      "default": [
        { "ip": "serving-proxy", "port": 8059 }
      ],
      "serving": [
        { "ip": "serving-server", "port": 8000 }
      ]
    }
  },
  "permission": {
    "default_allow": true
  }
}
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: fate-serving-10005
    meta.helm.sh/release-namespace: fate-serving-10005
  creationTimestamp: "2023-12-05T09:10:02Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    cluster: fate-serving
    fateMoudle: serving-proxy
    name: fate-serving-9999
    owner: kubefate
    partyId: "10005"
  name: serving-proxy-config
  namespace: fate-serving-10005
  resourceVersion: "116781084"
  selfLink: /api/v1/namespaces/fate-serving-10005/configmaps/serving-proxy-config
  uid: f9c36c83-c661-4148-8449-9a23a86ccf47
[fate@yp-tgppc-ppc01 ~]$ kubectl get cm serving-proxy-config -n fate-serving-10006 -o yaml
apiVersion: v1
data:
  application.properties: |
#
# Copyright 2019 The FATE Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# coordinator same as Party ID
coordinator=10006
server.port=8059
#inference.service.name=serving
#random, consistent
#routeType=random
#route.table=/data/projects/fate-serving/serving-proxy/conf/route_table.json
#auth.file=/data/projects/fate-serving/serving-proxy/conf/auth_config.json
# zk router
#useZkRouter=true
zk.url=serving-zookeeper:2181
useZkRouter=true
# zk acl
#acl.enable=false
#acl.username=
#acl.password=
# intra-partyid port
#proxy.grpc.intra.port=8879
# inter-partyid port
#proxy.grpc.inter.port=8869

# grpc
# only support PLAINTEXT, TLS(we use Mutual TLS here), if use TSL authentication
#proxy.grpc.inter.negotiationType=PLAINTEXT
# only needs to be set when negotiationType is TLS
#proxy.grpc.inter.CA.file=/data/projects/fate-serving/serving-proxy/conf/ssl/ca.crt
# negotiated client side certificates
#proxy.grpc.inter.client.certChain.file=/data/projects/fate-serving/serving-proxy/conf/ssl/client.crt
#proxy.grpc.inter.client.privateKey.file=/data/projects/fate-serving/serving-proxy/conf/ssl/client.pem
# negotiated server side certificates
#proxy.grpc.inter.server.certChain.file=/data/projects/fate-serving/serving-proxy/conf/ssl/server.crt
#proxy.grpc.inter.server.privateKey.file=/data/projects/fate-serving/serving-proxy/conf/ssl/server.pem

#proxy.grpc.inference.timeout=3000
#proxy.grpc.inference.async.timeout=1000
#proxy.grpc.unaryCall.timeout=3000
proxy.grpc.threadpool.coresize=5000
proxy.grpc.threadpool.maxsize=10000
proxy.grpc.threadpool.queuesize=1000
#proxy.async.timeout=5000
proxy.async.coresize=1000
proxy.async.maxsize=10000
#proxy.grpc.batch.inference.timeout=10000

  route_table.json: |
{
  "route_table": {
    "default": {
      "default": [
        { "ip": "serving-proxy", "port": 8869 }
      ]
    },
    "10005": {
      "default": [
        { "ip": "10.73.99.153", "port": "30096" }
      ]
    },
    "10006": {
      "default": [
        { "ip": "serving-proxy", "port": 8059 }
      ],
      "serving": [
        { "ip": "serving-server", "port": 8000 }
      ]
    }
  },
  "permission": {
    "default_allow": true
  }
}
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: fate-serving-10006
    meta.helm.sh/release-namespace: fate-serving-10006
  creationTimestamp: "2023-12-05T09:02:25Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    cluster: fate-serving
    fateMoudle: serving-proxy
    name: fate-serving-9999
    owner: kubefate
    partyId: "10006"
  name: serving-proxy-config
  namespace: fate-serving-10006
  resourceVersion: "116776658"
  selfLink: /api/v1/namespaces/fate-serving-10006/configmaps/serving-proxy-config
  uid: 6f9c1943-3fb0-4c10-9ec2-f629b3c82186`
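
Both ConfigMaps set `proxy.grpc.threadpool.coresize=5000` and `proxy.async.coresize=1000`, and the stack traces show the failure inside `ThreadPoolExecutor.addWorker`. A `java.util.concurrent.ThreadPoolExecutor` starts a new thread for every submitted task until the core size is reached, even when earlier workers are idle, so a large core size can turn into thousands of live threads at modest QPS. The sketch below demonstrates that behaviour in isolation. It is not FATE-Serving code: the class name `CorePoolGrowthDemo` is hypothetical, and it only assumes (without having verified it in the proxy source) that the configured values are passed through as the pool's core and maximum sizes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not FATE-Serving code.
class CorePoolGrowthDemo {

    /**
     * Runs `tasks` trivial tasks strictly one after another and returns
     * how many worker threads the pool holds afterwards.
     */
    static int poolSizeAfter(int corePoolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, corePoolSize * 2, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1000));
        try {
            for (int i = 0; i < tasks; i++) {
                CountDownLatch done = new CountDownLatch(1);
                pool.execute(done::countDown);
                done.await(); // the previous task has finished before the next submit
            }
            return pool.getPoolSize();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Never more than one task in flight, yet the pool grows to its core size.
        System.out.println(poolSizeAfter(8, 100)); // prints 8
        System.out.println(poolSizeAfter(8, 3));   // prints 3
    }
}
```

With a core size of 5000, the same rule would let the gRPC pool grow towards 5000 threads after the first 5000 requests, regardless of how quickly each request completes. Threads beyond the core size are only added once the queue is full, so lowering the core size is the setting that bounds the steady-state thread count.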

Notes:
1. The route tables above are the defaults generated when kubefate installs FATE Serving 2.1.6.
2. The K8S node has 48 cores / 64G / 597G.

yuanbw, Dec 06 '23 02:12