HDFS-17357. NioInetPeer.close() should close socket connection.
Description of PR
JIRA: HDFS-17357
NioInetPeer.close() currently does not close the socket connection.
I found more than 30,000 leaked connections on a datanode, along with many warning messages like the one below.
2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: hostname:50010:DataXceiverServer
java.io.IOException: Xceiver count 8198 exceeds the limit of concurrent xcievers: 8192
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:234)
at java.lang.Thread.run(Thread.java:748)
When any exception occurs in DataXceiverServer, it executes closeStream.
IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
But NioInetPeer.close() does not close the socket connection, which leads to connection leakage.
The close() methods of the other Peer subclasses are implemented with socket.close(). See EncryptedPeer, DomainPeer, and BasicInetPeer.
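A minimal sketch of the intended behaviour, not the actual patch: `SketchPeer` is a hypothetical stand-in for a Peer implementation that holds a socket and its wrapped streams. Its close() closes the socket explicitly instead of relying on the streams to do so. The demo uses an unconnected `java.net.Socket` and in-memory streams, which is an assumption made only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

// Hypothetical stand-in for a Peer implementation; not Hadoop's NioInetPeer.
class SketchPeer implements Closeable {
  private final Socket socket;
  private final InputStream in;
  private final OutputStream out;

  SketchPeer(Socket socket, InputStream in, OutputStream out) {
    this.socket = socket;
    this.in = in;
    this.out = out;
  }

  @Override
  public void close() throws IOException {
    try {
      try {
        in.close();
      } finally {
        out.close();
      }
    } finally {
      // Close the socket explicitly, whatever happened to the streams.
      socket.close();
    }
  }
}

class SketchPeerDemo {
  public static void main(String[] args) throws IOException {
    // An unconnected socket is enough to observe the effect of close().
    Socket socket = new Socket();
    SketchPeer peer = new SketchPeer(socket,
        new ByteArrayInputStream(new byte[0]), new ByteArrayOutputStream());
    peer.close();
    System.out.println("socket closed: " + socket.isClosed());
  }
}
```

With this shape, the socket's fate no longer depends on what the stream wrappers do when they are closed.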
This situation can be reproduced as follows:
1. A client writes data to HDFS.
2. The datanode's Xceiver count reaches DFS_DATANODE_MAX_RECEIVER_THREADS_KEY, so the new Xceiver fails with an IOException and the socket is not released.
3. The client crashes, so no new data is added, or client.close is executed.
4. A socket connection is leaked between datanodes.
The leaked connection looks like this:
dn1 dn1:57042 dn2:50010 ESTABLISHED
dn2 dn2:50010 dn1:57042 ESTABLISHED
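The second reproduction step can be pictured with the sketch below. It is an illustration only: `XceiverLimitSketch`, `RecordingPeer`, `checkLimit`, and `admit` are hypothetical names, and the counts 8198 and 8192 are taken from the warning quoted above. It shows that once the limit check throws, releasing the connection depends entirely on what the peer's close() does.

```java
import java.io.Closeable;
import java.io.IOException;

class XceiverLimitSketch {
  // Hypothetical peer that only records whether close() was called.
  static final class RecordingPeer implements Closeable {
    boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  // Reject a new connection when the current count exceeds the limit.
  static void checkLimit(int currentCount, int maxCount) throws IOException {
    if (currentCount > maxCount) {
      throw new IOException("Xceiver count " + currentCount
          + " exceeds the limit of concurrent xcievers: " + maxCount);
    }
  }

  // Returns true if the peer is admitted. On rejection the peer is closed,
  // so the connection is released only if close() really closes the socket.
  static boolean admit(RecordingPeer peer, int currentCount, int maxCount) {
    try {
      checkLimit(currentCount, maxCount);
      return true;
    } catch (IOException e) {
      peer.close();
      return false;
    }
  }

  public static void main(String[] args) {
    RecordingPeer peer = new RecordingPeer();
    boolean admitted = admit(peer, 8198, 8192);
    System.out.println("admitted=" + admitted + ", peer closed=" + peer.closed);
  }
}
```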
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 0m 20s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| -1 :x: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | |
| _ trunk Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 31m 13s | trunk passed | |
| +1 :green_heart: | compile | 0m 32s | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | compile | 0m 28s | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | |
| +1 :green_heart: | checkstyle | 0m 21s | trunk passed | |
| +1 :green_heart: | mvnsite | 0m 33s | trunk passed | |
| +1 :green_heart: | javadoc | 0m 31s | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javadoc | 0m 28s | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | |
| +1 :green_heart: | spotbugs | 1m 25s | trunk passed | |
| +1 :green_heart: | shadedclient | 20m 8s | branch has no errors when building and testing our client artifacts. | |
| _ Patch Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 0m 29s | the patch passed | |
| +1 :green_heart: | compile | 0m 28s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javac | 0m 28s | the patch passed | |
| +1 :green_heart: | compile | 0m 26s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | |
| +1 :green_heart: | javac | 0m 26s | the patch passed | |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| +1 :green_heart: | checkstyle | 0m 12s | the patch passed | |
| +1 :green_heart: | mvnsite | 0m 27s | the patch passed | |
| +1 :green_heart: | javadoc | 0m 21s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javadoc | 0m 22s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | |
| +1 :green_heart: | spotbugs | 1m 24s | the patch passed | |
| +1 :green_heart: | shadedclient | 19m 55s | patch has no errors when building and testing our client artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | unit | 1m 50s | hadoop-hdfs-client in the patch passed. | |
| +1 :green_heart: | asflicense | 0m 24s | The patch does not generate ASF License warnings. | |
| 84m 12s |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6502/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6502 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c26e1bbd7a7a 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / a1c48f705b7f6726982b19caf2737a38ed936c68 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6502/1/testReport/ |
| Max. process+thread count | 551 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6502/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
@slfan1989 Hi, sir. Do you have time to review this? Thanks!
@LiuGuH Thanks for your report. Would you mind adding a unit test to cover this case?
@LiuGuH Hi, sir. I have one question here: could you please explain in more detail how the connection leaks? I see we invoke setKeepAlive(true) in DataStreamer#createSocketForPipeline and DataXceiver#writeBlock. Thanks a lot.
Thanks. I have also updated the description: this situation is not limited to EC.
> @LiuGuH Thanks for your report. Would you mind adding a unit test to cover this case?
OK, this will take some time. The test case will only cover releasing the peer. @Hexiaoqiao
> @LiuGuH Hi, sir. I have one question here: could you please explain in more detail how the connection leaks? I see we invoke setKeepAlive(true) in DataStreamer#createSocketForPipeline and DataXceiver#writeBlock. Thanks a lot.
Client (crashed) -> DN1 -> DN2 (Xceiver count is full, so it throws IOException, and the server-side Peer stays alive) -> DN3
In this case, the connection between DN1 <-> DN2 will never be released. Thanks.
Thanks for the review, @zhangshuyan0.
This situation may happen on datanodes under heavy I/O load, but I cannot reproduce it in a unit test. It is not related to DFS_DATANODE_MAX_RECEIVER_THREADS_KEY.
On datanodes under heavy I/O load, in.close() and out.close() may also throw an IOException when close() is invoked, and the socket may not really be closed.
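The failure mode described here can be sketched as follows, assuming a stream's close() throws. `FailingStream`, `closeAll`, and `closeQuietly` are hypothetical names; `closeQuietly` only mimics the swallow-the-exception behaviour of a helper such as IOUtils.closeStream and is not Hadoop code. The nested try/finally still closes the socket even though both stream closes fail.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.Socket;

class CloseUnderFailureSketch {
  // Hypothetical stream whose close() always fails.
  static final class FailingStream implements Closeable {
    @Override
    public void close() throws IOException {
      throw new IOException("simulated close failure");
    }
  }

  // Close both streams, then the socket. Each finally block runs even if
  // an earlier close() throws, so socket.close() is always reached.
  static void closeAll(Closeable in, Closeable out, Socket socket) throws IOException {
    try {
      try {
        in.close();
      } finally {
        out.close();
      }
    } finally {
      socket.close();
    }
  }

  // Mimics a quiet-close helper: the exception never reaches the caller.
  static void closeQuietly(Closeable in, Closeable out, Socket socket) {
    try {
      closeAll(in, out, socket);
    } catch (IOException ignored) {
      // Swallowed on purpose, as a quiet-close helper would do.
    }
  }

  public static void main(String[] args) {
    Socket socket = new Socket(); // unconnected socket, used only to observe close()
    closeQuietly(new FailingStream(), new FailingStream(), socket);
    System.out.println("socket closed despite failures: " + socket.isClosed());
  }
}
```

Without the outer finally, a failing in.close() or out.close() would leave the socket open, and the caller would not know because the exception is swallowed.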
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 00s | No case conflicting files found. | |
| +0 :ok: | spotbugs | 0m 00s | spotbugs executables are not available. | |
| +0 :ok: | codespell | 0m 01s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 01s | detect-secrets was not available. | |
| +1 :green_heart: | @author | 0m 00s | The patch does not contain any @author tags. | |
| -1 :x: | test4tests | 0m 00s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | |
| _ trunk Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 85m 14s | trunk passed | |
| +1 :green_heart: | compile | 5m 07s | trunk passed | |
| +1 :green_heart: | checkstyle | 4m 21s | trunk passed | |
| +1 :green_heart: | mvnsite | 5m 11s | trunk passed | |
| +1 :green_heart: | javadoc | 4m 36s | trunk passed | |
| +1 :green_heart: | shadedclient | 139m 27s | branch has no errors when building and testing our client artifacts. | |
| _ Patch Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 3m 01s | the patch passed | |
| +1 :green_heart: | compile | 2m 35s | the patch passed | |
| +1 :green_heart: | javac | 2m 35s | the patch passed | |
| +1 :green_heart: | blanks | 0m 00s | The patch has no blanks issues. | |
| +1 :green_heart: | checkstyle | 2m 05s | the patch passed | |
| +1 :green_heart: | mvnsite | 2m 47s | the patch passed | |
| +1 :green_heart: | javadoc | 2m 15s | the patch passed | |
| +1 :green_heart: | shadedclient | 147m 52s | patch has no errors when building and testing our client artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | asflicense | 5m 07s | The patch does not generate ASF License warnings. | |
| 396m 17s |
| Subsystem | Report/Notes |
|---|---|
| GITHUB PR | https://github.com/apache/hadoop/pull/6502 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | MINGW64_NT-10.0-17763 0a31c1737811 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys |
| Build tool | maven |
| Personality | /c/hadoop/dev-support/bin/hadoop.sh |
| git revision | trunk / a1c48f705b7f6726982b19caf2737a38ed936c68 |
| Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6502/1/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6502/1/console |
| versions | git=2.44.0.windows.1 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.