HDDS-10108. [hsync] Adopt RATIS-1994 to reduce hsync ops latency
What changes were proposed in this pull request?
Adopt the new Ratis API added in RATIS-1994 to (hopefully) reduce hsync latency.
Basically the idea is that ratis client send() call alone would suffice. And we could ditch watchForCommit in hsync (writeChunk + putBlock) path.
(!) Uses Ratis 3.1.0 snapshot build. Can be merged to feature branch.
Note:
- In order for
failedServerslist to be properly appended when a datanode lags behind in Ratis serverwatch()(in which case server throwsNotReplicatedException), as checked byTestFailureHandlingByClient.testDatanodeExclusionWithMajorityCommit, significant adjustments has been made aroundsendRequest()andsendCommand(). - Datanodes
raft.server.watch.timeoutconfig has been lowered from 180s to 10s (Ratis default), as it has to be lower thanraft.client.rpc.request.timeout(currently 60s in Ozone) forNotReplicatedExceptionto be propagated
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10108
How was this patch tested?
- Existing tests must all pass. Manual latency test pending.
@smengcl , thanks for working on this!
TestBlockOutputStreamCorrectness can pass if we override watchForCommit(..) in CommitWatcher; see https://issues.apache.org/jira/secure/attachment/13066441/6014_idea.patch (Need to update TestCommitWatcher.testReleaseBuffersOnException)
@smengcl , thanks for working on this!
TestBlockOutputStreamCorrectnesscan pass if we overridewatchForCommit(..)inCommitWatcher; see issues.apache.org/jira/secure/attachment/13066441/6014_idea.patch (Need to updateTestCommitWatcher.testReleaseBuffersOnException)
Thanks @szetszwo for the help. It works!
I noticed some refactoring (HDDS-10173) done in TestBlockOutputStreamCorrectness in master but not in the feature branch so I didn't change it at the moment to reduce merge conflict (test would still pass, the key is CommitWatcher).
3 test failures remaining at this point, not including flaky ones:
-
TestRandomKeyGenerator.testKeyLargerThan2GB-- somehow key write always fails with this large 2.01 GB key in this test. It can be reproduced locally. If I change the key size to 0.01 GB it passes. I suspect this has something to do with client write buffer management as well. But haven't quite figured out why. -
TestFailureHandlingByClient.testDatanodeExclusionWithMajorityCommit-- it fails because the test is expecting the datanode exclusion list to be populated with the datanode the test just shutdown. But it is not doing so because I modifiedwatchForCommitto NOT actually do the Ratis request (remember? one major goal of this PR is to eliminate the second roundtrip). Now I'm trying to put the exclude datanode logic somewhere else while maintaining the overall client behavior. -
TestOzoneClientRetriesOnExceptions.testMaxRetriesByOzoneClient--AssertionFailedError: Expected java.io.IOException to be thrown, but nothing was thrownlooks to be the same type of issue asTestCommitWatcher#testReleaseBuffersOnExceptionthat I fixed already in 5c68ead09ca3c6235c998c338e9a64db957479fc. Maybe not.
For 2: XceiverClientRatis#watchForCommit had this logic where it initially attempts ALL_COMMITTED, but when it failed to do a 3-way commit, it falls back to MAJORITY_COMMITTED, which is where it adds the failed datanode to the list.
XceiverClientRatis#sendCommandAsync might have to do exactly the same to keep the same behavior now that watchForCommit Ratis request is omitted. (Or is it worth keeping such behavior?)
TestRandomKeyGenerator.testKeyLargerThan2GB -- somehow key write always fails with this large 2.01 GB key ...
2GB seems like a 32-bit signed int problem. We should try 1.01 GB.