HDDS-11043. Explore client retry optimizations after write() and hsync() are desynced
## What changes were proposed in this pull request?
- Introduced `RetryRequestBatcher`, a sliding-window planner that keeps failed writeChunk requests sorted by end offset, retains only the most recent putBlock offset, and produces an optimized retry plan (combined chunk list + putBlock flag); see the sketch after this list.
- Wired the batcher into `BlockOutputStream`: every outgoing writeChunk/putBlock updates the window, `writeOnRetry` now replays the optimized plan (piggybacking the final chunk when supported), and acknowledgements/clears shrink the window once putBlock succeeds.
- Added `TestRetryRequestBatcher` to exercise the batching logic across basic, duplicate-putBlock, acknowledgement, complex, and bookkeeping scenarios.
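For concreteness, here is a minimal Java sketch of the sliding-window planner described above. The `RetryRequestBatcher`, `optimizeForRetry`, and `acknowledgeUpTo` names come from this PR; everything else (the `ByteBuffer` payload type, `trackWriteChunk`/`trackPutBlock`, the `RetryPlan` holder) is illustrative, not the actual Ozone implementation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class RetryRequestBatcher {
  // Outstanding writeChunk payloads, kept sorted by their end offset in the block.
  private final TreeMap<Long, ByteBuffer> pendingChunks = new TreeMap<>();
  // Only the most recent putBlock offset matters: a later putBlock subsumes
  // every earlier one.
  private long lastPutBlockOffset = -1;

  /** Record an outgoing writeChunk so it can be replayed if the pipeline fails. */
  void trackWriteChunk(long endOffset, ByteBuffer chunk) {
    pendingChunks.put(endOffset, chunk);
  }

  /** Record an outgoing putBlock; any earlier putBlock offset is discarded. */
  void trackPutBlock(long offset) {
    lastPutBlockOffset = Math.max(lastPutBlockOffset, offset);
  }

  /** Drop everything the datanodes have committed up to flushPos (inclusive). */
  void acknowledgeUpTo(long flushPos) {
    pendingChunks.headMap(flushPos, true).clear();
    if (lastPutBlockOffset <= flushPos) {
      lastPutBlockOffset = -1;
    }
  }

  /** Collapse the window into one ordered chunk list plus a single putBlock flag. */
  RetryPlan optimizeForRetry() {
    return new RetryPlan(new ArrayList<>(pendingChunks.values()), lastPutBlockOffset >= 0);
  }

  static final class RetryPlan {
    final List<ByteBuffer> chunks;
    final boolean needsPutBlock;

    RetryPlan(List<ByteBuffer> chunks, boolean needsPutBlock) {
      this.chunks = chunks;
      this.needsPutBlock = needsPutBlock;
    }
  }
}
```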
Benefits:
- Shared setup: every writeChunk/putBlock RPC now flows through `RetryRequestBatcher`. On the happy path we track each write's end offset and the latest putBlock offset. If an RPC fails, the window already knows exactly which buffers still need to be retried and in what order; when a putBlock succeeds, `acknowledgeUpTo(flushPos)` removes all requests the datanodes have committed.
- Retry without piggyback:
  - Old sequence: `writeOnRetry` blindly replayed each allocated chunk, issuing a `writeChunk` immediately followed by a standalone `putBlock`. That meant n failed chunks produced 2n retry RPCs, even if multiple writes could be coalesced before the next metadata update.
  - New sequence: we call `retryRequestBatcher.optimizeForRetry()`, which collapses all outstanding chunks into a single ordered list and keeps just the highest putBlock offset. The retry loop now issues each chunk exactly once and sends a single `putBlock` at the end. Result: fewer network round trips, less checksum/compression work, and shorter retry latency.
- Retry with piggyback enabled:
  - Before: we still replayed every chunk one by one, and each chunk triggered a piggybacked `writeChunkAndPutBlock`, so we ended up sending a putBlock for every chunk in the window.
  - After: we write the combined chunk list sequentially; when we reach the last outstanding chunk, we piggyback the final putBlock on that single RPC (`writeChunkAndPutBlock`). All preceding chunks are sent as plain `writeChunk` calls. Effectively we collapse the retries to "N chunk writes + 1 piggybacked flush" instead of "N piggybacked writes", reducing both network chatter and datanode commit work while preserving the benefit of piggybacking (no extra standalone putBlock). See the retry-loop sketch after this list.
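Both retry paths reduce to one replay loop over the optimized plan. Below is a hedged sketch of that loop under the same assumptions as the batcher sketch above; `writeChunk`, `writeChunkAndPutBlock`, and `putBlock` stand in for the real `BlockOutputStream` RPC helpers and are assumed names, not the actual signatures.

```java
import java.nio.ByteBuffer;
import java.util.List;

abstract class RetryReplaySketch {
  // Stand-ins for the real BlockOutputStream RPC helpers (assumed names).
  abstract void writeChunk(ByteBuffer chunk);
  abstract void writeChunkAndPutBlock(ByteBuffer chunk);
  abstract void putBlock();

  /** Replay the optimized plan: each chunk exactly once, at most one putBlock. */
  void writeOnRetry(RetryRequestBatcher.RetryPlan plan, boolean piggybackEnabled) {
    List<ByteBuffer> chunks = plan.chunks;
    for (int i = 0; i < chunks.size(); i++) {
      boolean isLast = (i == chunks.size() - 1);
      if (isLast && plan.needsPutBlock && piggybackEnabled) {
        // Fold the single trailing putBlock into the final chunk write:
        // N plain writeChunk calls + 1 piggybacked flush, instead of the old
        // behaviour of N piggybacked writes.
        writeChunkAndPutBlock(chunks.get(i));
        return;
      }
      writeChunk(chunks.get(i));
    }
    if (plan.needsPutBlock) {
      // No piggyback support: one standalone putBlock closes out the retry,
      // i.e. n + 1 RPCs for n outstanding chunks instead of the old 2n.
      putBlock();
    }
  }
}
```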
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11043
## How was this patch tested?
`TestRetryRequestBatcher` unit tests (one representative scenario is sketched below).
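To illustrate what the new unit tests exercise, here is a hedged JUnit 5 sketch of an acknowledgement scenario against the hypothetical batcher API sketched earlier; it is not the actual test code from this PR.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.nio.ByteBuffer;
import org.junit.jupiter.api.Test;

class TestRetryRequestBatcherSketch {

  @Test
  void acknowledgedChunksLeaveTheWindow() {
    RetryRequestBatcher batcher = new RetryRequestBatcher();
    batcher.trackWriteChunk(4, ByteBuffer.allocate(4));   // bytes [0, 4)
    batcher.trackWriteChunk(8, ByteBuffer.allocate(4));   // bytes [4, 8)
    batcher.trackPutBlock(4);
    batcher.trackPutBlock(8);                             // subsumes offset 4

    // Datanodes committed everything up to offset 4.
    batcher.acknowledgeUpTo(4);

    RetryRequestBatcher.RetryPlan plan = batcher.optimizeForRetry();
    assertEquals(1, plan.chunks.size());                  // only [4, 8) remains
    assertTrue(plan.needsPutBlock);                       // offset-8 putBlock pending

    batcher.acknowledgeUpTo(8);
    plan = batcher.optimizeForRetry();
    assertTrue(plan.chunks.isEmpty());
    assertFalse(plan.needsPutBlock);
  }
}
```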
Hi @jojochuang and @smengcl — just checking in. If you have a moment, I’d love any initial thoughts on this. Thanks!
@jojochuang @smengcl please take a look