ZOOKEEPER-4327: Fix flaky RequestThrottlerTest
This PR tries to fix several test failures in RequestThrottlerTest.
First, RequestThrottlerTest#testDropStaleRequests.
Place Thread.sleep(200) after submittedRequests.take() in RequestThrottler#run will fail two assertions:
-
assertEquals(2L, (long) metrics.get("prep_processor_request_queued")) -
assertEquals(1L, (long) metrics.get("request_throttle_wait_count"))
This happens due to setStale chould happen before throttle handling.
This commit solves this by introducing an interception point RequestThrottler.throttleSleep to build happen-before relations:
-
throttling.countDownhappens beforesetStale, this ensures that unthrottled request are processed as usual. -
setStalehappens beforethrottled.await, this defendsRequestThrottler.throttleSleepagainst spurious wakeup.
Second, RequestThrottlerTest#testRequestThrottler.
-
RequestThrottlerTest.testRequestThrottler:197 expected: <2> but was: <1>ZooKeeperServer#submitRequestandPrepRequestProcessor#processRequestrun in different threads, thus there is no guarantee on metricprep_processor_request_queuedaftersubmitted.await(5, TimeUnit.SECONDS). PlaceThread.sleep(200)beforezks.submitRequestNow(request)inRequestThrottler#runwill incur this failure. -
RequestThrottlerTest.testRequestThrottler:206 expected: <5> but was: <4>entered.await(STALL_TIME, TimeUnit.MILLISECONDS)could returnfalsedue to almost same timeout asRequestThrottler#throttleSleep. PlaceThread.sleep(500)aroundthrottleSleepwill increase failure possibility.
Third, RequestThrottlerTest#testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled.
-
RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340 expected: <3> but was: <4>ZooKeeperServer#shouldThrottledepends on consistent sum ofgetInflightandgetInProcess. But it is no true. PlaceThread.sleep(200)beforezks.submitRequestNow(request)inRequestThrottler#runcould reproduce this.
Sees also https://github.com/apache/zookeeper/pull/1739, https://github.com/apache/zookeeper/pull/1821.
This fixes failed ci after #1821 merged(https://github.com/apache/zookeeper/commit/c05ca0ee0deedd50d0ef32e3e01ad7f691d27009) https://github.com/apache/zookeeper/runs/6568872986?check_suite_focus=true
P.S. Place Thread.sleep(200) after submittedRequests.take() in
RequestThrottler#run will fail these assertions also before #1821, but for different reason due to changes in #1821.
@maoling @eolivelli @symat PTAL
I overlooked test failures unrelated to testLargeRequestThrottling in #1821. I try to fix all remaining test failures in RequestThrottlerTest in this pr. All of these failures could be reproduced by placing Thread.sleep(200) at some places.
@eolivelli @maoling @symat Could you please take a look ?
- @kezhuw Great work. Sorry for our late.Thanks for your contribution.
- Commit to the following branches: master branch-3.8 branch-3.7 , except branch-3.6
@maoling @symat Thank you for merging and reviewing.