zookeeper icon indicating copy to clipboard operation
zookeeper copied to clipboard

ZOOKEEPER-4327: Fix flaky RequestThrottlerTest

Open kezhuw opened this issue 3 years ago • 3 comments

This PR tries to fix several test failures in RequestThrottlerTest.

First, RequestThrottlerTest#testDropStaleRequests.

Place Thread.sleep(200) after submittedRequests.take() in RequestThrottler#run will fail two assertions:

  1. assertEquals(2L, (long) metrics.get("prep_processor_request_queued"))
  2. assertEquals(1L, (long) metrics.get("request_throttle_wait_count"))

This happens due to setStale chould happen before throttle handling.

This commit solves this by introducing an interception point RequestThrottler.throttleSleep to build happen-before relations:

  1. throttling.countDown happens before setStale, this ensures that unthrottled request are processed as usual.
  2. setStale happens before throttled.await, this defends RequestThrottler.throttleSleep against spurious wakeup.

Second, RequestThrottlerTest#testRequestThrottler.

  • RequestThrottlerTest.testRequestThrottler:197 expected: <2> but was: <1>

    ZooKeeperServer#submitRequest and PrepRequestProcessor#processRequest run in different threads, thus there is no guarantee on metric prep_processor_request_queued after submitted.await(5, TimeUnit.SECONDS). Place Thread.sleep(200) before zks.submitRequestNow(request) in RequestThrottler#run will incur this failure.

  • RequestThrottlerTest.testRequestThrottler:206 expected: <5> but was: <4>

    entered.await(STALL_TIME, TimeUnit.MILLISECONDS) could return false due to almost same timeout as RequestThrottler#throttleSleep. Place Thread.sleep(500) around throttleSleep will increase failure possibility.

Third, RequestThrottlerTest#testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled.

  • RequestThrottlerTest.testGlobalOutstandingRequestThrottlingWithRequestThrottlerDisabled:340 expected: <3> but was: <4>

    ZooKeeperServer#shouldThrottle depends on consistent sum of getInflight and getInProcess. But it is no true. Place Thread.sleep(200) before zks.submitRequestNow(request) in RequestThrottler#run could reproduce this.

Sees also https://github.com/apache/zookeeper/pull/1739, https://github.com/apache/zookeeper/pull/1821.

kezhuw avatar May 24 '22 13:05 kezhuw

This fixes failed ci after #1821 merged(https://github.com/apache/zookeeper/commit/c05ca0ee0deedd50d0ef32e3e01ad7f691d27009) https://github.com/apache/zookeeper/runs/6568872986?check_suite_focus=true

P.S. Place Thread.sleep(200) after submittedRequests.take() in RequestThrottler#run will fail these assertions also before #1821, but for different reason due to changes in #1821.

kezhuw avatar May 24 '22 14:05 kezhuw

@maoling @eolivelli @symat PTAL

kezhuw avatar May 24 '22 14:05 kezhuw

I overlooked test failures unrelated to testLargeRequestThrottling in #1821. I try to fix all remaining test failures in RequestThrottlerTest in this pr. All of these failures could be reproduced by placing Thread.sleep(200) at some places.

@eolivelli @maoling @symat Could you please take a look ?

kezhuw avatar Jun 28 '22 12:06 kezhuw

maoling avatar Sep 25 '22 08:09 maoling

@maoling @symat Thank you for merging and reviewing.

kezhuw avatar Oct 10 '22 03:10 kezhuw