TestRegionLabelDenyScheduler is flaky
Flaky Test
Which jobs are failing
TestRegionLabelDenyScheduler
CI link
https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/130/pipeline
Reason for failure (if possible)
The grant_leader scheduler did not schedule every region other than the one denied region.
The scheduled region IDs from the two runs are compared below.
grant_leader did not schedule regions 26 and 94:
comm -3 <(sort evict_leader.log) <(sort grant_leader.log)
26
94
Anything else
https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/144/pipeline/ test2.log test.log
comm -3 <(grep "op finish duration less than 10s" test.log | grep -oP '\[region-id=\K\d+' | sort) <(grep "op finish duration less than 10s" test2.log | grep -oP '\[region-id=\K\d+' | sort)
114
28
38
/assign
Hit this again: https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/250/pipeline
=== RUN TestRegionLabelDenyScheduler
[2024/07/05 16:06:36.782 +08:00] [INFO] [pd_service_discovery.go:1018] ["[pd] switch leader"] [new-leader=http://127.0.0.1:2382] [old-leader=http://127.0.0.1:2384]
testutil.go:56:
Error Trace: /home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/testutil/testutil.go:56
/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:105
Error: Condition never satisfied
Test: TestRegionLabelDenyScheduler
This failure was caused by a failure in the preceding test, which I have recorded in another issue: https://github.com/tikv/pd/issues/8348#issuecomment-2219696341. So we can still close this issue, and we will discuss the instability of TestTransferLeader in another issue.
https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/310/pipeline/
It seems that the 'stream not found' error affected the grant-leader process, causing a timeout.
The grant-leader operations were still in progress when the timeout was reached.
fixed by https://github.com/tikv/pd/pull/8394/commits/5941965e3ffcf694f395671217284e2f2a17730a
Hit this again: https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/467/
--- PASS: TestReloadLabel (63.86s)
=== RUN TestTransferLeader
--- PASS: TestTransferLeader (3.07s)
=== RUN TestRegionLabelDenyScheduler
testutil.go:56:
Error Trace: /home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/testutil/testutil.go:56
/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:178
Error: Condition never satisfied
Test: TestRegionLabelDenyScheduler
Haven't seen this issue in a while, so closing it.