Failure in `RackAwarePlacementTest.test_replica_placement`
RackAwarePlacementTest.test_replica_placement.rack_layout_str=ABCDEF.num_partitions=400.replication_factor=5.num_topics=2 (1/19 runs) Build: https://buildkite.com/redpanda/redpanda/builds/10387#26508f1e-8f34-4296-b5db-32a2896321b2
Error:
rptest.tests.rack_aware_replica_placement_test.RackAwarePlacementTest.test_replica_placement.rack_layout_str=ABCDEF.num_partitions=400.replication_factor=5.num_topics=2
--
| status: FAIL
| run time: 1 minute 22.757 seconds
|
|
| TimeoutError('Cluster membership did not stabilize')
| Traceback (most recent call last):
| File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
| data = self.run_test()
| File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
| return self.test_context.function(self.test)
| File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
| return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
| File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
| r = f(self, *args, **kwargs)
| File "/root/tests/rptest/tests/rack_aware_replica_placement_test.py", line 120, in test_replica_placement
| self.redpanda.start()
| File "/root/tests/rptest/services/redpanda.py", line 618, in start
| wait_until(lambda: {n
| File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 58, in wait_until
| raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
| ducktape.errors.TimeoutError: Cluster membership did not stabilize
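The timeout comes from ducktape's `wait_until` helper, which polls a predicate (here, cluster membership seen by each node) until it holds or a deadline passes. A minimal sketch of that polling pattern (an illustrative reimplementation, not ducktape's actual code):

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=1, err_msg=""):
    """Poll `condition` until it returns truthy or `timeout_sec` elapses.

    Exceptions raised by `condition` are swallowed and retried; the last
    one is chained onto the TimeoutError if the deadline is reached.
    """
    deadline = time.time() + timeout_sec
    last_exception = None
    while time.time() < deadline:
        try:
            if condition():
                return
        except Exception as e:
            last_exception = e
        time.sleep(backoff_sec)
    # Deadline reached without the condition ever holding.
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
```

So the failure means the membership predicate in `redpanda.py:start()` never held within the timeout, i.e. at least one node never reported the full broker set.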
+1 RackAwarePlacementTest.test_replica_placement.rack_layout_str=ooooFF.num_partitions=400.replication_factor=5.num_topics=2 https://buildkite.com/redpanda/redpanda/builds/10563#0180fc0e-840d-4667-8c61-1d0dc93c45a5
Also seen with a slightly different error: https://ci-artifacts.dev.vectorized.cloud/redpanda/01824a74-e4e4-4dab-8fef-9a54328b65d5/vbuild/ducktape/results/2022-07-29--001/report.html
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
data = self.run_test()
File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
return self.test_context.function(self.test)
File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
r = f(self, *args, **kwargs)
File "/root/tests/rptest/tests/rack_aware_replica_placement_test.py", line 138, in test_replica_placement
self._validate_placement(topic, rack_layout, replication_factor)
File "/root/tests/rptest/tests/rack_aware_replica_placement_test.py", line 64, in _validate_placement
m = self.client().describe_topic(topic.name)
File "/root/tests/rptest/clients/default.py", line 101, in describe_topic
td = self.describe_topics([topic])
File "/root/tests/rptest/clients/default.py", line 94, in describe_topics
client = KafkaAdminClient(
File "/usr/local/lib/python3.10/dist-packages/kafka/admin/client.py", line 218, in __init__
self._refresh_controller_id()
File "/usr/local/lib/python3.10/dist-packages/kafka/admin/client.py", line 278, in _refresh_controller_id
controller_version = self._client.check_version(controller_id, timeout=(self.config['api_version_auto_timeout_ms'] / 1000))
File "/usr/local/lib/python3.10/dist-packages/kafka/client_async.py", line 901, in check_version
self._maybe_connect(try_node)
File "/usr/local/lib/python3.10/dist-packages/kafka/client_async.py", line 372, in _maybe_connect
assert broker, 'Broker id %s not in current metadata' % (node_id,)
AssertionError: Broker id 2 not in current metadata
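This second failure mode is the same underlying symptom: kafka-python's admin client asked its cached metadata for the controller (broker 2) before that broker had registered, so the lookup asserted. If the condition is transient, a retry wrapper around the metadata-dependent call would mask it; a hedged sketch (`with_retries` is a hypothetical helper, not part of rptest):

```python
import time


def with_retries(fn, attempts=5, backoff_sec=0.5, retry_on=(AssertionError,)):
    """Call fn(), retrying on the given transient exception types.

    Re-raises the last exception once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Back off briefly to give cluster metadata time to converge.
            time.sleep(backoff_sec)
```

Under this assumption the test could wrap `self.client().describe_topic(topic.name)` in `with_retries` instead of failing on the first stale-metadata response.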
I see no failures like this in the last 30 days on dev.
I think we can switch this test back on: https://github.com/redpanda-data/redpanda/pull/7089