Failure in `RackAwarePlacementTest.test_replica_placement`

Open ZeDRoman opened this issue 3 years ago • 2 comments

RackAwarePlacementTest.test_replica_placement.rack_layout_str=ABCDEF.num_partitions=400.replication_factor=5.num_topics=2 (1/19 runs) Build: https://buildkite.com/redpanda/redpanda/builds/10387#26508f1e-8f34-4296-b5db-32a2896321b2

Error:

rptest.tests.rack_aware_replica_placement_test.RackAwarePlacementTest.test_replica_placement.rack_layout_str=ABCDEF.num_partitions=400.replication_factor=5.num_topics=2
status:     FAIL
run time:   1 minute 22.757 seconds

TimeoutError('Cluster membership did not stabilize')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.9/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/rack_aware_replica_placement_test.py", line 120, in test_replica_placement
    self.redpanda.start()
  File "/root/tests/rptest/services/redpanda.py", line 618, in start
    wait_until(lambda: {n
  File "/usr/local/lib/python3.9/dist-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Cluster membership did not stabilize
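
For readers unfamiliar with the polling pattern behind this error, here is a minimal sketch of a `wait_until`-style helper. It is not ducktape's actual implementation: the name `wait_until_sketch`, the `registered`/`expected` sets, and `membership_stable` are all hypothetical, and it raises Python's built-in `TimeoutError` rather than `ducktape.errors.TimeoutError`. It only mirrors the behaviour visible in the last frame above: when the deadline passes, the error message (or the result of calling it) is raised, chained from the last exception seen.

```python
import time


def wait_until_sketch(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll condition() until it returns a truthy value or timeout_sec elapses.

    Illustrative only; not ducktape's wait_until.
    """
    deadline = time.monotonic() + timeout_sec
    last_exception = None
    while True:
        try:
            if condition():
                return
            last_exception = None
        except Exception as e:
            # Remember the failure so it can be chained into the timeout error.
            last_exception = e
        if time.monotonic() >= deadline:
            break
        time.sleep(backoff_sec)
    msg = err_msg() if callable(err_msg) else err_msg
    raise TimeoutError(msg) from last_exception


# Hypothetical stand-in for the cluster membership check done at startup.
registered = {1, 2}      # broker ids that have joined so far (made up)
expected = {1, 2, 3}     # broker ids the test started (made up)


def membership_stable():
    return registered == expected
```

With this sketch, `wait_until_sketch(membership_stable, timeout_sec=1, err_msg="Cluster membership did not stabilize")` raises while broker 3 is missing from `registered` and returns once it has been added.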

ZeDRoman, May 23 '22 15:05

+1 RackAwarePlacementTest.test_replica_placement.rack_layout_str=ooooFF.num_partitions=400.replication_factor=5.num_topics=2 https://buildkite.com/redpanda/redpanda/builds/10563#0180fc0e-840d-4667-8c61-1d0dc93c45a5

ZeDRoman, May 26 '22 09:05

Also seen with a slightly different error: https://ci-artifacts.dev.vectorized.cloud/redpanda/01824a74-e4e4-4dab-8fef-9a54328b65d5/vbuild/ducktape/results/2022-07-29--001/report.html

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/rack_aware_replica_placement_test.py", line 138, in test_replica_placement
    self._validate_placement(topic, rack_layout, replication_factor)
  File "/root/tests/rptest/tests/rack_aware_replica_placement_test.py", line 64, in _validate_placement
    m = self.client().describe_topic(topic.name)
  File "/root/tests/rptest/clients/default.py", line 101, in describe_topic
    td = self.describe_topics([topic])
  File "/root/tests/rptest/clients/default.py", line 94, in describe_topics
    client = KafkaAdminClient(
  File "/usr/local/lib/python3.10/dist-packages/kafka/admin/client.py", line 218, in __init__
    self._refresh_controller_id()
  File "/usr/local/lib/python3.10/dist-packages/kafka/admin/client.py", line 278, in _refresh_controller_id
    controller_version = self._client.check_version(controller_id, timeout=(self.config['api_version_auto_timeout_ms'] / 1000))
  File "/usr/local/lib/python3.10/dist-packages/kafka/client_async.py", line 901, in check_version
    self._maybe_connect(try_node)
  File "/usr/local/lib/python3.10/dist-packages/kafka/client_async.py", line 372, in _maybe_connect
    assert broker, 'Broker id %s not in current metadata' % (node_id,)
AssertionError: Broker id 2 not in current metadata
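
The assertion above fires while the admin client is being constructed, before the client's metadata lists broker 2. One way a test could tolerate that kind of transient failure is to retry construction a few times. The sketch below is an assumption for illustration, not the fix applied in the test suite: `retry_on` and `FlakyAdminFactory` are hypothetical names, and the factory merely simulates a constructor that fails with the same `AssertionError` before succeeding.

```python
import time


def retry_on(exc_types, fn, attempts=5, delay_sec=0.5):
    """Call fn() up to `attempts` times, retrying when it raises exc_types."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last = None
    for attempt in range(attempts):
        try:
            return fn()
        except exc_types as e:
            last = e
            # Sleep between attempts, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(delay_sec)
    raise last


class FlakyAdminFactory:
    """Hypothetical stand-in for building an admin client.

    Fails `failures` times with the error from the traceback, then succeeds.
    """

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise AssertionError("Broker id 2 not in current metadata")
        return "admin-client"  # placeholder for a real client object
```

For example, `retry_on(AssertionError, FlakyAdminFactory(2), delay_sec=0.01)` returns the placeholder on the third call. Retrying only masks the symptom; it does not explain why the broker was missing from the metadata.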

BenPope, Aug 01 '22 13:08

I see no failures like this in the last 30 days on dev.

jcsp, Oct 05 '22 12:10

I think we can switch this one back on: https://github.com/redpanda-data/redpanda/pull/7089

jcsp, Nov 04 '22 14:11