CI Failure (Attempted to perform operation: 'offset_data_stream()' on a closed segment) in `ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy`
https://buildkite.com/redpanda/redpanda/builds/44921#018d8b3e-c497-4199-963c-c72f2a2d127b
Module: rptest.tests.e2e_shadow_indexing_test
Class: ShadowIndexingWhileBusyTest
Method: test_create_or_delete_topics_while_busy
Arguments:
{
"cloud_storage_type": 1,
"short_retention": false
}
====================================================================================================
test_id: rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=False.cloud_storage_type=CloudStorageType.S3
status: FAIL
run time: 2 minutes 44.294 seconds
<BadLogLines nodes=docker-rp-7(1) example="ERROR 2024-02-09 00:40:06,124 [shard 0:fetc] kafka - fetch.cc:1171 - unknown exception thrown: std::runtime_error (Attempted to perform operation: 'offset_data_stream()' on a closed segment: {offset_tracker:{term:1, base_offset:1919, committed_offset:2002, dirty_offset:2002}, compacted_segment=0, finished_self_compaction=0, finished_windowed_compaction=0, generation={59}, reader={/var/lib/redpanda/data/kafka/topic-hnegwjyeyf/8_52/1919-1-v1.log, (21242491 bytes)}, writer=nullptr, cache={cache_size=7}, compaction_index:nullopt, closed=1, tombstone=1, index={file:/var/lib/redpanda/data/kafka/topic-hnegwjyeyf/8_52/1919-1-v1.base_index, offsets:{1919}, index:{header_bitflags:0, base_offset:{1919}, max_offset:{2002}, base_timestamp:{timestamp: 1707439192434}, max_timestamp:{timestamp: 1707439193269}, batch_timestamps_are_monotonic:0, with_offset:true, non_data_timestamps:0, broker_timestamp:{{timestamp: 1707439194397}}, num_compactible_records_appended:{81}, index(27,27,27)}, step:32768, needs_persistence:0}})">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 173, in wrapped
    redpanda.raise_on_bad_logs(
  File "/root/tests/rptest/services/redpanda.py", line 1412, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-7(1) example="ERROR 2024-02-09 00:40:06,124 [shard 0:fetc] kafka - fetch.cc:1171 - unknown exception thrown: std::runtime_error (Attempted to perform operation: 'offset_data_stream()' on a closed segment: {offset_tracker:{term:1, base_offset:1919, committed_offset:2002, dirty_offset:2002}, compacted_segment=0, finished_self_compaction=0, finished_windowed_compaction=0, generation={59}, reader={/var/lib/redpanda/data/kafka/topic-hnegwjyeyf/8_52/1919-1-v1.log, (21242491 bytes)}, writer=nullptr, cache={cache_size=7}, compaction_index:nullopt, closed=1, tombstone=1, index={file:/var/lib/redpanda/data/kafka/topic-hnegwjyeyf/8_52/1919-1-v1.base_index, offsets:{1919}, index:{header_bitflags:0, base_offset:{1919}, max_offset:{2002}, base_timestamp:{timestamp: 1707439192434}, max_timestamp:{timestamp: 1707439193269}, batch_timestamps_are_monotonic:0, with_offset:true, non_data_timestamps:0, broker_timestamp:{{timestamp: 1707439194397}}, num_compactible_records_appended:{81}, index(27,27,27)}, step:32768, needs_persistence:0}})">
https://buildkite.com/redpanda/vtools/builds/11755
* https://buildkite.com/redpanda/vtools/builds/11875
* https://buildkite.com/redpanda/vtools/builds/11880
* https://buildkite.com/redpanda/vtools/builds/11886
* https://buildkite.com/redpanda/vtools/builds/11891
* https://buildkite.com/redpanda/vtools/builds/11901
* https://buildkite.com/redpanda/vtools/builds/11905
* https://buildkite.com/redpanda/redpanda/builds/45108
The offset_data_stream is opened when we open a log reader. The error is thrown because the segment is already closed. That could be caused by anything from the topic/partition being deleted, to the partition being moved by the balancer, to a shutdown, to the segment being removed by retention...
Overall this looks like a race condition: log reader creation needs to synchronize with the operations that close or remove the segment.
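To make the race concrete, here is a toy Python model of the check-then-open pattern (all names here are hypothetical stand-ins; the real code is Redpanda's C++ storage layer). The point is that reader creation must observe the segment's closed state under the same synchronization that `close()` uses, so a concurrent close cannot slip in between the check and the stream open:

```python
import threading


class ClosedSegmentError(RuntimeError):
    """Models the std::runtime_error seen in fetch.cc."""


class Segment:
    """Hypothetical model of a storage segment with a closed flag."""

    def __init__(self):
        self.closed = False
        self._lock = threading.Lock()

    def close(self):
        # Retention/deletion/partition movement would end up here.
        with self._lock:
            self.closed = True

    def offset_data_stream(self):
        # Mirrors the real error: opening a data stream on a closed
        # segment throws instead of returning a reader.
        if self.closed:
            raise ClosedSegmentError(
                "Attempted to perform operation: 'offset_data_stream()' "
                "on a closed segment")
        return object()  # stand-in for the actual data stream


def make_reader(segment):
    # Sketch of the fix direction: check the flag under the same lock
    # that close() takes, so the check and the open are atomic with
    # respect to a concurrent close. A closed segment is skipped
    # gracefully instead of surfacing an exception up to the fetch path.
    with segment._lock:
        if segment.closed:
            return None
        return segment.offset_data_stream()
```

In the buggy interleaving, `close()` runs between an unsynchronized closed-check and the `offset_data_stream()` call, which is exactly the window this sketch eliminates.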
In the log we can see the segment being closed:
TRACE 2024-02-09 00:40:06,123 [shard 1:main] storage - segment.cc:90 - closing segment: {offset_tracker:{term:1, base_offset:1919, committed_offset:2002, dirty_offset:2002}, compacted_segment=0, finished_self_compaction=0, finished_windowed_compaction=0, generation={59}, reader={/var/lib/redpanda/data/kafka/topic-hnegwjyeyf/8_52/1919-1-v1.log, (21242491 bytes)}, writer=nullptr, cache={cache_size=7}, compaction_index:nullopt, closed=1, tombstone=1, index={file:/var/lib/redpanda/data/kafka/topic-hnegwjyeyf/8_52/1919-1-v1.base_index, offsets:{1919}, index:{header_bitflags:0, base_offset:{1919}, max_offset:{2002}, base_timestamp:{timestamp: 1707439192434}, max_timestamp:{timestamp: 1707439193269}, batch_timestamps_are_monotonic:0, with_offset:true, non_data_timestamps:0, broker_timestamp:{{timestamp: 1707439194397}}, num_compactible_records_appended:{81}, index(27,27,27)}, step:32768, needs_persistence:0}}
But after this, we see readers for the same segment still being created, which triggers the error.
For what it is worth, it also looks like this might be a remote partition.
I have a hunch that this may also be related to https://github.com/redpanda-data/redpanda/issues/16582 which also shows an exception from storage leaking up to fetch.cc.
* https://buildkite.com/redpanda/redpanda/builds/45279
* https://buildkite.com/redpanda/vtools/builds/11971
* https://buildkite.com/redpanda/vtools/builds/11978
Given that this stopped happening around the time #16582 was fixed, and per Noah's comment above, closing as presumed fixed.