cloud_storage: bucket scrub

Open. Opened by jcsp; 2 comments.

By design, Redpanda will sometimes leave orphan objects in its object storage bucket. This happens when a node writes a segment but unexpectedly loses leadership before it can update the manifest. We do our best to avoid it (https://github.com/redpanda-data/redpanda/pull/8560), but it will still happen from time to time.

Like any storage system, Redpanda needs a data scrubbing feature to maintain good data hygiene over long storage periods. A scrub can be more or less extensive depending on the needs of a given system:

  • The most lightweight scrub reconciles an object listing with the contents of the topic table and the manifests:
    • every segment should either appear in a manifest or correspond to a manifest spill range (in infinite storage) for a known partition.
    • every segment referenced by a manifest should exist in the object store.
  • The most heavyweight scrub reads every byte of every object and validates the CRC of every batch.
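
A minimal sketch of the lightweight scrub, assuming the bucket listing and the manifests have already been parsed into sets of segment identities. `SegmentKey`, `ScrubReport`, and `reconcile` are hypothetical names rather than Redpanda code, and spilled-over manifests are assumed to be folded into each partition's set of references. Under those assumptions the check reduces to two set differences: listed-but-unreferenced objects are orphans, and referenced-but-unlisted objects are missing.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SegmentKey:
    """Hypothetical identity of a segment object (not Redpanda's real key layout)."""
    topic: str
    partition: int
    name: str


@dataclass
class ScrubReport:
    orphans: set = field(default_factory=set)  # in the bucket, referenced by no manifest
    missing: set = field(default_factory=set)  # referenced by a manifest, absent from the bucket


def reconcile(listing, known_partitions, manifests):
    """Lightweight scrub: compare a bucket listing against partition manifests.

    listing: iterable of SegmentKey returned by listing the bucket.
    known_partitions: set of (topic, partition) pairs from the topic table.
    manifests: dict mapping (topic, partition) to the set of SegmentKey that the
        partition's manifests reference (spilled manifests assumed folded in).
    """
    listed = set(listing)
    referenced = set()
    for ntp, segments in manifests.items():
        # A manifest for a partition missing from the topic table does not
        # protect its segments, so those objects end up reported as orphans.
        if ntp in known_partitions:
            referenced |= segments
    return ScrubReport(orphans=listed - referenced, missing=referenced - listed)


if __name__ == "__main__":
    a = SegmentKey("orders", 0, "seg-a")
    b = SegmentKey("orders", 0, "seg-b")  # uploaded, but leadership lost before the manifest update
    c = SegmentKey("orders", 0, "seg-c")  # referenced, but not present in the bucket
    report = reconcile([a, b], {("orders", 0)}, {("orders", 0): {a, c}})
    print(sorted(k.name for k in report.orphans))  # ['seg-b']
    print(sorted(k.name for k in report.missing))  # ['seg-c']
```

Only the listing and the manifests are read, so this scrub costs roughly one bucket listing plus one manifest download per partition, regardless of how much segment data is stored.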

The heavyweight scrub is probably only useful on less-trusted object stores (e.g. MinIO with its basic filesystem backend); there is less value in scrubbing a highly trusted backend such as AWS S3.
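
For the heavyweight scrub, the core loop walks each object batch by batch and recomputes a checksum. The sketch below uses CRC-32C, the checksum used by Kafka's v2 record-batch format. It assumes a deliberately simplified, hypothetical framing (a 4-byte big-endian payload length and a 4-byte CRC before each payload), which is not Redpanda's actual segment layout. `encode_batch` and `scrub_segment` are illustrative names.

```python
import struct
from typing import List

# CRC-32C (Castagnoli), reflected polynomial 0x82F63B78, table-driven.
_POLY = 0x82F63B78
_TABLE = []
for i in range(256):
    c = i
    for _ in range(8):
        c = (c >> 1) ^ _POLY if c & 1 else c >> 1
    _TABLE.append(c)


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


# Hypothetical batch header: payload length, CRC-32C of the payload.
_HEADER = struct.Struct(">II")


def encode_batch(payload: bytes) -> bytes:
    return _HEADER.pack(len(payload), crc32c(payload)) + payload


def scrub_segment(data: bytes) -> List[int]:
    """Return byte offsets of batches whose CRC does not match.

    Raises ValueError if the object ends partway through a header or payload.
    """
    bad = []
    offset = 0
    while offset < len(data):
        if offset + _HEADER.size > len(data):
            raise ValueError(f"truncated header at offset {offset}")
        length, expected = _HEADER.unpack_from(data, offset)
        start = offset + _HEADER.size
        end = start + length
        if end > len(data):
            raise ValueError(f"truncated batch at offset {offset}")
        if crc32c(data[start:end]) != expected:
            bad.append(offset)
        offset = end
    return bad
```

For example, `scrub_segment(encode_batch(b"hello") + encode_batch(b"world"))` returns an empty list. Flipping a byte in the second payload makes it return `[13]`, because the first batch occupies 8 header bytes plus 5 payload bytes. Every byte is read and hashed, which is why this mode is so much more expensive than the listing-based reconciliation.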

JIRA Link: CORE-1177

jcsp, Feb 23 '23

There's a functional draft of a scrubber update that cleans up orphan segments here: https://github.com/redpanda-data/redpanda/tree/orphan-cleanup

jcsp, Jul 17 '23

We should ensure this cleanup can be disabled for customers who prefer their buckets to be immutable.
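
One way to meet this requirement is to keep deletion as a separate, opt-in step that runs after the scrub has produced its report. With cleanup disabled, the scrubber then only reports orphans and never modifies the bucket. This is a minimal sketch: `cleanup_enabled` and `cleanup_orphans` are hypothetical names, not an existing Redpanda configuration property, and `delete` stands in for whatever object-store delete call the implementation uses.

```python
from typing import Callable, Iterable, List


def cleanup_orphans(
    orphans: Iterable[str],
    delete: Callable[[str], None],
    cleanup_enabled: bool,
) -> List[str]:
    """Delete orphaned object keys only when the (hypothetical) switch is on.

    Returns the keys that were deleted. When cleanup_enabled is False, the
    orphans are only logged and `delete` is never called.
    """
    keys = sorted(orphans)
    if not cleanup_enabled:
        for key in keys:
            print(f"orphan (not deleted, cleanup disabled): {key}")
        return []
    for key in keys:
        delete(key)
    return keys
```

Keeping detection and deletion separate means the same scrub report can serve both kinds of customers. Immutable-bucket deployments still learn about orphans without losing the guarantee that nothing is removed.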

pmw-rp, May 21 '24