pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Observed issues with check.crc.on.segment.load flag

Open cypherean opened this issue 2 months ago • 4 comments

The recently added check.crc.on.segment.load flag (#16432) has been helpful for handling realtime segments with mismatched CRCs despite identical data. However, during testing we observed a couple of side effects when this flag is enabled at the server level:

Offline table backfill: In our setup (Uber), offline segment names are date based. When backfilling for an older date, the uploaded segment has the same name but a different CRC. With check.crc.on.segment.load disabled, the controller updates to the new CRC, but the server does not load the updated segment.

// Steps to replicate:
// 1. Start quickstart batch with config set to false
// 2. update the crc of a completed segment locally
// 3. use the upload segment script to re-upload the existing segment

Handling message: ZnRecord=57211ff0-eb7d-4227-86dd-1d634a37a961, {CREATE_TIMESTAMP=1763398232543, EXECUTE_START_TIM ESTAMP=1763398232574, MSG_ID=57211ff0-eb7d-4227-86dd-1d634a37a961, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_ TYPE=USER_DEFINE_MSG, PARTITION_NAME=airlineStats_OFFLINE_16071_16071_0, RESOURCE_NAME=airlineStats_OFFLINE, RETRY_ COUNT=0, SRC_CLUSTER=QuickStartCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_127.0.0.1_9002, TGT_NAME =Server_127.0.0.1_7050, TGT_SESSION_ID=10056a43eb20018, TIMEOUT=-1, segmentName=airlineStats_OFFLINE_16071_16071_0, tableName=airlineStats_OFFLINE}H}, Stat=Stat {_version=0, _creationTime=1763398232552,_modifiedTime=17633982325
52, _ephemeralOwner=0}
Replacing segment: airlineStats_OFFLINE_16071_16071_0 in table: airlineStats_OFFLINE
Replacing segment: airlineStats_OFFLINE_16071_16071_0
.....
Waiting for lock to reload/refresh: airlineStats_OFFLINE_16071_16071_0, queue-length: 0
Acquired lock to reload/refresh segment: airlineStats_OFFLINE_16071_16071_0 (lock-time=0ms, queue-length=0)
Skipping replacing segment: airlineStats_OFFLINE_16071_16071_0 even though its CRC has changed from: 912377941 to:
912377942 because instance.check.crc.on.segment.load is set to false
Message 57211ff0-eb7d-4227-86dd-1d634a37a961 completed.

Upsert compaction: When upsert compaction is enabled, Pinot attempts to commit a compacted segment that retains the same name but has a different CRC. This segment is also ignored by the server when the check.crc.on.segment.load flag is disabled.

// With upsert compaction enabled on testing table
Skipping replacing segment: redacted_tablename__14__458__20251118T1630Z even though its CRC has changed from: 3709360175 to: 533870250 because instance.check.crc.on.segment.load is set to false

These behaviors suggest that skipping CRC check may inadvertently block valid segment reloads in scenarios where segment names remain constant but CRCs legitimately change (backfill, compaction).

cypherean avatar Nov 18 '25 18:11 cypherean