pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Set skipCrcCheckOnLoad in table config

Open cypherean opened this issue 2 months ago • 2 comments

Fix for #17233

This PR adds skipCrcCheckOnLoad as table config as an alternate to server level config to allow for fine grained control on disabling CRC check for tables.

New config added: skipCrcCheckOnLoad under segmentsConfig in table config

Test Plan

Tested with Pinot batch quickstart:

  1. With skipCrcCheckOnLoad config enabled in airlineStats table:
Handling message: ZnRecord=cb7bee95-dde7-41c9-a0ad-53e4a26e6381, {CREATE_TIMESTAMP=1763551164828, EXECUTE_START_TIMESTAMP=1763551164866, MSG_ID=cb7bee95-dde7-41c9-a0ad-53e4a26e6381, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=airlineStats_OFFLINE_16074_16074_0, RESOURCE_NAME=airlineStats_OFFLINE, RETRY_COUNT=0, SRC_CLUSTER=QuickStartCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_127.0.0.1_9002, TGT_NAME=Server_127.0.0.1_7050, TGT_SESSION_ID=1005c2cb4d30018, TIMEOUT=-1, segmentName=airlineStats_OFFLINE_16074_16074_0, tableName=airlineStats_OFFLINE}{}{}, Stat=Stat {_version=0, _creationTime=1763551164837, _modifiedTime=1763551164837, _ephemeralOwner=0}
Replacing segment: airlineStats_OFFLINE_16074_16074_0 in table: airlineStats_OFFLINE
Replacing segment: airlineStats_OFFLINE_16074_16074_0
END: GenericClusterController.onMessage() for cluster QuickStartCluster
144 END:INVOKE CallbackHandler 26, /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES listener: org.apache.helix.controller.GenericHelixController@10bdfbcc type: CALLBACK Took: 1ms
The latency of message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 is 38 ms
END: InstanceMessagesCache.refresh(), 2 of Messages read from ZooKeeper. took 1 ms.
Start to refresh stale message cache
END: InstanceMessagesCache.refresh(), 2 of Messages read from ZooKeeper. took 1 ms.
Start to refresh stale message cache
Waiting for lock to reload/refresh: airlineStats_OFFLINE_16074_16074_0, queue-length: 0
Acquired lock to reload/refresh segment: airlineStats_OFFLINE_16074_16074_0 (lock-time=0ms, queue-length=0)
Skipping replacing segment: airlineStats_OFFLINE_16074_16074_0 even though its CRC has changed from: 2686808396 to: 2686808519 because skipCrcCheckOnLoad is enabled
Message cb7bee95-dde7-41c9-a0ad-53e4a26e6381 completed.
Scheduling message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7: brokerResource:airlineStats_OFFLINE, null->null
Submit task: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 to pool: java.util.concurrent.ThreadPoolExecutor@6350aec6[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
Message: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 handling task scheduled
377 END:INVOKE CallbackHandler 28, /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@702c26cd type: CALLBACK Took: 7ms
handling task: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 begin, at: 1763551164872
Refreshing segment: airlineStats_OFFLINE_16074_16074_0 for table: airlineStats_OFFLINE
Refreshed segment: airlineStats_OFFLINE_16074_16074_0 for table: airlineStats_OFFLINE
Message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 completed.
Handled request from 127.0.0.1 POST http://localhost:9002/v2/segments?tableName=airlineStats&tableType=OFFLINE, content-type multipart/form-data; charset=ISO-8859-1; boundary=httpclient_boundary_8c757373-6c11-4c17-a0c3-2d3fad879992 status code 200 OK
Delete message cb7bee95-dde7-41c9-a0ad-53e4a26e6381 from zk!
message finished: cb7bee95-dde7-41c9-a0ad-53e4a26e6381, took 11
Message: cb7bee95-dde7-41c9-a0ad-53e4a26e6381 (parent: null) handling task for airlineStats_OFFLINE:airlineStats_OFFLINE_16074_16074_0 completed at: 1763551164877, results: true. FrameworkTime: 11 ms; HandlerTime: 1 ms.
Delete message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 from zk!
message finished: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7, took 10
  1. With skipCrcCheckOnLoad disabled on billing table by default:
Handling message: ZnRecord=c5bb3f12-3937-4761-99ed-b7833f02789b, {CREATE_TIMESTAMP=1763552941563, EXECUTE_START_TIMESTAMP=1763552941592, MSG_ID=c5bb3f12-3937-4761-99ed-b7833f02789b, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=billing_OFFLINE_0, RESOURCE_NAME=billing_OFFLINE, RETRY_COUNT=0, SRC_CLUSTER=QuickStartCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_127.0.0.1_9002, TGT_NAME=Server_127.0.0.1_7050, TGT_SESSION_ID=1005c43051f0018, TIMEOUT=-1, segmentName=billing_OFFLINE_0, tableName=billing_OFFLINE}{}{}, Stat=Stat {_version=0, _creationTime=1763552941572, _modifiedTime=1763552941572, _ephemeralOwner=0}
CallbackHandler26, Subscribing to path: /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES took: 0
START: GenericClusterController.onMessage() for cluster QuickStartCluster
Replacing segment: billing_OFFLINE_0 in table: billing_OFFLINE
Replacing segment: billing_OFFLINE_0
CallbackHandler28, Subscribing to path: /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES took: 0
END: GenericClusterController.onMessage() for cluster QuickStartCluster
144 END:INVOKE CallbackHandler 26, /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES listener: org.apache.helix.controller.GenericHelixController@382d549a type: CALLBACK Took: 0ms
START: InstanceMessagesCache.refresh()
The latency of message dd9dfcb1-3f9c-4c3e-b009-9ba5f24ef8bb is 30 ms
END: InstanceMessagesCache.refresh(), 2 of Messages read from ZooKeeper. took 1 ms.
Start to refresh stale message cache
Waiting for lock to reload/refresh: billing_OFFLINE_0, queue-length: 0
Acquired lock to reload/refresh segment: billing_OFFLINE_0 (lock-time=0ms, queue-length=0)
Replacing segment: billing_OFFLINE_0 because its CRC has changed from: 2262953635 to: 2262953636
Downloading and loading segment: billing_OFFLINE_0
Downloading segment: billing_OFFLINE_0 from: http://127.0.0.1:9002/segments/billing/billing_OFFLINE_0
Acquiring instance level segment download semaphore for segment: billing_OFFLINE_0, queue-length: 0
Acquired instance level segment download semaphore for segment: billing_OFFLINE_0 (lock-time=0ms, queue-length=0).

Also tested with realtime tables with lucene index and upsert compaction on our test clusters.

cypherean avatar Nov 20 '25 17:11 cypherean