Stuart Millholland

5 comments by Stuart Millholland

@kishoreg this is similar to an issue we are experiencing with our RealtimeToOfflineSegmentsTask. It's taking almost 8 hours to process 175M rows of data. Initialized mapper with 56 record...
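For scale, the figures quoted above work out to roughly six thousand rows per second. This is a back-of-the-envelope sketch only: it treats "almost 8 hours" as exactly 8 hours, which is an assumption, and the function name is made up for illustration.

```python
def rows_per_second(total_rows: int, hours: float) -> float:
    """Average throughput for a job that processed total_rows in the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_rows / (hours * 3600)


# 175M rows in ~8 hours ("almost 8 hours" rounded up, an assumption)
throughput = rows_per_second(175_000_000, 8)
print(f"{throughput:,.0f} rows/s")
```

An average like this hides where the time goes (mapping, rollup, segment creation), so it is mainly useful as a baseline to compare against after a config change.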

Here is the relevant portion of the configuration:

  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "1d",
      "roundBucketTimePeriod": "1d",
      "mergeType": "rollup",
      "user_raw_risk_score.aggregationType": "max",
      "user_risk_score.aggregationType": "max",
      "user_threat_score.aggregationType": "max",
      "maxNumRecordsPerSegment": "5000000",
      "schedule": "$REALTIME_TO_OFFLINE_SEGMENT_TASK_SCHEDULE"...
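As a rough illustration of what the rollup settings in this configuration ask for: with "mergeType": "rollup" and a "1d" roundBucketTimePeriod, rows that share the same dimension values and the same day-rounded timestamp are merged into one, and each of the three score columns keeps its maximum. The sketch below is a conceptual model in plain Python, not Pinot's implementation; the `ts_ms` time column and the `user_id` dimension are hypothetical names, since the table schema is not shown.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

DAY_MS = 24 * 60 * 60 * 1000  # models roundBucketTimePeriod = "1d"

# Metric columns configured with aggregationType = "max"
METRICS = ("user_raw_risk_score", "user_risk_score", "user_threat_score")


def rollup_max(
    rows: Iterable[Dict],
    dimension_cols: Sequence[str],
    time_col: str = "ts_ms",  # hypothetical time column name
) -> List[Dict]:
    """Merge rows with equal dimensions and day-rounded time, keeping max metrics."""
    merged: Dict[Tuple, Dict] = {}
    for row in rows:
        # Round the timestamp down to the start of its day
        rounded = row[time_col] // DAY_MS * DAY_MS
        key = (rounded,) + tuple(row[c] for c in dimension_cols)
        if key not in merged:
            out = {c: row[c] for c in dimension_cols}
            out[time_col] = rounded
            for m in METRICS:
                out[m] = row[m]
            merged[key] = out
        else:
            for m in METRICS:
                merged[key][m] = max(merged[key][m], row[m])
    return list(merged.values())


# Usage: two rows for the same user on the same day collapse into one
sample = [
    {"user_id": "a", "ts_ms": 1_000, "user_raw_risk_score": 1,
     "user_risk_score": 5, "user_threat_score": 2},
    {"user_id": "a", "ts_ms": 2_000, "user_raw_risk_score": 4,
     "user_risk_score": 3, "user_threat_score": 2},
    {"user_id": "b", "ts_ms": 1_000, "user_raw_risk_score": 7,
     "user_risk_score": 7, "user_threat_score": 7},
]
result = rollup_max(sample, dimension_cols=["user_id"])
```

How much a rollup like this shrinks the data depends on how many distinct dimension combinations occur per day, which is worth checking when a job of this kind runs long.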

@Jackie-Jiang, I used this as my guide: https://docs.pinot.apache.org/operators/operating-pinot/pinot-managed-offline-flows

@Jackie-Jiang the process finished in 9.5 hours last night and produced 18 new rolled-up offline segments.

I think we are OK for now @Jackie-Jiang. With our production-level ingestion we only need to run the job once every 24 hours, and it's running in ~10 hours....