Batch Ingestion from Delta Table
Currently, Pinot cannot perform batch ingestion from a Delta table. This would be a valuable feature, since the project is open source and has many users.
We can build a new RecordReader implementation for Delta tables using the delta-standalone library.
Yes, this would be a great feature to add. cc: @xiangfu0
I just read a bit about the Delta library. A simple flow might look like the one below: open the Delta table with the library and loop through all the records. The library also supports data filtering, which could be exposed as an advanced option for data ingestion.
import io.delta.standalone.DeltaLog;
import io.delta.standalone.data.CloseableIterator;
import io.delta.standalone.data.RowRecord;
import org.apache.hadoop.conf.Configuration;

// Open the Delta table and iterate over the rows of its latest snapshot
DeltaLog log = DeltaLog.forTable(new Configuration(), "/data/sales");
CloseableIterator<RowRecord> dataIter = log.update().open();
try {
  while (dataIter.hasNext()) {
    // We get a Delta record here, and can convert it to a Pinot GenericRow as far as I can tell
    RowRecord row = dataIter.next();
    int year = row.getInt("year");
    String customer = row.getString("customer");
    float totalCost = row.getFloat("total_cost");
  }
} finally {
  // close() can throw IOException, so the enclosing method must declare or handle it
  dataIter.close();
}
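To make the conversion and filtering steps a bit more concrete, here is a rough sketch of how the loop could be structured. It uses only the JDK so it runs on its own: a plain `Map<String, Object>` stands in for both the Delta `RowRecord` and the Pinot `GenericRow`, and the class and method names (`DeltaRowConversionSketch`, `toPinotRow`, `readAll`) are made up for illustration. They are not part of either library, and a real RecordReader would read typed values from `RowRecord` according to the table schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class DeltaRowConversionSketch {

    // Copies the selected columns of one source row into a new ordered map.
    // The input map stands in for a Delta RowRecord, the output for a Pinot GenericRow.
    static Map<String, Object> toPinotRow(Map<String, Object> deltaRow, List<String> columns) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : columns) {
            out.put(column, deltaRow.get(column)); // a column missing from the source becomes null
        }
        return out;
    }

    // Drains the iterator, keeps rows that pass the filter, and converts each kept row.
    static List<Map<String, Object>> readAll(Iterator<Map<String, Object>> rows,
                                             List<String> columns,
                                             Predicate<Map<String, Object>> filter) {
        List<Map<String, Object>> result = new ArrayList<>();
        while (rows.hasNext()) {
            Map<String, Object> row = rows.next();
            if (filter.test(row)) {
                result.add(toPinotRow(row, columns));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hand-written sample rows using the column names from the snippet above
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("year", 2020);
        first.put("customer", "acme");
        first.put("total_cost", 10.0f);

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("year", 2021);
        second.put("customer", "globex");
        second.put("total_cost", 25.5f);

        List<Map<String, Object>> converted = readAll(
                Arrays.asList(first, second).iterator(),
                Arrays.asList("year", "customer", "total_cost"),
                row -> (Integer) row.get("year") >= 2021); // example filter: recent years only

        System.out.println(converted);
    }
}
```

Keeping the filter as a separate predicate means it could later be driven by an ingestion config option without touching the conversion code.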