iceberg
ORCFileAppender should fetch the stripe offsets from the Writer instead of opening the written file
Feature Request / Improvement
Currently, the ORCFileAppender reopens the ORC file it has just written solely to derive the stripe offsets:
@Override
public List<Long> splitOffsets() {
  Preconditions.checkState(isClosed, "File is not yet closed");
  try (Reader reader = ORC.newFileReader(file.toInputFile(), conf)) {
    List<StripeInformation> stripes = reader.getStripes();
    return Collections.unmodifiableList(Lists.transform(stripes, StripeInformation::getOffset));
  } catch (IOException e) {
    throw new RuntimeIOException(e, "Can't close ORC reader %s", file.location());
  }
}
Starting with ORC 1.7, the writer exposes a public API for retrieving stripe information. When called after close, it returns the complete stripe information that was written out to the file. With this, splitOffsets() can take the offsets directly from the writer and avoid reopening the written file, which saves an extra file open and footer read per data file.
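A minimal sketch of the proposed shape, under stated assumptions. To keep it self-contained, StripeInfo and StripeWriter are hypothetical stand-ins for org.apache.orc.StripeInformation and org.apache.orc.Writer. The getStripes() method on the writer is assumed to be the ORC 1.7 API referred to above, and should be checked against the ORC javadoc. Appender and fakeWriter are illustrative names only, not Iceberg code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SplitOffsetsSketch {

  /** Hypothetical stand-in for org.apache.orc.StripeInformation. */
  public interface StripeInfo {
    long getOffset();
  }

  /** Hypothetical stand-in for the ORC writer; getStripes() is the assumed 1.7+ API. */
  public interface StripeWriter {
    void close() throws IOException;

    List<? extends StripeInfo> getStripes() throws IOException;
  }

  /** Illustrative appender that asks the writer for offsets instead of reopening the file. */
  public static class Appender {
    private final StripeWriter writer;
    private boolean isClosed = false;

    public Appender(StripeWriter writer) {
      this.writer = writer;
    }

    public void close() throws IOException {
      if (!isClosed) {
        writer.close();
        isClosed = true;
      }
    }

    public List<Long> splitOffsets() {
      // Same precondition as the current implementation: stripes are complete only after close.
      if (!isClosed) {
        throw new IllegalStateException("File is not yet closed");
      }
      try {
        List<Long> offsets = new ArrayList<>();
        for (StripeInfo stripe : writer.getStripes()) {
          offsets.add(stripe.getOffset());
        }
        return Collections.unmodifiableList(offsets);
      } catch (IOException e) {
        throw new UncheckedIOException("Can't get stripe information from writer", e);
      }
    }
  }

  /** In-memory fake writer that reports the given stripe offsets; for demonstration only. */
  public static StripeWriter fakeWriter(long... offsets) {
    List<StripeInfo> stripes = new ArrayList<>();
    for (long offset : offsets) {
      stripes.add(() -> offset);
    }
    return new StripeWriter() {
      @Override
      public void close() {
        // Nothing to flush in the fake.
      }

      @Override
      public List<? extends StripeInfo> getStripes() {
        return stripes;
      }
    };
  }
}
```

In the real appender, the stand-in writer would be the ORC Writer the appender already holds, so no Reader, input file, or extra I/O round trip would be needed once the file is closed.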
Query engine
No response