ORCFileAppender should fetch the stripe offsets from the Writer instead of opening the written file

Open pavibhai opened this issue 3 years ago • 0 comments

Feature Request / Improvement

Currently, the ORCFileAppender reopens the ORC file it has just written solely to derive the stripe offsets:

  @Override
  public List<Long> splitOffsets() {
    Preconditions.checkState(isClosed, "File is not yet closed");
    try (Reader reader = ORC.newFileReader(file.toInputFile(), conf)) {
      List<StripeInformation> stripes = reader.getStripes();
      return Collections.unmodifiableList(Lists.transform(stripes, StripeInformation::getOffset));
    } catch (IOException e) {
      throw new RuntimeIOException(e, "Can't close ORC reader %s", file.location());
    }
  }

Starting with ORC 1.7, we have added a public API to retrieve stripe offset information from the writer. When called after close, it returns the complete stripe information written to the file, so we can avoid opening the written file.
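A minimal sketch of what the change could look like, written with stand-in types so it compiles without ORC or Iceberg on the classpath. `StripeInfo`, `StripeTrackingWriter`, `Appender`, and `stripeAt` are hypothetical names used only for illustration; in the real appender the writer call is assumed to be ORC's `Writer.getStripes()`, which should be checked against the ORC version in use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SplitOffsetsSketch {

  /** Hypothetical stand-in for ORC's StripeInformation; only the offset is modeled. */
  public interface StripeInfo {
    long getOffset();
  }

  /** Hypothetical stand-in for an ORC writer that remembers the stripes it wrote. */
  public interface StripeTrackingWriter {
    List<StripeInfo> getStripes();
  }

  /** Helper for building a StripeInfo with a fixed offset (illustration only). */
  public static StripeInfo stripeAt(long offset) {
    return () -> offset;
  }

  /** Simplified appender: it keeps the writer and never reopens the written file. */
  public static final class Appender {
    private final StripeTrackingWriter writer;
    private boolean isClosed = false;

    public Appender(StripeTrackingWriter writer) {
      this.writer = writer;
    }

    public void close() {
      // A real appender would close the underlying ORC writer here.
      isClosed = true;
    }

    public List<Long> splitOffsets() {
      // Same precondition as the current implementation.
      if (!isClosed) {
        throw new IllegalStateException("File is not yet closed");
      }
      // Read the stripe offsets from the writer's own bookkeeping
      // instead of opening a reader on the finished file.
      List<Long> offsets = new ArrayList<>();
      for (StripeInfo stripe : writer.getStripes()) {
        offsets.add(stripe.getOffset());
      }
      return Collections.unmodifiableList(offsets);
    }
  }

  public static void main(String[] args) {
    // The offsets below are made-up values, not taken from a real file.
    StripeTrackingWriter writer = () -> List.of(stripeAt(3L), stripeAt(1027L));
    Appender appender = new Appender(writer);
    appender.close();
    System.out.println(appender.splitOffsets());
  }
}
```

The point of the design is that the closed-file precondition stays the same while the extra read of the file footer goes away, which matters most on object stores where each open is a remote request.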

Query engine

No response

pavibhai, Sep 16 '22 18:09