
HIVE-28441: NPE in ORC tables when hive.orc.splits.include.file.footer is enabled

Open Aggarwal-Raghav opened this issue 1 year ago • 5 comments

What changes were proposed in this pull request?

See HIVE-28441 for the steps to reproduce this issue and the stack trace.

Why are the changes needed?

A NullPointerException is thrown on ORC tables when hive.orc.splits.include.file.footer is enabled.

Does this PR introduce any user-facing change?

NO

Is the change a dependency upgrade?

NO

How was this patch tested?

Using a q file present in the commits.

mvn clean test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=orc_footer_enabled.q -pl itests/qtest -Pitests -Dtest.output.overwrite=true
mvn clean test -Dtest=TestMiniTezCliDriver -Dqfile=orc_footer_enabled.q -pl itests/qtest -Pitests -Dtest.output.overwrite=true

Aggarwal-Raghav avatar Aug 08 '24 05:08 Aggarwal-Raghav

As per my understanding:

  1. One benefit of enabling hive.orc.splits.include.file.footer is fewer fs calls, as explained in HIVE-15038. In the ORC code, extractFileTail https://github.com/apache/orc/blob/7878691befc66ecc372ff41715cbdff97ec7aafd/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L569 makes an fs call to create the OrcTail; with the config enabled, that call was optimized away and the OrcTail object was created in OrcSplit.java instead https://github.com/apache/hive/blob/d0d5d6d7d11b3eece0d0bc17b429cb30dec5dc79/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L230

  2. In HIVE-15665, with hive.orc.splits.include.file.footer enabled, the code requires the OrcTail to have serializedTail present (passing null or an empty BufferChunk won't help, as it will still throw an NPE) https://github.com/apache/hive/blob/d0d5d6d7d11b3eece0d0bc17b429cb30dec5dc79/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L669

  3. A possible fix is one of two options: either, while creating the OrcTail in OrcSplit.java, we "somehow" get the serializedTail without making an additional fs call, or we revert HIVE-15038. Reverting forces the orcReader in OrcEncodedDataReader.java to perform extractFileTail, which will provide the serializedTail.

  4. I have gone with reverting HIVE-15038. Looking forward to suggestions on this.
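
To make the trade-off in points 1 to 3 concrete, here is a minimal, self-contained sketch. It is not Hive or ORC code: `Tail`, `readTailFromFile`, `resolveTail`, and the `fsCalls` counter are hypothetical stand-ins, and the sketch only assumes that a tail rebuilt from split metadata may lack its serialized bytes. It shows a reader that uses the split-provided tail when the serialized bytes are present and otherwise pays for one extra file system read.

```java
import java.nio.ByteBuffer;

public class FooterFallbackSketch {
    /** Hypothetical stand-in for a file tail; not the real org.apache.orc.impl.OrcTail. */
    static final class Tail {
        final ByteBuffer serialized; // may be null when rebuilt from split metadata alone

        Tail(ByteBuffer serialized) {
            this.serialized = serialized;
        }
    }

    /** Counts simulated file system reads so the cost of the fallback is visible. */
    static int fsCalls = 0;

    /** Simulates reading the tail from the file; the bytes are placeholder data. */
    static Tail readTailFromFile() {
        fsCalls++;
        return new Tail(ByteBuffer.wrap(new byte[] {1, 2, 3}));
    }

    /** Uses the split-provided tail only if it carries serialized bytes; otherwise re-reads. */
    static Tail resolveTail(Tail fromSplit) {
        if (fromSplit != null
                && fromSplit.serialized != null
                && fromSplit.serialized.hasRemaining()) {
            return fromSplit;
        }
        return readTailFromFile();
    }

    public static void main(String[] args) {
        Tail incomplete = new Tail(null); // what a consumer would otherwise dereference
        Tail resolved = resolveTail(incomplete);
        System.out.println(resolved.serialized.remaining() + " bytes, fs calls = " + fsCalls);
    }
}
```

The null check is the part that avoids the NPE; the extra read is the cost that HIVE-15038 was meant to remove, which is why this only illustrates the revert option and not the "somehow get the serializedTail" option.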

Aggarwal-Raghav avatar Aug 12 '24 14:08 Aggarwal-Raghav

@pgaref, can you please provide your insights on this?

Aggarwal-Raghav avatar Aug 12 '24 14:08 Aggarwal-Raghav

@Aggarwal-Raghav, is there still some benefit of hive.orc.splits.include.file.footer without HIVE-15038?

deniskuzZ avatar Aug 24 '24 10:08 deniskuzZ

@deniskuzZ, thanks for looking into this. I think in Tez on Yarn, we can still prevent an additional fs call with this config.

Aggarwal-Raghav avatar Aug 24 '24 14:08 Aggarwal-Raghav

@zhangbutao / @deniskuzZ , can you please suggest the next step that can help here?

Aggarwal-Raghav avatar Oct 21 '24 11:10 Aggarwal-Raghav

@Aggarwal-Raghav I did some debugging and found that the Tez Application Master has already initialized the OrcTail when creating the ORC splits. Can we pass the OrcTail from the Tez AM to the Tez task? If so, we could solve this issue that way. I have listed some related code here; maybe we can debug further to explore a better way to fix the issue.

  • Tez AM related code: creates the ORC split with the OrcTail

https://github.com/apache/hive/blob/13dfae1c0a7d4540f4bc5edc50bc922f0cfc83e8/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L1672-L1673

https://github.com/apache/hive/blob/13dfae1c0a7d4540f4bc5edc50bc922f0cfc83e8/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L1497-L1498

  • Tez task related code: gets the ORC split (but it cannot get the OrcTail at this point; can we think about how to get it here?)

https://github.com/apache/hive/blob/13dfae1c0a7d4540f4bc5edc50bc922f0cfc83e8/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L216-L223

https://github.com/apache/hive/blob/13dfae1c0a7d4540f4bc5edc50bc922f0cfc83e8/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java#L204-L205
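
As a rough illustration of the idea of passing the tail from the AM to the task, the sketch below round-trips the raw tail bytes through a split's serialized form. It is not the actual OrcSplit code: `SplitSketch`, its fields, and `roundTrip` are hypothetical, and it uses plain java.io `DataOutput`/`DataInput` rather than Hadoop's `Writable`. It assumes only that the AM side holds the serialized bytes at split creation time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SplitTailRoundTripSketch {
    /** Hypothetical split that optionally ships the serialized tail bytes. */
    static final class SplitSketch {
        String path;
        byte[] serializedTail; // null when the footer is not shipped with the split

        /** "AM side": write the path, a presence flag, and the tail bytes if any. */
        void write(DataOutput out) throws IOException {
            out.writeUTF(path);
            boolean hasTail = serializedTail != null;
            out.writeBoolean(hasTail);
            if (hasTail) {
                out.writeInt(serializedTail.length);
                out.write(serializedTail);
            }
        }

        /** "Task side": restore the fields, keeping the raw bytes for later use. */
        void readFields(DataInput in) throws IOException {
            path = in.readUTF();
            serializedTail = null;
            if (in.readBoolean()) {
                serializedTail = new byte[in.readInt()];
                in.readFully(serializedTail);
            }
        }
    }

    /** Serializes a split to bytes and reads it back, as a transfer would. */
    static SplitSketch roundTrip(SplitSketch split) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            split.write(new DataOutputStream(bytes));
            SplitSketch copy = new SplitSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SplitSketch split = new SplitSketch();
        split.path = "/warehouse/t/000000_0"; // placeholder path
        split.serializedTail = new byte[] {10, 20, 30};
        SplitSketch received = roundTrip(split);
        System.out.println(received.path + " tail bytes: " + received.serializedTail.length);
    }
}
```

If the task side keeps the raw bytes like this instead of only a parsed object, a consumer that needs the serialized tail would not have to go back to the file system; whether that fits the real OrcSplit and OrcTail classes would need to be checked against the linked code.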

zhangbutao avatar Oct 25 '24 07:10 zhangbutao

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Dec 25 '24 00:12 github-actions[bot]