hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-29012: NPE in ExplainTask when protologging posthook is enabled

Open Aggarwal-Raghav opened this issue 7 months ago • 3 comments

What changes were proposed in this pull request?

Check HIVE-29012 for the stacktrace.

Why are the changes needed?

When protologging hook is enabled, it stores the query plan for a query as well. The conf object in ExplainTask is null for the "Hive Hook Proto Log Writer" thread.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Will see CI output + local single node setup/cluster testing

Aggarwal-Raghav avatar Jun 14 '25 18:06 Aggarwal-Raghav

Based on the debugging, the Task#initialize() method creates conf object in HiveServer2-Handler-Pool thread and when the post hook for protologging is triggered, it created a new thread (Hive Hook Proto Log Writer) which get the query plan and at that time, ExplainTask#outputPlan() has conf as null which causes NPE.

https://github.com/apache/hive/blob/af59f241a07fb9960412529be71c6bfa1fcd042e/ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java#L1006

Aggarwal-Raghav avatar Jun 14 '25 18:06 Aggarwal-Raghav

@zabetak, can you please help with the review as the changes are around HIVE-22173's code.

Aggarwal-Raghav avatar Jun 14 '25 18:06 Aggarwal-Raghav

Just info: HIVE-27240, was very similar to this issue based on stacktrace, which shows the need for UT :-)

Aggarwal-Raghav avatar Jun 23 '25 14:06 Aggarwal-Raghav

Hi @zabetak, apologies for late reply, just wanted to let you know that I'm working on this PR and found some new issues like HIVE-29053 along the way when coming up with UT.

Writing UT for this PR is not straightforward and need your opinion on this. There are 2 things to note:

  1. In case of NPE, its just log the error and continue as intended, which means using TestHiveHookEventProtoPartialBuilder or TestHiveProtoLoggingHook won't cut it IMO (tried with mockito as well, didn't work).
  2. Writes/flush of proto data are not consistent based on ScheduledThreadPoolExecutor https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java#L277, one way to force flush is to stop the HS2.

What I/m thinking is, writing a itest/hive-unit integration test as below code. Because of [2], I'm not able to make qfile work as the proto data is written post test completition.

CREATE TABLE tbl1 (id INT, name STRING);
INSERT INTO tbl1 VALUES (1, 'Alice'), (2, 'Bob');
SELECT * FROM tbl1;

msck repair table sys.proto_hive_query_data;

select count(*) from sys.proto_hive_query_data where lower(trim(regexp_replace(get_json_object(otherinfo[1].value, '$.queryText'), '\n|\\\s+', ' '))) = "select * from tbl1";

Because of NPE the QUERY Object will not be part of proto data so without fix the output should be 0, with fix it should be 1.

Aggarwal-Raghav avatar Jul 01 '25 14:07 Aggarwal-Raghav

The end to end approach (2.) seems a bit of an overkill. Even if the NPE is caught and logged you still have some missing entries (e.g., queryPlan) in the otherInfo map. You could probably base your test on constructing a simple ExplainWork object and reason about missing entries in the info map. Would that work as a unit test?

zabetak avatar Jul 04 '25 13:07 zabetak

The end to end approach (2.) seems a bit of an overkill. Even if the NPE is caught and logged you still have some missing entries (e.g., queryPlan) in the otherInfo map. You could probably base your test on constructing a simple ExplainWork object and reason about missing entries in the info map. Would that work as a unit test?

Thanks for the suggestions/help @zabetak. I have come up with a test case highlighting the problem observed in local cluster. Can you please review and confirm if its up to the standards?

Aggarwal-Raghav avatar Jul 05 '25 20:07 Aggarwal-Raghav

Test failure org.apache.hadoop.hive.metastore.security.TestZookeeperTokenStoreSSLEnabled, Seems irrelevant to change.

Aggarwal-Raghav avatar Jul 07 '25 08:07 Aggarwal-Raghav

@zabetak , I have addressed the review comments. can you please take a look?

Aggarwal-Raghav avatar Jul 11 '25 02:07 Aggarwal-Raghav

Thanks for the PR @Aggarwal-Raghav !

zabetak avatar Jul 13 '25 20:07 zabetak