HIVE-28191: Upgrade Hadoop Version to 3.4.0
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
Is the change a dependency upgrade?
How was this patch tested?
@zhangbutao LGTM +1.
@zhangbutao, thanks for driving this forward.
For tez project:
- We need to exclude the logback jar from Hadoop's transitive dependencies in the tez project, or move to Hadoop 3.4.1; otherwise it can cause classloading issues. IIRC, I faced an issue where the logback jar was picked up first and hive-log4j2.properties was not honoured. If possible, please go through the following: a. https://github.com/apache/hadoop/pull/6582#issue-2151551682 b. HADOOP-19153: hadoop-common exports logback as a dependency (this fix is not in Hadoop 3.4.0)
- ZooKeeper version: I would prefer to keep it in sync. a. hive => 3.8.4 b. hadoop 3.4.0 => 3.8.3 c. hadoop 3.4.1 => 3.8.4
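For illustration, the logback exclusion suggested for the tez project could look roughly like the Maven fragment below. This is only a sketch under assumptions: it presumes the jar arrives through `hadoop-common` (as HADOOP-19153 describes), that the artifacts to drop are `ch.qos.logback:logback-classic` and `ch.qos.logback:logback-core`, and that a `hadoop.version` property is defined in the POM. The actual module and artifact names should be confirmed from the dependency tree before applying it.

```xml
<!-- Sketch only: assumes logback is pulled in transitively via hadoop-common. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <!-- hadoop.version is an assumed property name -->
  <version>${hadoop.version}</version>
  <exclusions>
    <!-- Assumed artifact list; verify against the real dependency tree. -->
    <exclusion>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
    </exclusion>
    <exclusion>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

If other Hadoop modules also bring in logback, the same exclusions would need to be repeated on each of those dependencies, since Maven exclusions apply per declared dependency.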
For hive:
- Please fill out the PR template and attach the dependency tree.
- I think we can also upgrade the guava version and keep it in sync with Hadoop; there was some earlier discussion about keeping it in sync with the version Hadoop uses. Please check #4271
@Aggarwal-Raghav Thanks for your insightful thought! Will check this later.
Quality Gate passed
Issues
1 New issue
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
should consider 3.4.1 now :-)
@zhangbutao Is this PR going to go in, and does it affect the upgrade of guava in hive?
@devaspatikrishnatri I will check this next week. :)
@zhangbutao, please check by updating the guava version to 27.0-jre (used in Hadoop 3.4.1) instead of 32.0.1-jre.
@Aggarwal-Raghav Sure, let me try.
Interestingly, the failed tests show that guava-27.0-jre changes the output of some qtests (SQL explain).
I haven't figured out why some qtests changed after upgrading guava. But the changes in https://github.com/apache/hive/pull/5500/commits/78357d29a269049531d218395ab7af3b3700ab2c are just the names of the columns in the explain output, so I think the guava upgrade is acceptable.
I am not sure we should chase the Guava upgrade as part of the Hadoop upgrade. I believe we can track that separately.
btw. Hadoop doesn't use guava version specified in its POM, that is kept only for its transitive dependency. It uses the Guava coming from hadoop-thirdparty (HADOOP-17288), which is 30+ as of today and should be 30+ for 3.4.1 as well, if I am not mistaken.
https://github.com/apache/hadoop-thirdparty/blob/trunk/pom.xml#L101
"Hadoop doesn't use guava version specified in its POM, that is kept only for its transitive dependency."
Oh, I was not aware of this. Then maybe we can track it in a separate ticket.
Just for info: in our codebase, we have guava version 32.0.1-jre in tez (0.10.3), hadoop (3.3.6) and hive (4.0.0), and I didn't observe any UT failures there. Something to investigate on my end.
guava 32.0.1-jre would cause lots of qtest failures, including some class-not-found exceptions.
guava 27.0-jre would cause some minor explain qtest changes. https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5500/9/pipeline/
So maybe we can upgrade to 27.0 first and then consider upgrading to 32.0.
In short, it makes more sense to study the guava version carefully in a subsequent ticket before upgrading.
I will revert the guava upgrade in this PR. @Aggarwal-Raghav @ayushtkn