HIVE-26473: Hive JDK-17 support for compile and runtime
### What changes were proposed in this pull request?
HIVE-26473: Hive JDK-17 support for compile and runtime
- Various version upgrades for JDK 17 compatibility
- Various test fixes for runtime issues
- CI config changes for JDK 17
- Java 17 coding convention upgrades
- DataNucleus upgrade with patched modules for RDBMS
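For illustration (a sketch, not the PR's actual POM; the property name is hypothetical), pinning the upgraded DataNucleus RDBMS module in Maven might look like:

```xml
<!-- Hypothetical sketch: Hive's actual property name and module list may differ.
     The thread below targets datanucleus-rdbms 6.0.8 once it reaches Maven Central. -->
<properties>
  <datanucleus-rdbms.version>6.0.8</datanucleus-rdbms.version>
</properties>

<dependency>
  <groupId>org.datanucleus</groupId>
  <artifactId>datanucleus-rdbms</artifactId>
  <version>${datanucleus-rdbms.version}</version>
</dependency>
```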
### Why are the changes needed?
To support JDK-17 migration for Hive
### Does this PR introduce any user-facing change?
Yes
### Is the change a dependency upgrade?
Yes
### How was this patch tested?
CI tests and sanity checks
@akshat0395 @deniskuzZ @kokila-19, I raised an issue with the datanucleus-rdbms community some time back about the problem Hive is facing after upgrading datanucleus-rdbms to 6.x. Today, the DataNucleus community provided a fix for it. I have tested it locally and it passes. If possible, can you test it as well?
We can skip HIVE-28458 and test with the datanucleus-rdbms 6.0.8-SNAPSHOT version.
Datanucleus Issue: https://github.com/datanucleus/datanucleus-rdbms/issues/496
> @akshat0395 @deniskuzZ @kokila-19, I raised an issue with the datanucleus-rdbms community some time back about the problem Hive is facing after upgrading datanucleus-rdbms to 6.x. Today, the DataNucleus community provided a fix for it. I have tested it locally and it passes. If possible, can you test it as well?
> We can skip HIVE-28458 and test with the datanucleus-rdbms 6.0.8-SNAPSHOT version.
> Datanucleus Issue: datanucleus/datanucleus-rdbms#496
The plan was to revert the patched DataNucleus once the fix was ready upstream. Thanks @Aggarwal-Raghav for helping with the fix.
@kokila-19, FYI, datanucleus rdbms 6.0.8 has been released.
@Aggarwal-Raghav DataNucleus RDBMS 6.0.8 is not yet available on Maven Central: https://mvnrepository.com/artifact/org.datanucleus/datanucleus-rdbms
As I can see, the tag in the repo was also created two hours ago, so the artifacts should be available soon. We can upgrade once they are published. Thanks.
@kokila-19, in the latest DataNucleus changes the hardcoded version 6.0.0-release is still present in standalone-metastore/metastore-server/pom.xml. Can you please address that? Also, as there are a lot of dependency upgrades, can we have a dependency tree attached to the PR?
Could you please squash the 45 commits into one? I think it makes more sense to have this as one commit instead of 45, since they all serve a single goal and only work together.
> Could you please squash the 45 commits into one.
@zratkai, there are multiple tickets and specific issues being addressed in this effort, and keeping these issues isolated makes it easier to delegate, track, modify, or revert changes as needed. Managing such a large-scale effort through a single commit could make future modifications more challenging.
Additionally, this approach acknowledges the significant contributions from various team members. We want to ensure that each contributor’s work is properly recognized, just as it is with every other PR. While it wasn't feasible to separate these into individual PRs, having them under a single PR still maintains that sense of ownership and collaboration.
I hope this clarifies the reasoning behind our approach!
This change has 48 commits, each with its own meaningful fix. I was wondering whether we should squash the 48 commits into one when merging the PR, or keep all 48.
> This change has 48 commits. Each commit has its meaningful fix. I was wondering if we should squash the 48 commits into one commit, or keep the 48 commits when merging the PR.
I just noticed that we only support squash-and-merge on GitHub, so the 48 commits will be combined into one commit when the PR is merged. :(
> @kokila-19, in the latest DataNucleus changes the hardcoded version 6.0.0-release is still present in standalone-metastore/metastore-server/pom.xml. Can you please address that? Also, as there are a lot of dependency upgrades, can we have a dependency tree attached to the PR?
This change was first merged into the older PR to verify that it did not introduce any breaking changes. The change is addressed here as well.
https://github.com/apache/hive/blob/master/README.md Should we update README.md as well, and add some JDK 17 information?
LGTM (non-binding) :) Please rebase to get a green CI.
We can go ahead with the PR. If there are no more critical issues, we can merge it first; minor issues can be addressed later.
@nikunjagarwal321 The patch https://github.com/apache/hive/pull/5527 that resolved the flaky serde tests is not working with JDK 17.
The solution uses the `slot` variable of the `Field` object to maintain ordering, which prevents the flakiness. However, with JDK 17, all tests that rely on this slot fail with the error: "java.lang.RuntimeException: Error getting a slot value". Debugging reveals a NoSuchFieldException, despite the fact that JDK 17 still includes the `slot` variable. This is likely a reflection issue, even though the `--add-opens` flag is used to allow access to private fields.
Could you provide us with more context on this solution?
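One plausible explanation (an assumption on my part, not confirmed in this thread): on recent JDKs, the fields of core reflection classes such as `java.lang.reflect.Field` are filtered out of `getDeclaredField`/`getDeclaredFields` results, so the lookup throws `NoSuchFieldException` even though the `slot` field still exists, and `--add-opens` does not bypass that filter. A minimal probe (the class name `SlotProbe` is hypothetical):

```java
import java.lang.reflect.Field;

public class SlotProbe {
    // Returns "present" if Field's private "slot" field is visible to
    // reflection, or "filtered" if the JDK hides it (NoSuchFieldException).
    static String probe() {
        try {
            Field slot = Field.class.getDeclaredField("slot");
            return "present";
        } catch (NoSuchFieldException e) {
            // On JDKs that filter core-reflection fields, we land here
            // regardless of any --add-opens flags.
            return "filtered";
        }
    }

    public static void main(String[] args) {
        System.out.println("slot field: " + probe());
    }
}
```

Running this under both JDK 8 and JDK 17 would confirm whether the filter, rather than module access, is what breaks the patch.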
You will have to update the java version here as well:
https://github.com/apache/hive/blob/master/.github/workflows/docker-GA-images.yml#L55
```yaml
- name: 'Set up JDK 8'
  uses: actions/setup-java@v1
  with:
    java-version: 8
```
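For JDK 17 the step would presumably become something like this (a sketch; note that setup-java v2 and later additionally require an explicit `distribution` input):

```yaml
- name: 'Set up JDK 17'
  uses: actions/setup-java@v4
  with:
    distribution: 'temurin'
    java-version: 17
```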
@simhadri-g Addressed this in https://github.com/apache/hive/pull/5404/commits/b45a226403f153c8a60fcec2574549d1c3ecebdd
I tried to build this PR and deploy it on my local PC. I found I can't start HMS/HS2 out of the box; I needed to change some startup scripts and add some JDK 17 parameters manually. I think if we want to switch to JDK 17, we should change the scripts so that users can start HMS/HS2 without modifying any files.
> I tried to build this PR and deploy it in my local pc.
@tanishq-chugh, have you checked this?
In addition, if Hadoop and Tez use JDK 8 but Hive uses JDK 17, what should users pay attention to? Could you provide some suggestions? It would be good if we could provide guidance based on an actual local deployment.
But before adding this big feature to Hive 4.1.x, I want to resolve as many issues as possible that affect user deployment or use. If deploying and configuring this feature turns out to be difficult, we cannot add it in the 4.1.x version.
Thanks.
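Regarding the JDK 17 parameters mentioned above: on JDK 17 the usual culprits are module access flags. A hedged sketch of what a startup script might append (the exact flags Hive needs may differ; `HIVE_JVM_OPTS` is an illustrative variable name):

```shell
# Illustrative only: example JDK 17 module-opens flags a launch script might
# add so reflective access to java.base internals keeps working.
HIVE_JVM_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED \
  --add-opens=java.base/java.util=ALL-UNNAMED \
  --add-opens=java.base/java.net=ALL-UNNAMED"
export HADOOP_OPTS="$HADOOP_OPTS $HIVE_JVM_OPTS"
```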
Quality Gate passed: 460 new issues, 0 accepted issues, 0 security hotspots, 0.0% coverage on new code, 0.0% duplication on new code.
> I tried to build this PR and deploy it in my local pc. I found i can't start the HMS/HS2 out of box, and i need to change some startup scripts & add some JDK17 parameters manually.
This has been addressed in 43a1145
Quality Gate passed: 461 new issues, 0 accepted issues, 0 security hotspots, 0.0% coverage on new code, 0.0% duplication on new code.
Can we decompose this PR into multiple PRs based on the sub-tasks? I understand that some rewrites require the real JDK 17, but others don't. https://issues.apache.org/jira/browse/HIVE-26473
I also wonder if we can separate cosmetic changes like this
@okumin, Thank you for your suggestion. Unfortunately, separating the changes as proposed isn't feasible at this time due to the extensive interdependencies within this pull request. While we might be able to isolate a few cosmetic code refactors, they represent only a small part of the overall changes. The majority of the work is highly interdependent, which is critical for ensuring a green build and passing tests.
We did consider splitting the PR initially; however, given the scale and complexity of the changes, it made more sense to consolidate the work into a single PR with contributions from multiple team members.
@akshat0395 Thanks! I'm willing to merge this one, but I have two challenges.
The number of changed files is huge
This pull request changes 178 files. I manually counted the files that include only nonessential changes. I guess the first and second categories don't belong to this PR; we would already have merged 40% of this PR if we had separated them. The third category can also be separated.
- Add `@SuppressFBWarnings`: 58
- Add JavaDoc: 21
- Apply Java 17 syntax (multiline strings, `String#formatted`, pattern matching): 33
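For context, the Java 17 syntax features counted above look like this (an illustrative snippet, not code from the PR):

```java
public class Java17Syntax {
    // Pattern matching for instanceof: binds `s` without an explicit cast.
    static String describe(Object o) {
        if (o instanceof String s) {
            return "string of length " + s.length();
        }
        return "other";
    }

    public static void main(String[] args) {
        // Text block (multiline string).
        String sql = """
                SELECT key, value
                FROM src
                """;
        // String#formatted as an instance-method alternative to String.format.
        String msg = "found %d rows".formatted(3);
        System.out.println(describe("abc"));
        System.out.println(msg);
        System.out.println(sql.lines().count());
    }
}
```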
It is hard to identify why we have to change test results
I can see this PR changes various .q.out files. Some comments explain the root cause, such as the new container size, but I don't have an economical way to validate those statements. For example, if we agreed on the new container size (I'm not confident the changes are necessary; BsoBird gave us additional information) and extracted those into another PR, we could quickly review and merge that part without doubting any other changes.
- Stats
- The number of output files
- Timezone
- Floating Point
P.S. We can keep the current pull request if you and any reviewer agree. In my opinion, it would be easier to finalize the parts that look good than to keep maintaining this long-lived branch.
What's the latest status here? Are we chasing datanucleus perf issue with Oracle?
Also, it seems the PR is linked to the EPIC rather than to the relevant JIRA, HIVE-28379; could we please update it?
> What's the latest status here? Are we chasing datanucleus perf issue with Oracle?
Working with @saihemanth-cloudera to investigate the performance issue. We have identified the root cause and are now working on a fix and testing it.
Quality Gate passed: 490 new issues, 0 accepted issues, 0 security hotspots, 0.0% coverage on new code, 0.0% duplication on new code.
I tried to use this PR as a dependency in Apache Spark. It does not work, whereas using master does.
Quality Gate passed: 362 new issues, 0 accepted issues, 0 security hotspots, 0.0% coverage on new code, 0.0% duplication on new code.