[SPARK-51699][BUILD] Upgrade to Apache parent pom 34

Open vrozov opened this issue 9 months ago • 3 comments

What changes were proposed in this pull request?

Upgrade the Apache parent pom from version 18 to the latest version, 34, and upgrade several Maven plugin versions to match those in the latest parent pom.

  • maven-enforcer-plugin from 3.3.0 to 3.5.0
  • maven-compiler-plugin from 3.11.0 to 3.14.0
  • maven-surefire-plugin from 3.1.2 to 3.5.2
  • maven-jar-plugin from 3.3.0 to 3.4.2
  • maven-source-plugin from 3.3.0 to 3.3.1
  • maven-clean-plugin from 3.3.1 to 3.4.1
  • maven-javadoc-plugin from 3.5.0 to 3.11.2
  • maven-assembly-plugin from 3.6.0 to 3.7.1
  • maven-shade-plugin from 3.5.0 to 3.6.0
  • maven-install-plugin from 3.1.1 to 3.1.4
  • maven-deploy-plugin from 3.1.1 to 3.1.4
  • maven-dependency-plugin from 3.6.0 to 3.8.1

Why are the changes needed?

Apache parent pom version 18 is 15 years old.

Does this PR introduce any user-facing change?

No

How was this patch tested?

maven build with Java 8 and Java 17

Was this patch authored or co-authored using generative AI tooling?

No

vrozov avatar Apr 17 '25 18:04 vrozov

Hi @vrozov, thanks for the PR.

However, I have already merged https://github.com/apache/spark/pull/50500 to master/4.1.0. I guess this PR won't be necessary, as the affected artifacts all relate to the build and release phases.

yaooqinn avatar Apr 18 '25 07:04 yaooqinn

Hi @yaooqinn, the PR targets 3.5.x releases (branch-3.5).

vrozov avatar Apr 18 '25 15:04 vrozov

@yaooqinn IMO, it would be beneficial to update branch-3.5 so that its build uses an Apache parent pom that is not 15 years old.

vrozov avatar Apr 24 '25 06:04 vrozov

For the record, @vrozov seems to be unaware of Apache Spark backporting policy.

  • Apache Spark has a backporting policy from the latest to the oldest live release branches. In this case, Spark 4.1.0 (master) -> Spark 4.0.x (branch-4.0) -> Spark 3.5 (branch-3.5).
  • This PR is a violation of Apache Spark backporting policy because branch-4.0 also didn't have this.

the PR targets 3.5.x releases (branch-3.5).

As a release manager of Apache Spark 4.0.1, I don't think this PR is applicable to branch-4.0 for 4.0.1 or future 4.0.x releases. Given that, I am closing this PR to prevent accidental merging to branch-3.5.

dongjoon-hyun avatar Sep 01 '25 22:09 dongjoon-hyun

@dongjoon-hyun This PR was opened at the same time as #50500. Because the commit from master can be cleanly applied to branch-4.0 but not to branch-3.5, I did not open a separate PR for branch-4.0, since a committer who merges a PR to master would also consider a clean cherry-pick to 4.0. I followed a similar process in #50594 and #50810.

Can you please clarify why you think it should not be applied to branch-4.0? Would it not be beneficial to keep the build (Maven plugin versions, their configurations, etc.) in sync across all long-term support branches?

vrozov avatar Sep 02 '25 04:09 vrozov

@dongjoon-hyun This PR was opened at the same time as #50500. Because the commit from master can be cleanly applied to branch-4.0 but not to branch-3.5, I did not open a separate PR for branch-4.0, since a committer who merges a PR to master would also consider a clean cherry-pick to 4.0. I followed a similar process in #50594 and #50810.

Can you please clarify why you think it should not be applied to branch-4.0? Would it not be beneficial to keep the build (Maven plugin versions, their configurations, etc.) in sync across all long-term support branches?

To @vrozov, the Apache Spark community follows Semantic Versioning, which means committers are not allowed to backport improvements or features to branch-4.0. Only bug-fix patches are allowed.

  • https://spark.apache.org/versioning-policy.html
  • https://semver.org

cc @sarutak because AWS seems to be unaware of Apache Spark versioning policy.

dongjoon-hyun avatar Sep 02 '25 15:09 dongjoon-hyun

@dongjoon-hyun semver usually applies to the API. This PR does not have any impact on the Spark API or Spark features. It only impacts the Spark Maven build. This seems to be in line with the Spark policy: "However, higher level libraries may introduce small features, such as a new algorithm, provided they are entirely additive and isolated from existing code paths." Please also check the following commit that changed the Maven version on branch-3.5 without violating semver.

vrozov avatar Sep 03 '25 04:09 vrozov

To the other reviewers:

To give you better context: both @vrozov and I remember the revert of the Apache parent pom 34 upgrade in the Apache ORC community. @vrozov proposed a similar Apache parent pom 34 upgrade patch to the Apache ORC community, and it almost blocked the Apache ORC 2.0 initial release. Luckily, the release manager found the failure during release preparation and recovered by reverting it, as shown in the following.

  • https://github.com/apache/orc/pull/2214
$ git log --oneline branch-2.2 | grep ORC-1891
c85abb05c Revert "ORC-1891: Upgrade to Apache parent pom 34"
6a3ac1f02 ORC-1891: Upgrade to Apache parent pom 34

As a member of the Apache ORC PMC, I can say that we were lucky at that time because we managed to handle it before the initial release from ORC branch-2.2. However, the incident is real evidence that changing the Apache parent pom is a highly risky change.

Given the above bad experience, I'm against changing the Apache parent pom on release branches because it could break the release process during a transition from 4.0.1 to 4.0.2 or from 3.5.6 to 3.5.7. That's the reason why @vrozov and I have been having such a long discussion.

Moreover, this PR contains more than the parent pom change. The following plugin upgrades only add complexity.

maven-enforcer-plugin from 3.3.0 to 3.5.0
maven-compiler-plugin from 3.11.0 to 3.14.0
maven-surefire-plugin from 3.1.2 to 3.5.2
maven-jar-plugin from 3.3.0 to 3.4.2
maven-source-plugin from 3.3.0 to 3.3.1
maven-clean-plugin from 3.3.1 to 3.4.1
maven-javadoc-plugin from 3.5.0 to 3.11.2
maven-assembly-plugin from 3.6.0 to 3.7.1
maven-shade-plugin from 3.5.0 to 3.6.0
maven-install-plugin from 3.1.1 to 3.1.4
maven-deploy-plugin from 3.1.1 to 3.1.4
maven-dependency-plugin from 3.6.0 to 3.8.1

To @vrozov ,

  1. What I meant was not Apache Spark's versioning. What I pointed to was the version of the Apache parent pom and of the above plugins: 3.11.0 to 3.14.0 or 3.5.0 to 3.11.2 is not considered a bug-fix version change.

semver usually applies to API. This PR does not have any impact on the Spark API and/or Spark features.

  2. It seems that you are underestimating the Maven build. The main artifact of an ASF project is the source code, as you know. Maven is the essential foundation for compilation, testing, packaging, and doc generation. We should not break anything in the Maven build during maintenance releases. Here, you are suggesting a very risky idea for branches that are not supposed to accept that kind of experiment.

It only impacts Spark maven build.
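
To make the "3.11.0 to 3.14.0" example concrete, here is a minimal sketch that reports which version component changes in a bump. It assumes the plugin versions are plain MAJOR.MINOR.PATCH strings read in the semver sense, which the plugin projects do not necessarily promise. `bump_kind` is a hypothetical helper written for illustration, not part of Maven or Spark.

```shell
# Hypothetical helper: report the first component that differs between two
# MAJOR.MINOR.PATCH version strings (assumes plain numeric x.y.z input).
bump_kind() {
  old=$1
  new=$2
  old_major=${old%%.*}        # text before the first dot
  new_major=${new%%.*}
  old_rest=${old#*.}          # text after the first dot
  new_rest=${new#*.}
  old_minor=${old_rest%%.*}
  new_minor=${new_rest%%.*}
  if [ "$old_major" != "$new_major" ]; then
    echo major
  elif [ "$old_minor" != "$new_minor" ]; then
    echo minor
  else
    echo patch                # same major and minor
  fi
}

bump_kind 3.11.0 3.14.0   # maven-compiler-plugin: prints "minor"
bump_kind 3.5.0 3.11.2    # maven-javadoc-plugin:  prints "minor"
bump_kind 3.1.1 3.1.4     # maven-install-plugin:  prints "patch"
```

Read this way, nine of the twelve plugin bumps listed above are minor-level, and three (maven-source-plugin, maven-install-plugin, maven-deploy-plugin) are patch-level.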

dongjoon-hyun avatar Sep 03 '25 06:09 dongjoon-hyun

@dongjoon-hyun Thank you for ccing me. I believe the members of my team have already read the versioning-policy doc, but I'll make sure to tell them to read it. Also, thank you for giving us the context that upgrading the parent pom for maintenance releases is risky.

sarutak avatar Sep 03 '25 06:09 sarutak

Thank you, @sarutak .

dongjoon-hyun avatar Sep 03 '25 06:09 dongjoon-hyun

Yeah, I second @dongjoon-hyun.

Changing the parent pom is likely to impact the release artifacts, whether we are aware of it or not. It's too risky to upgrade it without a clearly identified problem that needs to be fixed on maintenance branches, especially branches like 3.5 that already have 7 releases. Not to mention how big a step the jump from v18 to v34 is.

yaooqinn avatar Sep 03 '25 06:09 yaooqinn