Enable -Xsource:3 compiler flag
What changes were proposed in this pull request?
Enable -Xsource:3-cross compiler flag for future migration to Scala 3. https://docs.scala-lang.org/scala3/guides/migration/tooling-scala2-xsource3.html
This PR makes sure this code base compiled with Scala 2.13 aligns with Scala 3. This allows for an easier migration to Scala 3.
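For readers who want to try the same flag in their own project, here is a minimal sbt sketch. It is a hypothetical standalone `build.sbt`, not the build definition changed in this PR, and it assumes a recent Scala 2.13 release that understands `-Xsource:3-cross`.

```scala
// build.sbt (hypothetical minimal build, NOT Spark's actual build definition)
scalaVersion := "2.13.16"

// Pass the flag only when compiling with Scala 2.13; other versions get nothing extra.
scalacOptions ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, 13)) => Seq("-Xsource:3-cross")
    case _             => Seq.empty
  }
}
```

Guarding on the Scala version keeps the option from leaking into a later Scala 3 cross build, where it would not apply.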
It introduces very minor binary incompatibilities for APIs that were supposed to be private anyway. The impact of those incompatibilities should be zero, unless people used those private constructors illegally.
If we exclude the future Scala 3 benefit and strictly focus on the immediate benefits of this PR: it removes wrongfully provided access to private constructors via the apply functions in their companion object.
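To illustrate the point about private constructors, here is a small sketch. `Percentage` is a hypothetical class made up for this example, not one from the Spark codebase. It compiles on Scala 2.13 with or without the flag, and on Scala 3.

```scala
// Hypothetical example class, not taken from the Spark codebase.
// The constructor is private: instances should only come from `fromInt`.
final case class Percentage private (value: Int)

object Percentage {
  // Validating factory. The companion object can still call the private constructor.
  def fromInt(value: Int): Either[String, Percentage] =
    if (value >= 0 && value <= 100) Right(new Percentage(value))
    else Left(s"$value is not between 0 and 100")
}

object Demo {
  def main(args: Array[String]): Unit = {
    println(Percentage.fromInt(42))
    println(Percentage.fromInt(250))

    // Plain Scala 2.13 accepts the line below, because the synthetic
    // `apply` in the companion is public even though the constructor is private.
    // With -Xsource:3 (and in Scala 3) `apply` and `copy` inherit the
    // constructor's access modifier, so the line is rejected at compile time.
    // Percentage(250)
  }
}
```

The commented-out call is the kind of access this PR removes: it bypasses the validation in `fromInt` while looking like a normal constructor call.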
Why Scala 3?
- Scala 3 was released in May 2021 and pretty much the entire community supports it now. This puts the Spark project in an embarrassing position, as it regularly shows up in jokes pointing out how late Spark is in upgrading Scala. As we can see, this PR has already drawn reactions from people hoping to have Scala 3 support in Spark one day.
- Because Spark is a dependency of many Scala codebases, it causes a lot of pain for users who also want to use libraries that only work on Scala 3.
- Scala 2 has become a complex language, but Scala 3 is more straightforward. It has been redesigned to be significantly simpler, with the removal of complex features considered warts. Some of these changes have been backported to Scala 2.13, as we can see in this PR.
- Most learning materials now focus on Scala 3, as Scala 2 is largely being phased out. This leaves Spark users in a bad position to start with Spark, as they need to rely on old learning materials outside of the Spark documentation.
- Support for Scala 2 is in maintenance mode. It does not feel right for an advanced piece of tech such as Spark to rely on a language version that is now being phased out.
- Scala 3 ends the binary incompatibility madness. Scala 2.x broke binary compatibility on every minor version. As a result, we relied on cross compiling the codebase between those versions. This is painful. Moving to Scala 3 means one last break/cross compile and no more breakage ever.
- A potential benefit to Spark could be the more Pythonic syntax of Scala 3. Since Spark has a large Python user base, the optional indentation-based syntax might help Python users feel at home with Scala.
- Scala 3 adds a lot of features that catch up with other languages, such as extension methods and enums.
- Scala 2 is slow to compile. In our repo Scala 3 is 3x faster to compile than Scala 2 (1min instead of 3min), for the same code!
Why are the changes needed?
This not only eases a potential future Scala 3 migration but also makes the compiler stricter about features that have proven to be warts.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Using the existing test suite
Was this patch authored or co-authored using generative AI tooling?
No
Thanks
There are some MiMa failures, I was expecting that based on the doc. Can someone advise on these?
also cc @rednaxelafx
Spark 4.0.0 has been announced: https://spark.apache.org/news/spark-4-0-0-released.html Does that mean this PR can move forward? It would be great to see this move ever closer to Scala 3. @joan38 you're the real hero, thank you!
Time to move on!
So much looking forward to using Scala 3 in Spark 4.x!
@LuciferYang let me know if we are ready to get back on this.
@LuciferYang any update? cc @dongjoon-hyun @HyukjinKwon @cloud-fan @zhengruifeng
Can someone explain what is blocking this PR now please?
@joan38 I take it this cross compiles fine to 2.13?
Were you able to mitigate some of the binary issues, or do you still need a "waiver" based on the areas of breakage?
@ekrich this PR is not cross compiling with Scala 3. It's only preparing the 2.13 codebase to later cross compile with Scala 3.
Please refer to my previous comment as to what is breaking according to MiMa. But these are private constructors that were not actually private, since we could instantiate the class via the apply in the companion object. Either the apply should be made private or the class constructor should be made public, to make it consistent.
That makes total sense. Those discussions about what to make public or not are potentially breaking but decisions should be made. I think you should be very close then if you can get agreement.
Any news on this, guys?
Any news regarding this? Also, please post a JIRA link if this is tracked in JIRA, so we can avoid further "Any news" follow-up comments.
Created SPARK-54150 as an umbrella ticket for Scala 3.