Improve MetricsReporter loading with class loader fallback
This pull request addresses the issue of class loader discrepancies when loading JAR files via the spark-submit command from remote locations (such as S3 or through Ivy). This discrepancy specifically impacts deployments on EMR clusters with the Iceberg feature enabled.
This PR is the result of this Slack thread discussion.
Description:
When executing Spark jobs using the spark-submit command, JAR files loaded from remote locations (e.g., S3 or via Ivy) are placed into a different class loader known as org.apache.spark.util.MutableURLClassLoader. This class loader is a child class loader of the AppClassLoader.
When the EMR Iceberg flag is enabled, as described in the AWS EMR documentation, the Iceberg JAR file resides in the AppClassLoader. In contrast, user code (such as a metrics reporter) is placed in the MutableURLClassLoader. Consequently, the Iceberg code can't access classes from the user code, because the parent class loader (AppClassLoader) has no visibility into the child class loader (MutableURLClassLoader).
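To illustrate the fallback idea, here is a minimal sketch and not the actual diff in this PR. The class `FallbackClassLoading` and its methods are hypothetical names made up for this example. It assumes that the thread context class loader is the child loader that holds the user JARs, which should be verified for the deployment in question.

```java
/** Hypothetical helper; the names here are illustrative and not taken from the Iceberg code base. */
final class FallbackClassLoading {
  private FallbackClassLoading() {}

  /**
   * Tries each candidate class loader in order and returns the first class found.
   * Null loaders are skipped. If no loader can find the class, the last
   * ClassNotFoundException is rethrown.
   */
  static Class<?> loadWithFallback(String className, ClassLoader... candidates)
      throws ClassNotFoundException {
    ClassNotFoundException last = null;
    for (ClassLoader loader : candidates) {
      if (loader == null) {
        continue;
      }
      try {
        return Class.forName(className, true, loader);
      } catch (ClassNotFoundException e) {
        last = e; // remember the failure and try the next loader
      }
    }
    throw last != null ? last : new ClassNotFoundException(className);
  }

  /**
   * First tries the loader that loaded this helper (standing in for the loader
   * that holds the library), then falls back to the thread context class loader
   * (assumed here to be the child loader that can see user code).
   */
  static Class<?> load(String className) throws ClassNotFoundException {
    return loadWithFallback(
        className,
        FallbackClassLoading.class.getClassLoader(),
        Thread.currentThread().getContextClassLoader());
  }
}
```

With a helper like this, a reporter class that is only visible to the child loader would still resolve on the second attempt, while classes visible to the parent keep resolving exactly as before.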
@bk-mz can you remove the target/ folder? We don't want to commit binaries to the repository.
Please see this comment: https://github.com/apache/iceberg/pull/10459#discussion_r1647665081
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the Iceberg dev mailing list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
dammit.
@bk-mz feel free to re-open the PR to mark it not stale