Improve MetricsReporter loading with class loader fallback

Open bk-mz opened this issue 1 year ago • 1 comments

This pull request addresses the issue of class loader discrepancies when loading JAR files via the spark-submit command from remote locations (such as S3 or through Ivy). This discrepancy specifically impacts deployments on EMR clusters with the Iceberg feature enabled.

This is the result of a Slack thread discussion.

Description:

When executing Spark jobs using the spark-submit command, JAR files loaded from remote locations (e.g., S3 or via Ivy) are placed into a different class loader known as org.apache.spark.util.MutableURLClassLoader. This class loader is a child class loader of the AppClassLoader.

When the EMR Iceberg flag is enabled, as described in the AWS EMR documentation, the Iceberg JAR file resides in the AppClassLoader. In contrast, user code (such as a metric reporter) is placed in the MutableURLClassLoader. Consequently, the Iceberg code can't access classes from the user code, because a parent class loader (AppClassLoader) has no visibility into its child class loader (MutableURLClassLoader).
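
The fallback idea from the title can be illustrated roughly as follows. This is a minimal sketch and not the actual diff of this pull request: the class `FallbackClassLoading` and its methods are hypothetical names invented for illustration. It assumes that the thread context class loader is the child loader that can see user JARs, which may not hold in every deployment.

```java
/** Hypothetical helper for illustration only; not part of the Iceberg API. */
final class FallbackClassLoading {
  private FallbackClassLoading() {}

  /**
   * Tries each candidate class loader in order and returns the first class found.
   * Null candidates are skipped. Throws IllegalArgumentException if none succeeds.
   */
  static Class<?> loadWithFallback(String className, ClassLoader... candidates) {
    IllegalArgumentException failure =
        new IllegalArgumentException("Cannot load class: " + className);
    for (ClassLoader loader : candidates) {
      if (loader == null) {
        continue;
      }
      try {
        // Resolve and initialize the class through this specific loader.
        return Class.forName(className, true, loader);
      } catch (ClassNotFoundException e) {
        // Remember why this loader failed, then try the next candidate.
        failure.addSuppressed(e);
      }
    }
    throw failure;
  }

  /**
   * First tries the loader that defined this helper (standing in for the loader
   * that holds the library code), then falls back to the thread context class
   * loader (assumed here to be the child loader holding user code).
   */
  static Class<?> load(String className) {
    return loadWithFallback(
        className,
        FallbackClassLoading.class.getClassLoader(),
        Thread.currentThread().getContextClassLoader());
  }
}
```

With a helper of this shape, a call such as `FallbackClassLoading.load("com.example.MyReporter")` (a made-up class name) would first consult the loader that defined the helper and only then the context class loader, so a class visible only to the child loader can still be found.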

bk-mz avatar Jun 07 '24 07:06 bk-mz

@bk-mz can you remove the target/ folder? We don't want to commit binaries to the repository.

Please see this comment: https://github.com/apache/iceberg/pull/10459#discussion_r1647665081

bk-mz avatar Jun 20 '24 14:06 bk-mz

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Nov 06 '24 00:11 github-actions[bot]

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions[bot] avatar Nov 13 '24 00:11 github-actions[bot]

dammit.

bk-mz avatar Nov 13 '24 06:11 bk-mz

@bk-mz feel free to re-open the PR to mark it not stale

nastra avatar Nov 13 '24 06:11 nastra