Tyler Rendina
@maziyarpanahi to reproduce:

Dockerfile for the ECR image referenced by the EMR Serverless application

```dockerfile
FROM public.ecr.aws/emr-serverless/spark/emr-6.14.0:20230928-x86_64
USER root
RUN pip3 install spark-nlp==5.1.4
USER hadoop:hadoop
```

Spark Submit via Console ```json {...
Stack Trace while using the default cache location (adding permissions and ownership to user hadoop:hadoop included here, same as leaving it out) ``` 3.4.1-amzn-1 5.1.4 Internet is connected. sentence_detector_dl download...
## Update

I've confirmed step 1 of a workaround with the below

```python
session = Session()
credentials = session.get_credentials()
current_credentials = credentials.get_frozen_credentials()
os.environ['AWS_ACCESS_KEY_ID'] = current_credentials.access_key
os.environ['AWS_SECRET_ACCESS_KEY'] = current_credentials.secret_key
old_spark: SparkSession...
```
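For anyone trying the same first step, here is a minimal sketch of the credential export as a reusable helper. The function name `export_frozen_credentials` and the `env` parameter are hypothetical, not part of the original comment. It also copies the session token when one is present, on the assumption that role-based (temporary) credentials are in use; the visible part of the snippet above only sets the key pair.

```python
import os


def export_frozen_credentials(frozen, env=None):
    """Copy AWS credentials into environment-style variables.

    `frozen` is any object with `access_key`, `secret_key` and `token`
    attributes, e.g. the result of
    boto3.Session().get_credentials().get_frozen_credentials().
    `env` defaults to os.environ; pass a plain dict to experiment safely.
    """
    if env is None:
        env = os.environ
    env["AWS_ACCESS_KEY_ID"] = frozen.access_key
    env["AWS_SECRET_ACCESS_KEY"] = frozen.secret_key
    # Assumption: temporary credentials carry a token that must travel
    # with the key pair; long-lived keys have token=None.
    token = getattr(frozen, "token", None)
    if token:
        env["AWS_SESSION_TOKEN"] = token
    else:
        # Drop a stale token so it is not paired with the new keys.
        env.pop("AWS_SESSION_TOKEN", None)
    return env
```

With boto3 installed, the call would look like `export_frozen_credentials(Session().get_credentials().get_frozen_credentials())`. Note that `get_credentials()` returns `None` when no credentials are found, so a real script should check for that first.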
# Final Note

The request stands; my comments were an exercise in finding a workaround. @maziyarpanahi the request can be more concisely articulated as "EMRFS pretrained model cache". What do...
Is there a way to manually add the class after importing the spark bundle?
I got it to compile and bootstrapped the Spark bundle, Hive sync, and AWS bundle to EMR. Now getting `java.lang.ClassNotFoundException: Class com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory not found`
Final note, and apologies for the number of posts, but this may help EMR users with Glue as their Hive service. Make sure to build Hudi using Java 8, if you...
While I can kick off backfills, they eventually fail alongside streams with `java.lang.NoSuchMethodError: com.amazonaws.transform.JsonUnmarshallerContext.getCurrentToken()Lcom/amazonaws/thirdparty/jackson/core/JsonToken;` Per https://github.com/apache/hudi/issues/5053 I just added `/usr/share/aws/aws-java-sdk/aws-java-sdk-bundle-1.12.446.jar` to my jars. This is for EMR 6.11.1, you...
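As a rough illustration of "adding the bundle to my jars" when the submit arguments are assembled in Python, the sketch below builds a `--jars` option for `spark-submit`. The helper name `jars_option` and the script name `job.py` are hypothetical; the jar path is the one quoted above and will differ on other EMR releases.

```python
# Path quoted in the comment above (EMR 6.11.1); adjust for other releases.
AWS_SDK_BUNDLE = "/usr/share/aws/aws-java-sdk/aws-java-sdk-bundle-1.12.446.jar"


def jars_option(jars):
    """Return ["--jars", "a.jar,b.jar"] for spark-submit, or [] if no jars.

    spark-submit expects a single comma-separated value for --jars.
    Duplicates are dropped while keeping the first-seen order.
    """
    unique = list(dict.fromkeys(jars))
    if not unique:
        return []
    return ["--jars", ",".join(unique)]


# Options go before the application file; "job.py" is a placeholder name.
submit_args = ["spark-submit", *jars_option([AWS_SDK_BUNDLE]), "job.py"]
```

Keeping the option builder separate from the rest of the argument list makes it easy to splice the result in ahead of the application file, which is where `spark-submit` expects its options.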
> > why is it required to set these? is it really required? > > ``` > > > "--conf", > > > "spark.driver.extraClassPath=/usr/lib/hudi/hudi-aws-bundle-0.14.1.jar:/usr/lib/hudi/hudi-spark3.3-bundle_2.12-0.14.1.jar", > > > "--conf", > >...
EMR on EKS gave me issues, so I switched to EMR on EC2 about a year ago; I probably needed to do the same thing that was done here. Planning to use something...