
Getting error in the python job

JyotinP opened this issue 2 years ago · 4 comments

```
{base.py:73} INFO - Using connection ID 'spark-conn' for task execution.
{spark_submit.py:351} INFO - Spark-Submit cmd: spark-submit --master spark://spark-master-1:7077 --name arrow-spark jobs/python/wordcountjob.py
{spark_submit.py:521} INFO - /home//.local/lib/python3.11/site-packages/pyspark/bin/load-spark-env.sh: line 68: ps: command not found
{spark_submit.py:521} INFO - /home//.local/lib/python3.11/site-packages/pyspark/bin/spark-class: line 71: /usr/lib/jvm/java-11-openjdk-arm64/bin/java: No such file or directory
{spark_submit.py:521} INFO - /home/***/.local/lib/python3.11/site-packages/pyspark/bin/spark-class: line 97: CMD: bad array subscript
{taskinstance.py:1935} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/apache/spark/operators/spark_submit.py", line 160, in execute
    self._hook.submit(self._application)
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/apache/spark/hooks/spark_submit.py", line 452, in submit
    raise AirflowException(
airflow.exceptions.AirflowException: Cannot execute: spark-submit --master spark://spark-master-1:7077 --name arrow-spark jobs/python/wordcountjob.py. Error code is: 1.
```

— JyotinP, Nov 28 '23
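Context for the log above: pyspark's `spark-class` launcher shells out to `ps` (provided by the `procps` package) and to the `java` binary under `JAVA_HOME`; the `CMD: bad array subscript` failure is a downstream symptom of `java` not being found. As a quick sanity check, here is a small shell sketch (the `check_tool` helper is hypothetical, not part of pyspark or Airflow) you could run inside the Airflow worker container:

```shell
#!/bin/sh
# Verify that the tools spark-submit's launcher depends on are on PATH.
# Run this inside the Airflow worker/scheduler container.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: MISSING"
  fi
}

check_tool ps    # provided by the procps package
check_tool java  # provided by a JDK; must match JAVA_HOME
echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
```

If either tool reports MISSING, the fixes suggested below (installing `procps` and a JDK, and setting `JAVA_HOME`) apply.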

Add `RUN export JAVA_HOME` to the Dockerfile. It works for me.

Set the `JAVA_HOME` environment variable:

```
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
RUN export JAVA_HOME
```

— franceZa, Dec 17 '23
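For reference, a fuller Dockerfile sketch of this fix. The base-image tag and JDK path here are assumptions, not taken from the thread (the log above shows an arm64 path, so pick the variant matching your platform). Note also that `ENV` persists into the running container, whereas a `RUN export ...` only affects that single build step, so `ENV` is the part doing the work:

```dockerfile
# Sketch only: base tag and paths are assumptions; adjust to your setup.
FROM apache/airflow:2.7.3

USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-11-jdk procps && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Use /usr/lib/jvm/java-11-openjdk-arm64 instead on ARM hosts.
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

USER airflow
```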

> Add `RUN export JAVA_HOME` to the Dockerfile. It works for me.
>
> Set the `JAVA_HOME` environment variable:
>
> ```
> ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64/
> RUN export JAVA_HOME
> ```

I did as you advised, but in my case I still got the same error.

```
[2024-01-16, 20:01:50 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 7 for task python_job (Cannot execute: spark-submit --master spark://spark-master:7077 --name arrow-spark --deploy-mode client jobs/python/wordcountjob.py. Error code is: 1.; 239)
```

[screenshot]

Python and Java versions are compatible: [screenshot]

Versions: [screenshot]

Variables: [screenshot]

docker-compose.yml: [screenshot]

Airflow connection: [screenshot]

**Airflow job error:** [screenshot]

On my local machine, spark-submit ran without errors: [screenshot]

— weldermartins, Jan 16 '24

@weldermartins Please check this issue link, since it worked for me. And let me know if it works for you.

— yashraizb, Aug 04 '24

@JyotinP It looks like the `ps` command is not found. Try adding `procps` in the Dockerfile where the Java SDK and other packages are installed. Refer to the code below:

```
RUN apt-get update && \
    apt-get install -y gcc python3-dev openjdk-11-jdk procps && \
    apt-get clean
```

— yashraizb, Aug 04 '24