[Bug] Spark jar task failed to run
Search before asking
- [X] I had searched in the issues and found no similar issues.
What happened
After configuring the cluster components, I ran a Spark jar task and it failed to run.
What you expected to happen
How to reproduce
I ran a Spark Pi task with an argument of 10 or 100. The Application Master treated that argument as the driver host and tried to connect to it (for example, 10:0).
Application application_1693541457708_0007 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1693541457708_0007_000001 exited with exitCode: 10
Failing this attempt.Diagnostics: [2023-09-01 15:56:35.718]Exception from container-launch.
Container id: container_e130_1693541457708_0007_01_000001
Exit code: 10
[2023-09-01 15:56:35.719]Container exited with a non-zero exit code 10. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
etrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:33 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
23/09/01 15:56:35 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:579)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:434)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:256)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:766)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:764)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:787)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
23/09/01 15:56:35 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
23/09/01 15:56:35 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
23/09/01 15:56:35 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://lcc-ambari-server01:8020/user/admin/.sparkStaging/application_1693541457708_0007
23/09/01 15:56:35 INFO util.ShutdownHookManager: Shutdown hook called
For more detailed output, check the application tracking page: http://lcc-ambari-server01:8188/applicationhistory/app/application_1693541457708_0007 Then click on links to logs of each attempt.
. Failing the application.
Anything else
No response
Version
master
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
@vainhope @mortalYoung
From the logs, the Spark task's AppMaster cannot connect to the Driver, so the task fails. Could you check whether there is a network connectivity problem?
It takes the input parameter of the Spark jar task on Taier as the driver's host and uses 0 as the port. I tried different Spark jar tasks and they all hit the same problem. @vainhope
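
To illustrate why the log shows 10:0, here is a small Python sketch of what I believe happens. It is an assumption based on my reading of Spark 2.x, not on Taier's code: the client-mode launcher (`ExecutorLauncher`, via `waitForSparkDriver`) appears to treat the first user argument as the driver's `host:port`, and the host/port parser falls back to port 0 when the string contains no colon. The function names `parse_host_port` and `driver_address_from_user_args` are hypothetical and only mirror that behavior.

```python
from typing import List, Tuple


def parse_host_port(host_port: str) -> Tuple[str, int]:
    """Split "host:port" on the last colon; default to port 0 if no colon.

    Assumption: this mirrors how Spark 2.x parses the driver address.
    """
    idx = host_port.rfind(":")
    if idx == -1:
        # No colon: the whole string is taken as the host, port becomes 0.
        return host_port, 0
    return host_port[:idx].strip(), int(host_port[idx + 1:].strip())


def driver_address_from_user_args(user_args: List[str]) -> str:
    """Return the address the launcher would try to reach.

    Assumption: the first user argument is expected to be the driver's
    host:port, so a plain job argument such as "10" is misread as a host.
    """
    host, port = parse_host_port(user_args[0])
    return f"{host}:{port}"


if __name__ == "__main__":
    # The SparkPi argument "10" ends up as the driver address "10:0".
    print(driver_address_from_user_args(["10"]))
    # A real driver address is parsed as expected.
    print(driver_address_from_user_args(["192.168.1.5:35000"]))
```

If this reading is right, the failure is not a network problem: the job's own argument is being passed in the position where the launcher expects the driver address.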