
[SPARK-38910][YARN] Clean spark staging before unregister

Open AngersZhuuuu opened this issue 3 years ago • 8 comments

What changes were proposed in this pull request?

After discussing https://github.com/apache/spark/pull/36207 and re-checking the whole logic, we should revert https://github.com/apache/spark/pull/36207 and make the following changes:

  1. In both client and cluster mode, if this is the last attempt, YARN won't rerun the job, so we can clean the staging dir first. This avoids leaving the staging dir behind if unregister fails.
  2. In cluster or client mode, if this is not the last attempt and the final status is SUCCESS, YARN will rerun the job if unregister fails. We can't clean the staging dir before unregister succeeds, because on a rerun YARN would be unable to download the related files and the job would fail.
  3. In cluster unmanaged mode, if the job failed, we can delete the staging dir first since it won't be rerun.
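
The three cases above can be sketched as a small decision function. This is only an illustration of the ordering described in this PR, not the actual Spark `ApplicationMaster` code: the names `CleanupOrder`, `FinalStatus`, and `cleanup_order` are hypothetical, and the fallback for the case the description does not mention (not the last attempt, managed, and failed) is an assumption.

```python
from enum import Enum


class FinalStatus(Enum):
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


class CleanupOrder(Enum):
    # Hypothetical labels for when the staging dir is removed.
    CLEAN_BEFORE_UNREGISTER = "clean_before_unregister"
    CLEAN_AFTER_SUCCESSFUL_UNREGISTER = "clean_after_successful_unregister"
    KEEP_STAGING_DIR = "keep_staging_dir"


def cleanup_order(is_last_attempt: bool,
                  unmanaged: bool,
                  final_status: FinalStatus) -> CleanupOrder:
    """Map the three cases in the description to a cleanup order."""
    # Item 1: on the last attempt YARN will not rerun the job, so cleaning
    # first cannot break a retry and avoids a leftover staging dir.
    if is_last_attempt:
        return CleanupOrder.CLEAN_BEFORE_UNREGISTER

    # Item 3: a failed unmanaged run will not be rerun either.
    if unmanaged and final_status is FinalStatus.FAILED:
        return CleanupOrder.CLEAN_BEFORE_UNREGISTER

    # Item 2: not the last attempt and SUCCESS. If unregister fails, YARN
    # reruns the job and needs the staged files, so clean only afterwards.
    if final_status is FinalStatus.SUCCEEDED:
        return CleanupOrder.CLEAN_AFTER_SUCCESSFUL_UNREGISTER

    # Assumption (not stated in the description): in the remaining case a
    # retry may follow, so the staging dir is left in place.
    return CleanupOrder.KEEP_STAGING_DIR
```

For example, `cleanup_order(False, False, FinalStatus.SUCCEEDED)` yields `CLEAN_AFTER_SUCCESSFUL_UNREGISTER`, which matches item 2: the files stay available in case YARN has to rerun the job.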

Why are the changes needed?

Revert the earlier change and make the staging dir cleanup logic more accurate.

Does this PR introduce any user-facing change?

No

How was this patch tested?

AngersZhuuuu avatar Jul 12 '22 03:07 AngersZhuuuu

Waiting for @tgravescs to come back and review this.

AngersZhuuuu avatar Jul 12 '22 03:07 AngersZhuuuu

In items 2 and 3 of the description, is one of those supposed to be client mode? Otherwise they are the same.

tgravescs avatar Jul 19 '22 17:07 tgravescs

@tgravescs It seems the last two GA failures were not caused by this PR.

AngersZhuuuu avatar Jul 22 '22 06:07 AngersZhuuuu

I ran the failed UT locally and it passes.

AngersZhuuuu avatar Jul 25 '22 05:07 AngersZhuuuu

ping @dongjoon-hyun The latest GA failure was caused by:

* DONE (miniUI)
ERROR: dependency ‘pkgdown’ is not available for package ‘devtools’
* removing ‘/usr/local/lib/R/site-library/devtools’

The downloaded source packages are in
	‘/tmp/RtmpTvMfJ6/downloaded_packages’
Warning messages:
1: In install.packages(c("devtools"), repos = "https://cloud.r-project.org/") :
  installation of package ‘systemfonts’ had non-zero exit status
2: In install.packages(c("devtools"), repos = "https://cloud.r-project.org/") :
  installation of package ‘textshaping’ had non-zero exit status
3: In install.packages(c("devtools"), repos = "https://cloud.r-project.org/") :
  installation of package ‘ragg’ had non-zero exit status
4: In install.packages(c("devtools"), repos = "https://cloud.r-project.org/") :
  installation of package ‘pkgdown’ had non-zero exit status
5: In install.packages(c("devtools"), repos = "https://cloud.r-project.org/") :
  installation of package ‘devtools’ had non-zero exit status
Error in loadNamespace(x) : there is no package called ‘devtools’
Calls: loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
Error: Process completed with exit code 1.

Any advice?

AngersZhuuuu avatar Jul 28 '22 10:07 AngersZhuuuu

can you try kicking the tests again?

tgravescs avatar Aug 09 '22 13:08 tgravescs

> can you try kicking the tests again?

Yea

AngersZhuuuu avatar Aug 09 '22 13:08 AngersZhuuuu

@tgravescs All GA passed now

AngersZhuuuu avatar Aug 10 '22 02:08 AngersZhuuuu

merged to master

tgravescs avatar Aug 10 '22 14:08 tgravescs