wuyi
@mridulm The massive disconnection issue is intermittent and can't be reproduced. I tend to believe it's not a Spark issue but is caused by bad nodes. The current...
We should also update `ResourceProfileBuilder` to provide an API for users to create a `TaskResourceProfile`, e.g.,
```
ResourceProfileBuilder().taskOnly().require(taskReqs).build()
```
or we could add a separate `TaskResourceProfileBuilder` alongside `ResourceProfileBuilder`.
The change generally looks good to me. It'd be good if @dongjoon-hyun could take a look, since he knows more about K8s.
Thanks @kevin85421 @mridulm, merged to master!
@vanzin please have a look, thx.
@mgaido91 Thank you for your review. I've updated it.
> If we are calling stop when it is not necessary, I think we should rather avoid calling it in those cases.

I was thinking about doing it that way, but I...
https://github.com/apache/spark/blob/39b65b414c4ba36ada478369149f54452d90dd7b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L169-L176

The issue seems to be that `Executor` construction failed due to a fatal error thrown during plugin initialization. That fatal error doesn't fail the executor process, which leaves...
> The throw should result in uncaught exception handler killing the jvm - and if it does not, then the re-enqueue in prev step will cause the message to be...
> Essentially, since we are leveraging a ThreadPoolExecutor, it does not result in killing the thread with the exception/error thrown - but rather, will call ThreadPoolExecutor.afterExecute with the cause for...
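
For reference, here is a minimal, standalone Java sketch of the `afterExecute` hook mentioned in the quote above. It is not Spark code: `AfterExecuteDemo`, `RecordingPool`, `runFailingTask`, and the "plugin init failed" message are all hypothetical names made up for illustration. It only shows that when a task run via `execute()` throws, the pool hands the `Throwable` to `afterExecute`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class AfterExecuteDemo {

    /** Hypothetical pool that records whatever Throwable afterExecute receives. */
    static class RecordingPool extends ThreadPoolExecutor {
        final AtomicReference<Throwable> lastFailure = new AtomicReference<>();
        final CountDownLatch done = new CountDownLatch(1);

        RecordingPool() {
            // Single worker thread, unbounded queue.
            super(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        }

        @Override
        protected void afterExecute(Runnable r, Throwable t) {
            super.afterExecute(r, t);
            // t is non-null when a task started with execute() threw.
            lastFailure.set(t);
            done.countDown();
        }
    }

    /** Runs one failing task and returns the Throwable seen by afterExecute. */
    static Throwable runFailingTask() throws InterruptedException {
        RecordingPool pool = new RecordingPool();
        try {
            // Made-up failure standing in for an error during initialization.
            pool.execute(() -> {
                throw new IllegalStateException("plugin init failed");
            });
            // Wait until afterExecute has run (bounded so the demo cannot hang).
            pool.done.await(5, TimeUnit.SECONDS);
            return pool.lastFailure.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable seen = runFailingTask();
        System.out.println("afterExecute saw: " + seen);
    }
}
```

Note that this holds for `execute()`. With `submit()`, the exception is captured in the returned `Future` and `afterExecute` receives `null`, so a hook that wants to terminate the process on fatal errors would have to unwrap the `Future` itself.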