wuyi

Results 20 comments of wuyi

@mridulm The massive disconnection issue is intermittent and can't be reproduced. I tend to believe it's not a Spark issue but is caused by bad nodes. The current...

We should also update `ResourceProfileBuilder` to provide an API for users to create a `TaskResourceProfile`, e.g., `ResourceProfileBuilder().taskOnly().require(taskReqs).build()`, or we could extend `ResourceProfileBuilder` with a `TaskResourceProfileBuilder`.
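
To make the first option concrete, here is a minimal, self-contained sketch of what a `taskOnly()` switch on a builder could look like. It does not use Spark's classes: `TaskOnlyBuilderSketch`, `Profile`, `Builder`, `requireTask`, and `requireExecutor` are all hypothetical names invented for illustration, and the validation rule in `build()` is an assumption, not something the PR specifies.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class TaskOnlyBuilderSketch {

    /** Hypothetical stand-in for a resource profile; NOT Spark's ResourceProfile. */
    public static class Profile {
        public final Map<String, Double> taskReqs;
        public final Map<String, Long> executorReqs;

        Profile(Map<String, Double> taskReqs, Map<String, Long> executorReqs) {
            this.taskReqs = Collections.unmodifiableMap(new LinkedHashMap<>(taskReqs));
            this.executorReqs = Collections.unmodifiableMap(new LinkedHashMap<>(executorReqs));
        }

        /** A profile with no executor requirements only constrains tasks. */
        public boolean isTaskOnly() {
            return executorReqs.isEmpty();
        }
    }

    /** Hypothetical builder with an explicit taskOnly() switch. */
    public static class Builder {
        private final Map<String, Double> taskReqs = new LinkedHashMap<>();
        private final Map<String, Long> executorReqs = new LinkedHashMap<>();
        private boolean taskOnly = false;

        public Builder taskOnly() {
            this.taskOnly = true;
            return this;
        }

        public Builder requireTask(String resource, double amount) {
            taskReqs.put(resource, amount);
            return this;
        }

        public Builder requireExecutor(String resource, long amount) {
            executorReqs.put(resource, amount);
            return this;
        }

        public Profile build() {
            // Assumed rule: a task-only profile must not carry executor requirements.
            if (taskOnly && !executorReqs.isEmpty()) {
                throw new IllegalStateException(
                    "taskOnly() profile cannot include executor requirements");
            }
            return new Profile(taskReqs, executorReqs);
        }
    }

    public static void main(String[] args) {
        Profile p = new Builder().taskOnly().requireTask("cpus", 2.0).build();
        System.out.println("task-only: " + p.isTaskOnly() + ", task reqs: " + p.taskReqs);
    }
}
```

A separate `TaskResourceProfileBuilder` would move the same check into the type system instead of a runtime flag; which trade-off is preferable is left open here.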

The change generally looks good to me. It'd be good if @dongjoon-hyun could take a look, since he knows K8s better.

@mgaido91 Thank you for your review. I've updated it.

> If we are calling stop when it is not necessary, I think we should rather avoid calling it in those cases. I was thinking along those lines, but I...

https://github.com/apache/spark/blob/39b65b414c4ba36ada478369149f54452d90dd7b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L169-L176 The issue seems to be that `Executor` construction failed because of a fatal error thrown during plugin initialization. That fatal error doesn't fail the executor process, which leaves...

> The throw should result in uncaught exception handler killing the jvm - and if it does not, then the re-enqueue in prev step will cause the message to be...

> Essentially, since we are leveraging a ThreadPoolExecutor, it does not result in killing the thread with the exception/error thrown - but rather, will call ThreadPoolExecutor.afterExecute with the cause for...
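
The `afterExecute` hook mentioned above can be observed with plain `java.util.concurrent`, independent of Spark. The sketch below is only an illustration under stated assumptions: `AfterExecuteDemo`, `RecordingPool`, and `Result` are hypothetical names, and the failing task merely simulates an initialization error. It shows that a task handed to `execute()` which throws has its `Throwable` passed to `afterExecute`, and that the pool (and the JVM) keeps running afterwards unless something reacts to the failure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AfterExecuteDemo {

    /** Hypothetical pool that records the Throwable handed to afterExecute. */
    public static class RecordingPool extends ThreadPoolExecutor {
        volatile Throwable lastFailure;
        final CountDownLatch failureSeen = new CountDownLatch(1);

        public RecordingPool() {
            super(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        }

        @Override
        protected void afterExecute(Runnable r, Throwable t) {
            super.afterExecute(r, t);
            // For tasks passed to execute(), t is the exception or error the task threw.
            if (t != null) {
                lastFailure = t;
                failureSeen.countDown();
            }
        }
    }

    /** Simple holder for what the demo observed. */
    public static class Result {
        public final Throwable failure;
        public final boolean laterTaskRan;

        Result(Throwable failure, boolean laterTaskRan) {
            this.failure = failure;
            this.laterTaskRan = laterTaskRan;
        }
    }

    public static Result runFailingTask() {
        RecordingPool pool = new RecordingPool();
        CountDownLatch laterTask = new CountDownLatch(1);
        try {
            // Simulated failure; the default handler may print its stack trace to stderr.
            pool.execute(() -> { throw new IllegalStateException("simulated init failure"); });
            pool.failureSeen.await(5, TimeUnit.SECONDS);

            // The pool still accepts and runs work after the failure.
            pool.execute(laterTask::countDown);
            boolean ran = laterTask.await(5, TimeUnit.SECONDS);
            return new Result(pool.lastFailure, ran);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Result r = runFailingTask();
        System.out.println("afterExecute saw: " + r.failure);
        System.out.println("later task ran: " + r.laterTaskRan);
    }
}
```

Note that this only applies to `execute()`. With `submit()`, the exception is captured in the returned `Future` and `afterExecute` receives `null`.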