Yuan, Man
See https://github.com/kubeflow/pytorch-operator/issues/125 for more details. Maybe changing the line at https://github.com/kubeflow/arena/blob/master/charts/pytorchjob/templates/pytorchjob.yaml#L313 to `OnFailure` would help resolve this issue?
Hi @yongtang, can you take a look at this issue? Thanks.
cc @cheyang @WencongXiao
Thanks for your report, I will look into it.
@DelightRun Could you try the latest commit?
> > @DelightRun Could you try the latest commit?
>
> I use your pre-built v0.8.0 wheel package with TensorFlow 1.15.0. It's not very convenient for me to compile...