poetryben888

8 comments by poetryben888

> Your job is in waiting status. Can you click `Go to job event page` to get more information? And please provide your job config yaml @Binyang2014 I use the guide...

@hzy46 Thanks for your response. Maybe the worker has no ROLES? ![image](https://user-images.githubusercontent.com/15098245/144162155-cf8623ce-186b-4a93-b233-b4703b2907d7.png) ![image](https://user-images.githubusercontent.com/15098245/144162221-28be0c83-ad21-4ca6-9006-b358831968fe.png) This is the pod definition. The pod is removed when the job stops.

```
root@pai-master:~# kubectl describe...
```

@hzy46 Hi, here are my results. I cannot find any errors.

```
root@pai-master:/home/xubaishuai# kubectl get sa runtime-account -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"runtime-account","namespace":"default"}}
  creationTimestamp:...
```

#siaimes Bro, the master's IP is not reachable in the browser. No process is listening on port 80. Port 8080 is up in the webportal container, but 8080 is not available...
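To narrow down a symptom like the one above, it can help to check from another machine which ports actually accept TCP connections. The following is a minimal sketch using only the Python standard library; the function name `port_is_open` and the placeholder address are assumptions for illustration, not part of OpenPAI.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    master_ip = "127.0.0.1"  # placeholder: replace with the master's IP
    for port in (80, 8080):
        state = "open" if port_is_open(master_ip, port) else "closed or unreachable"
        print(f"{master_ip}:{port} is {state}")
```

A port that is open inside a container but reported closed here usually means it is not published or forwarded on the host, though this check alone cannot tell which.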

> All services will start automatically. One possible situation is that you are using a swap partition. In this case, the cluster will not start automatically because k8s does not...

> > > All services will start automatically. One possible situation is that you are using a swap partition. In this case, the cluster will not start automatically because k8s...

> > > All services will start automatically. One possible situation is that you are using a swap partition. In this case, the cluster will not start automatically because k8s...
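To check whether the swap situation from the quoted reply applies to a node, one option on Linux is to read `/proc/swaps`, which lists active swap devices after a header line. This is a minimal sketch; the helper name `active_swap_devices` is an assumption for illustration.

```python
from typing import List


def active_swap_devices(proc_swaps_text: str) -> List[str]:
    """Return the device or file names listed in /proc/swaps content.

    The first line of /proc/swaps is a column header; each following
    non-empty line describes one active swap area.
    """
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    try:
        with open("/proc/swaps", encoding="utf-8") as f:
            devices = active_swap_devices(f.read())
    except FileNotFoundError:
        devices = []  # not a Linux host, or /proc is unavailable
    if devices:
        print("Swap is active on:", ", ".join(devices))
    else:
        print("No active swap found.")
```

An empty result means no swap area is currently active on that node.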

You need to run this on both machines: `fleetrun --ips=10.130.19.203,10.130.17.157 --gpus=0,1 train_fleet_dygraph.py`. On 10.130.17.157, the run ends with a "**listen**" prompt while it waits for 10.130.19.203 to connect.