devendra ayalasomayajula

10 comments by devendra ayalasomayajula

I have seen a similar issue on Ubuntu Server 14.04.5 (docker-engine: 1.11.0-0~trusty, docker-volume-netshare: 0.32). Steps to reproduce: 1. Create an image from the following Dockerfile: ``` FROM busybox RUN mkdir...

> @blackgold This is really interesting - have you got numbers for how long the TM calculation is taking? Does it impact your container startup time? It would be really...

> @blackgold I assume you also have other device plugin instances running in the same cluster that require NUMA advertising, correct? So disabling the NUMA policy in kubelet is not an...

Yup, it's an 8-NUMA-zone node with 8 GPUs, 8 RDMA devices and 255 CPUs. At https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/topologymanager/policy.go#L142 the size of allProviderHints is 10x239. When I timed it, it took 220 seconds...
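To see why the merge can get this slow, here is a minimal Python sketch, assuming a simplified model in which each hint provider returns a list of NUMA bitmasks and the merge tries every combination of one hint per provider (a cross product), ANDing the masks together. The names `count_hint_combinations` and `best_merged_mask` are hypothetical, and the real kubelet code in policy.go also tracks a "preferred" flag and applies policy-specific filtering that this sketch omits. What it illustrates is that the number of combinations is the product of the per-provider list lengths, so it grows exponentially with the number of providers.

```python
from itertools import product
from math import prod


def count_hint_combinations(provider_hints):
    """Combinations an exhaustive merge must visit: the product of the
    per-provider hint-list lengths (hypothetical helper)."""
    return prod(len(hints) for hints in provider_hints)


def best_merged_mask(provider_hints, num_nodes):
    """Simplified model of hint merging (not the kubelet implementation).

    provider_hints: one list of NUMA bitmasks per provider, where bit i
    set means "NUMA node i is acceptable to this provider".
    Returns the narrowest non-empty AND of one mask per provider, or None.
    """
    all_nodes = (1 << num_nodes) - 1
    best = None
    # product(*...) yields every tuple that picks one hint from each provider.
    for combo in product(*provider_hints):
        merged = all_nodes
        for mask in combo:
            merged &= mask
        if merged == 0:
            continue  # no NUMA node satisfies every provider in this combo
        # Prefer the affinity spanning the fewest NUMA nodes; keep the first on ties.
        if best is None or bin(merged).count("1") < bin(best).count("1"):
            best = merged
    return best


hints = [[0b01, 0b10, 0b11], [0b01, 0b11]]
print(count_hint_combinations(hints))   # 6
print(bin(best_merged_mask(hints, 2)))  # 0b1
```

Two providers with 3 and 2 hints give 3 x 2 = 6 combinations; ten providers with 10 hints each already give 10**10, which is why an exhaustive pass becomes impractical on large nodes.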

> Was an issue filed against topology manager? Maybe the algorithm can be improved.

Not yet. @klueska If you guys think it's reasonable to control this using a CLI...

As a stopgap, we're using a sriov-cleaner daemon to clean up devices that end up in a bad state on the host. Looking for suggestions on an ideal solution.

I tried switching to the init namespace; however, the device is not visible there. It becomes visible on the host only after the cmdDel command has been called for all the devices.

> We're running into this currently. We have 4-6 interfaces in use by the CNI, but are often finding 1 or 2 left with a bad interface name, and various...

So when I tried to list devices in the init namespace from within the sriov-cni process, the devices don't show up. Only after the last cmdDel invocation finishes, the devices...
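This is consistent with how Linux network namespaces work: a network device belongs to exactly one namespace at a time, so a VF that has been moved into a pod's namespace is not listed in the init namespace until it is moved back, which, judging by the behavior described above, happens during cmdDel. The minimal Python sketch below, assuming it runs on Linux, lists the interfaces visible in the caller's current namespace with `socket.if_nameindex()` and reports which expected names are missing. The helper names and the interface names in the example (`ens1f0v0`, `ens1f0v1`) are hypothetical.

```python
import socket


def visible_interfaces():
    """Interface names visible in this process's current network namespace."""
    # if_nameindex() returns (index, name) pairs for the calling namespace only.
    return {name for _index, name in socket.if_nameindex()}


def missing_from_namespace(expected, visible=None):
    """Expected interface names not visible here (hypothetical helper).

    A VF still held in a pod's network namespace shows up in this list
    when the check runs in the init namespace.
    """
    if visible is None:
        visible = visible_interfaces()
    return sorted(set(expected) - set(visible))


# Hypothetical VF names; on a real host, compare against your own VF list.
print(missing_from_namespace(["ens1f0v0", "ens1f0v1"], visible={"lo", "ens1f0v1"}))
# ['ens1f0v0']
```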

In our use case, the training job uses all IB devices. If even one device is unhealthy, the job will not run. It would be nice to have...
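The all-or-nothing requirement described here can be expressed as a simple gate: the job starts only if every required device is present and healthy. The sketch below assumes a hypothetical health map from device name to a boolean (for example, populated by whatever health checker the cluster already runs); `ready_for_job` and the device names are illustrative, not part of any existing tool.

```python
def ready_for_job(device_health, required):
    """All-or-nothing gate (hypothetical helper).

    device_health: mapping of device name -> True if healthy.
    required: every device the job needs.
    Returns (ok, problems), where problems lists required devices that are
    missing from the map or reported unhealthy.
    """
    problems = sorted(d for d in required if not device_health.get(d, False))
    return (not problems, problems)


health = {"mlx5_0": True, "mlx5_1": False}
print(ready_for_job(health, ["mlx5_0", "mlx5_1", "mlx5_2"]))
# (False, ['mlx5_1', 'mlx5_2'])
```

Treating a missing device the same as an unhealthy one is a deliberate choice in this sketch: for a job that needs every device, both cases should block the start.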