Yifan Xiong

Results 21 comments of Yifan Xiong

Hi, currently in OpenPAI, the GPU scheduling result is passed as GPU IDs in the `NVIDIA_VISIBLE_DEVICES` environment variable, e.g., `NVIDIA_VISIBLE_DEVICES=0,1`, and the devices are then mounted by the NVIDIA container runtime. However, to leverage MIG, it...
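A minimal sketch of the mechanism described above, assuming the scheduler hands over plain GPU indices. It turns a list like `[0, 1]` into the `NVIDIA_VISIBLE_DEVICES=0,1` value. `visible_devices_env` and `docker_run_args` are hypothetical helpers (not OpenPAI code), and `--runtime=nvidia` assumes the NVIDIA runtime is registered with the Docker daemon.

```python
import os
from typing import Dict, Iterable, List


def visible_devices_env(gpu_ids: Iterable[int]) -> Dict[str, str]:
    """Hypothetical helper: copy the current environment and set
    NVIDIA_VISIBLE_DEVICES to the scheduled GPU indices, e.g. [0, 1] -> "0,1"."""
    ids = list(gpu_ids)
    if any(i < 0 for i in ids):
        raise ValueError("GPU indices must be non-negative")
    env = dict(os.environ)
    env["NVIDIA_VISIBLE_DEVICES"] = ",".join(str(i) for i in ids)
    return env


def docker_run_args(image: str, gpu_ids: Iterable[int]) -> List[str]:
    """Hypothetical helper: build a `docker run` argv that passes the GPU list
    to the NVIDIA container runtime through the environment variable."""
    devices = ",".join(str(i) for i in gpu_ids)
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",  # assumes the nvidia runtime is configured in Docker
        "-e", f"NVIDIA_VISIBLE_DEVICES={devices}",
        image,
    ]


if __name__ == "__main__":
    print(visible_devices_env([0, 1])["NVIDIA_VISIBLE_DEVICES"])  # prints "0,1"
    print(" ".join(docker_run_args("ubuntu:20.04", [0, 1])))
```

Index-based values like this are exactly what stops being sufficient once GPUs are split into MIG instances.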

> However, we (NVIDIA) will be releasing details of MIG support on Kubernetes soon. We have a POC of K8s working with MIG, but we want to involve the community...

* For logs written directly to files (`sb-run.log`, `sb-exec.log`), there's no color by default.
* For logs written to stdout/stderr, you can disable color with `ANSIBLE_NOCOLOR=1` and `NO_COLOR=1`.
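A minimal sketch of setting both variables for a child process, assuming the command is launched from Python. The `NO_COLOR` check follows the no-color.org convention (any non-empty value disables color). The helper names and the example command are illustrative only.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def no_color_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment (default: the current one) and set both variables
    so that Ansible and NO_COLOR-aware tools stop emitting ANSI color codes."""
    env = dict(os.environ if base is None else base)
    env["ANSIBLE_NOCOLOR"] = "1"
    env["NO_COLOR"] = "1"
    return env


def color_disabled(env: Mapping[str, str]) -> bool:
    """NO_COLOR convention: color is off when the variable is set and non-empty."""
    return bool(env.get("NO_COLOR"))


def run_without_color(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command (e.g. an `sb run ...` invocation; illustrative) with color
    disabled, capturing stdout/stderr as text."""
    return subprocess.run(list(cmd), env=no_color_env(), capture_output=True, text=True)
```

From a shell, the equivalent is simply prefixing the command with `ANSIBLE_NOCOLOR=1 NO_COLOR=1`.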

Did you try the latest [release/0.6](https://github.com/microsoft/superbenchmark/tree/release/0.6) branch? You will need to pull the code and rerun `python3 -m pip install .` on the management node.

When multiple Ethernet interfaces are available, there is no "default" one; Open MPI will [automatically detect](https://www.open-mpi.org/faq/?category=tcp#tcp-multi-network) the usable interface using its [routability algorithm](https://www.open-mpi.org/faq/?category=tcp#tcp-routability). The detection is complex and it still cannot cover...
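When automatic detection picks the wrong interface, one common workaround is to pin it explicitly with Open MPI's `btl_tcp_if_include` (MPI traffic) and `oob_tcp_if_include` (runtime out-of-band traffic) MCA parameters, which accept interface names or CIDR subnets. The sketch below only builds the `mpirun` argv; the helper name and the interface name `eth0` are hypothetical examples, not values from the original discussion.

```python
from typing import List, Sequence


def mpirun_tcp_args(interfaces: Sequence[str], np: int, program: Sequence[str]) -> List[str]:
    """Hypothetical helper: build an mpirun command line that restricts Open MPI's
    TCP traffic to the given interfaces instead of relying on auto-detection."""
    if not interfaces:
        raise ValueError("at least one interface name or CIDR subnet is required")
    if_list = ",".join(interfaces)
    return [
        "mpirun", "-np", str(np),
        # point-to-point MPI messages over the TCP BTL
        "--mca", "btl_tcp_if_include", if_list,
        # out-of-band runtime control traffic
        "--mca", "oob_tcp_if_include", if_list,
        *program,
    ]
```

For example, `mpirun_tcp_args(["eth0"], 2, ["./app"])` yields `mpirun -np 2 --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 ./app`. Note that `*_if_include` and `*_if_exclude` should not be combined for the same component.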

Because `ibstat -l` cannot guarantee a deterministic order (alphabetical, alphanumeric, or PCIe order) of multiple IB devices (see [this thread](https://www.spinics.net/lists/linux-rdma/msg89308.html)), we currently support specifying it programmatically (could be `ibstat...
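To illustrate why alphabetical and alphanumeric orders differ: a plain string sort puts `mlx5_10` before `mlx5_2`, while a natural (alphanumeric) sort compares the numeric parts as numbers. A minimal sketch, assuming device names come from `ibstat -l` (infiniband-diags); this is one possible deterministic ordering, not necessarily the one SuperBench uses, and it says nothing about PCIe order.

```python
import re
import subprocess
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split 'mlx5_10' into ['mlx', 5, '_', 10, '']. re.split with a capture
    group alternates text/digit tokens, so types line up position by position."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def sorted_ib_devices(names: List[str]) -> List[str]:
    """Return device names in a deterministic alphanumeric order."""
    return sorted(names, key=natural_key)


def list_ib_devices() -> List[str]:
    """Read device names from `ibstat -l` (one per line) and sort them."""
    out = subprocess.run(["ibstat", "-l"], capture_output=True, text=True, check=True).stdout
    return sorted_ib_devices(out.split())
```

With `["mlx5_10", "mlx5_2", "mlx5_0", "mlx5_1"]`, `sorted()` gives `mlx5_0, mlx5_1, mlx5_10, mlx5_2`, whereas `sorted_ib_devices` gives `mlx5_0, mlx5_1, mlx5_2, mlx5_10`.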

It's required when using GPUs in Docker (here's [an example](https://forums.developer.nvidia.com/t/rootless-docker-error-no-supported-gpu-s-detected-to-run-this-container/210593)), and should be included as one step during [NVIDIA driver installation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications). If you cannot find `/dev/nvidia-uvm` in your system, you...
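A quick way to verify the device node before launching containers is to check that the path exists and is a character device. This is a minimal sketch; the helper name is hypothetical, and it only detects the problem rather than creating the node.

```python
import os
import stat


def is_char_device(path: str = "/dev/nvidia-uvm") -> bool:
    """Return True if `path` exists and is a character device node."""
    try:
        mode = os.stat(path).st_mode
    except OSError:  # missing file or insufficient permissions
        return False
    return stat.S_ISCHR(mode)


if __name__ == "__main__":
    if is_char_device():
        print("/dev/nvidia-uvm is present")
    else:
        print("/dev/nvidia-uvm is missing; revisit the driver installation steps")
```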