Tian Xia
### 🐛 Describe the bug The [PyTorch mnist example](https://github.com/pytorch/examples/tree/main/mnist) fails on a machine with preinstalled cuDNN and a manually installed PyTorch 2.2.0 (`pip install torch torchvision`). I found a...
Addresses the new comment under #3231. TODO: testing. Tested (run the relevant ones): - [x] Code formatting: `bash format.sh` - [ ] Any manual or new tests for this PR...
This PR adds support for custom images on Azure. Partially fixes #2910. Related: skypilot-org/skypilot-catalog#66 Tested (run the relevant ones): - [x] Code formatting: `bash format.sh` - [x] Any manual or...
https://github.com/skypilot-org/skypilot/pull/3231#discussion_r1505175836
TODO: Benchmark and get some numbers.

Fix bug using

```bash
sudo jq '.["exec-opts"] = ["native.cgroupdriver=cgroupfs"]' /etc/docker/daemon.json > /tmp/daemon.json && sudo mv /tmp/daemon.json /etc/docker/daemon.json
sudo systemctl restart docker
```

Tested (run...
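For readers who would rather script this edit than rely on `jq`, the same change to `daemon.json` can be sketched in Python. This is only an illustration of what the `jq` command above does, not code from the PR: the function name `set_cgroup_driver` and its parameters are hypothetical, and the sketch assumes the config file already exists and holds a JSON object.

```python
import json
import os
import tempfile


def set_cgroup_driver(config_path, driver="cgroupfs"):
    """Overwrite "exec-opts" in a Docker daemon.json and return the new config.

    Hypothetical helper mirroring the jq one-liner: it replaces the whole
    "exec-opts" list and leaves every other key untouched.
    """
    with open(config_path) as f:
        config = json.load(f)

    config["exec-opts"] = [f"native.cgroupdriver={driver}"]

    # Write to a temporary file in the same directory, then swap it in,
    # so a failed write never leaves a half-written config behind.
    dir_name = os.path.dirname(os.path.abspath(config_path))
    with tempfile.NamedTemporaryFile("w", dir=dir_name, delete=False) as tmp:
        json.dump(config, tmp, indent=2)
        tmp_path = tmp.name
    os.replace(tmp_path, config_path)
    return config
```

Calling `set_cgroup_driver("/etc/docker/daemon.json")` would need root privileges, and Docker still has to be restarted afterwards (`sudo systemctl restart docker`) for the change to take effect.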
The current native Docker support (#1910) doesn't cover Google's TPU accelerators. We should add support for using TPUs from Docker.
A simple example with authentication. Tested (run the relevant ones): - [x] Code formatting: `bash format.sh` - [x] Any manual or new tests for this PR (please specify below) -...
One user mentioned that pulling the Docker container is slow. One way to alleviate this is to support image_id when Docker is used as the runtime environment: we could use an...
The current native Docker support (#1910) uses the same images (DL images) as regular clusters, which contain a lot of unneeded dependencies since all workloads run in Docker containers....
One user asked to be able to see the Ray dashboard on each service replica.