Tian Xia

63 issues by Tian Xia

### 🐛 Describe the bug

The [PyTorch mnist example](https://github.com/pytorch/examples/tree/main/mnist) failed to work on a machine with cuDNN preinstalled after I manually installed PyTorch 2.2.0 (`pip install torch torchvision`). I found a...

module: binaries
module: cuda
triaged
module: regression

Addresses the new comment under #3231.

TODO: testing.

Tested (run the relevant ones):
- [x] Code formatting: `bash format.sh`
- [ ] Any manual or new tests for this PR...

This PR adds support for custom images on Azure. Partially fixes #2910. Related: skypilot-org/skypilot-catalog#66

Tested (run the relevant ones):
- [x] Code formatting: `bash format.sh`
- [x] Any manual or...

https://github.com/skypilot-org/skypilot/pull/3231#discussion_r1505175836

TODO: Benchmark and get some numbers.

Fix bug using:

```bash
sudo jq '.["exec-opts"] = ["native.cgroupdriver=cgroupfs"]' /etc/docker/daemon.json > /tmp/daemon.json && sudo mv /tmp/daemon.json /etc/docker/daemon.json
sudo systemctl restart docker
```

Tested (run...

The current native Docker support (#1910) does not support Google's TPU accelerators. We should add support for using TPUs inside Docker containers.

enhancement

A simple example with authentication.

Tested (run the relevant ones):
- [x] Code formatting: `bash format.sh`
- [x] Any manual or new tests for this PR (please specify below)
-...

One user mentioned that pulling the Docker image is slow. One way to alleviate this is to support `image_id` when Docker is used as the runtime environment: we could use an...

The current native Docker support (#1910) uses the same images (DL images) as regular clusters. These images contain many unnecessary dependencies, since all workloads run inside Docker containers....

enhancement

One user requested access to the Ray dashboard on each service replica.

serve