Ryan

Results: 6 issues by Ryan

## ❓ Questions and Help

### Please note that this issue tracker is not a help form and this issue will be closed.

Before submitting, please ensure you have gone...

## 🐛 Bug

Custom components using binop instead of `Optional` result in a validation error. Custom schedulers work as intended, as there is no validation.

Module (check all that applies): *...

## 🐛 Bug

Device Request capabilities should be updated to "gpu", not "compute":
https://github.com/pytorch/torchx/blob/main/torchx/schedulers/docker_scheduler.py#L308

```
c.kwargs["device_requests"] = [
    DeviceRequest(
        count=resource.gpu,
        capabilities=[["compute"]],
    )
]
```

Module (check all that applies): *...
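For comparison, here is a minimal sketch of the device request that the report asks for, written as the plain dictionary shape used in the Docker Engine API's `HostConfig.DeviceRequests` field. It is an illustration only, not the torchx fix: `gpu_device_request` is a hypothetical helper, and the field values other than `Capabilities` are assumed defaults.

```python
import json
from typing import Any, Dict


def gpu_device_request(count: int) -> Dict[str, Any]:
    """Hypothetical helper: one DeviceRequests entry asking for `count` GPUs."""
    return {
        "Driver": "",               # assumed default: let the daemon pick the driver
        "Count": count,             # -1 requests all available devices
        "DeviceIDs": None,          # no specific device IDs requested
        "Capabilities": [["gpu"]],  # the value the report proposes instead of "compute"
        "Options": {},
    }


if __name__ == "__main__":
    # Print the entry as it would be serialized in a request body.
    print(json.dumps(gpu_device_request(2), indent=2))
```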

## 🐛 Bug

In `DockerScheduler._submit_dryrun`, the `hostname` keyword argument for `docker.containers.run` is set to `name`:
https://github.com/pytorch/torchx/blob/main/torchx/schedulers/docker_scheduler.py#L280

`name` is set to

```
name = f"{app_id}-{role.name}-{replica_id}"
```

https://github.com/pytorch/torchx/blob/main/torchx/schedulers/docker_scheduler.py#L259C17-L260C1

It is typical/common for...

**Describe the bug**

I have an mcore distributed checkpoint trained with PP=1, TP=1. When running inference with this distributed checkpoint,...

bug
stale

This is connecting to a persistent docker container I manually started, and then attaching to the container via the remote explorer. I am working inside this container for a lengthy...

bug