torchchat
[WIP] [Hackability Refactor]: Move the distributed folder from the top level repo to torchchat/torchchat
Moves the top-level distributed folder into a separate distributed folder within the torchchat umbrella.
There are intentionally no code changes outside of the README and script path updates.
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1047
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:x: 25 New Failures, 2 Cancelled Jobs
As of commit c99281959518ed5c556595ce1cde70cbe8d49699 with merge base fcadb14eeb9f944150e3ac8fc2221c7619dba80e:
NEW FAILURES - The following jobs have failed:
- pull / compile-gguf (macos-14) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / runner-aoti (macos-14-xlarge) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / runner-et (16-core-ubuntu) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / runner-et (macos-14-xlarge) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-aoti (aarch64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-aoti (x86_64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-compile (aarch64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-compile (x86_64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-eval-sanity-check (aarch64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-eval-sanity-check (x86_64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-eval-sanity-check-float16 (aarch64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-eval-sanity-check-float16 (x86_64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-eval-sanity-check-float32 (aarch64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-cpu-eval-sanity-check-float32 (x86_64, stories15M) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / test-gpu-aoti-bfloat16 (cuda, stories15M) / linux-job (gh)
  RuntimeError: Command docker exec -t c2ab3c07e1c0f86dba11a49a5b4d101295c8cfc99a5f2743cfda8fd4dd5b0e36 /exec failed with exit code 1
- pull / test-gpu-aoti-float16 (cuda, stories15M) / linux-job (gh)
  RuntimeError: Command docker exec -t c295d727552262a51be6991bd8df384d8383176efd21f38a4cc37b45a55f93b4 /exec failed with exit code 1
- pull / test-gpu-aoti-float32 (cuda, stories15M) / linux-job (gh)
  RuntimeError: Command docker exec -t d73f2226f1e80a8938ae089f2fdc7eaf6f1322a502cc1c98c29fe2628a9e5fac /exec failed with exit code 1
- pull / test-gpu-compile (cuda, stories15M) / linux-job (gh)
  RuntimeError: Command docker exec -t 1900e5187ed818aa9ce34674938d406b67eb4363057df32fad934c82d52bb669 /exec failed with exit code 1
- pull / test-gpu-eval-sanity-check (cuda, stories15M) / linux-job (gh)
  RuntimeError: Command docker exec -t f24ebdc3bf02554c583ac39d9e31e75d1be26da4b3497f8461d91495c831be30 /exec failed with exit code 1
- pull / test-mps / macos-job (gh)
  RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
- pull / test-mps-dtype / macos-job (gh)
  RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
- pull / test-tinystories-executorch (macos-14-xlarge) (gh)
  ModuleNotFoundError: No module named 'distributed'
- pull / torchchat-command-load-test (macos-14) (gh)
  ModuleNotFoundError: No module named 'distributed'
- Run parallel prefill / test-cuda / linux-job (gh)
  RuntimeError: Command docker exec -t 77df9d137bcf18384aee9e92ab17de9a4a61bcd289a1c4455e40207d5de7fd1a /exec failed with exit code 1
- Run the aoti runner with CUDA using stories / test-runner-aot-cuda / linux-job (gh)
  RuntimeError: Command docker exec -t c57d3c27e1b984946c0a4c87202710bd6f03f0bed386b3b994f59377c7daebde /exec failed with exit code 1
CANCELLED JOBS - The following jobs were cancelled. Please retry:
- pull / runner-aoti (16-core-ubuntu) (gh)
  ##[error]The operation was canceled.
- pull / test-tinystories-executorch (16-core-ubuntu) (gh)
  ##[error]The operation was canceled.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for confirming!
I'm planning to move the distributed code close to last so that it doesn't affect y'all as much, so it won't land for a few days, possibly not until next week.
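For anyone triaging the `ModuleNotFoundError: No module named 'distributed'` failures in the Dr. CI comment: they look consistent with some code still importing the old top-level package after the move. Below is a minimal sketch for checking which package name resolves in a given checkout. The helper `first_importable` is hypothetical (not part of torchchat), and the new import path `torchchat.distributed` is an assumption inferred from the PR title, not something confirmed in this thread.

```python
import importlib.util
from typing import Iterable, Optional


def first_importable(candidates: Iterable[str]) -> Optional[str]:
    """Return the first module name in `candidates` that can be located, else None.

    Hypothetical diagnostic helper; it only locates modules, it does not
    execute the target module itself (parent packages are imported, though).
    """
    for name in candidates:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec raises if a parent package is missing (dotted names)
            # or if an already-loaded module has no usable spec.
            spec = None
        if spec is not None:
            return name
    return None


if __name__ == "__main__":
    # ASSUMPTION: "torchchat.distributed" is the post-move path and
    # "distributed" is the pre-move path; adjust to the real layout.
    found = first_importable(["torchchat.distributed", "distributed"])
    print(found or "neither package is importable from this working directory")
```

Run it from the repo root. If only the old name resolves, the move has not been applied in that checkout; if only the new name resolves, any remaining `import distributed` statements would fail the way the CI jobs above do.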