Jiajie Li

Results 12 issues of Jiajie Li

**Environment:**
1. Framework: PyTorch
2. Framework version: '1.11.0+cu113'
3. Horovod version: 0.24.3
4. MPI version: -
5. CUDA version: -
6. NCCL version: -
7. Python version: 3.7
8. Spark...

bug
ray

Elasticity - placement groups are executed as pending tasks that GCS schedules when resources become available. Related PR: #572. Test plan: mock cluster scaling with `ray.cluster_utils`.

CLA Signed

Two features for Elastic Distributed Training are added in this PR to jobs launched by TorchX on a Ray cluster: 1. Fault Tolerance - Node failure throws RayActorError which can be...

CLA Signed

## Description Support elastic training on Ray Cluster. ## Motivation/Background Training can tolerate node failures. The number of worker nodes can expand as the size of the cluster grows. ##...

enhancement
ray

## 🐛 Bug Hi, I would like to know the current state of running elastic training on Ray clusters. I tried to reproduce some experiments ([notebook](https://colab.research.google.com/drive/1vVCpgQ9z_1SN8K9CJxUT2LtvUDN0AlND?usp=sharing)) from this [blog](https://www.anyscale.com/blog/large-scale-distributed-training-with-torchx-and-ray) on my...

question
ray

Ray 2.0 made get_address_and_port no longer public + added a bunch of pyre changes. We need to find a replacement for get_address_and_port, fix the types and then upgrade to Ray...

CLA Signed

I was unable to run any pyre command with pyre-extensions==0.0.21, but after upgrading to 0.0.29 it works. Dependencies are installed with `pip install -e '.[dev]'`. When using `pyre`, it...

CLA Signed

The models in diffusers output samples directly, while the models in guided diffusion output the mean and variance. Related issue: https://github.com/openai/guided-diffusion/issues/93#issue-1584632267

I realize that the pre-trained model in this repository outputs the mean and variance, but the models in [diffusers](https://github.com/huggingface/diffusers) output samples directly. Related issue: https://github.com/huggingface/diffusers/issues/2354.
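
The gap between the two conventions can be illustrated with a minimal sketch. It is not code from either repository. It assumes the model's output has already been turned into a per-element Gaussian mean and variance for the previous timestep, and it follows the usual DDPM ancestral-sampling convention of adding no noise at the final step (`t == 0`). The function name `sample_from_mean_variance` and its parameters are hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random


def sample_from_mean_variance(mean, variance, t, rng=None):
    """Draw x_{t-1} elementwise from N(mean, variance).

    Hypothetical helper for illustration only: `mean` and `variance` are
    equal-length lists of floats, `t` is the current timestep index.
    """
    rng = rng or random.Random()
    if t == 0:
        # Final step: return the mean without adding noise.
        return list(mean)
    # Reparameterized draw: mean + std * standard normal noise.
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, variance)]
```

Calling `sample_from_mean_variance(mean, variance, t)` on a (mean, variance) pair yields a sample of the kind a sample-returning interface would hand back directly, which is the difference the two issues above describe.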

I have modified the files under web/libs/editor/src/common, and expect to see some changes on the web. Following the README files, what I have tried is to: 1. Rerun ```shell python...