[Ray Data] [stable diffusion batch inference] CPU resources in the cluster cannot be fully utilized when running a stable diffusion batch inference task.
What happened + What you expected to happen
Hi, I want to use the cluster's CPU resources to run the stable diffusion inference demo. I do not have GPUs. I assumed that through the Ray framework, the CPUs could also be used to execute inference tasks.
I set up two WSL instances as a Ray cluster: WSL A has 12 CPUs and acts as the head node, and WSL B has 12 CPUs and acts as the worker node. Running the 'ray status' command shows:

======== Autoscaler status: 2024-03-22 00:59:14.244899 ========
Node status
Active:
 1 node_88349db0fa0ccd3086db2f5a4c79ab9a527acb4aca4c023cb8120c8b
 1 node_5cb133607c13b47fa48631b86114996f49a7ced083a5bcbeafbc20b8
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
Usage:
 0.0/24.0 CPU
 0B/43.54GiB memory
 0B/21.04GiB object_store_memory

Demands:
 (no resource demands)
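For context, the two nodes were joined roughly like this (the port and head IP below are placeholders, not the exact values I used):

# On WSL A (head node, 12 CPUs)
ray start --head --port=6379

# On WSL B (worker node, 12 CPUs), pointing at the head node
ray start --address='<head-node-ip>:6379'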
Then I ran the stable diffusion batch inference demo and set the pipe and device parameters to 'cpu', as the script below shows. I also set num_cpus=16, expecting the Ray cluster to use 16 of the 24 CPUs to run the task. However, it raised the following error:
(autoscaler +6s) Error: No available node types can fulfill resource request {'CPU': 16.0}. Add suitable node types to this cluster to resolve this issue.
Only when I set num_cpus <= 12 (WSL A's total CPU count) does it work, and then only one of the two nodes executes the task.
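As a sanity check (this snippet is just how I inspected the cluster, not part of the repro), the cluster reports 24 CPUs in total, but each individual node only has 12, which seems to be why a single request for 16 CPUs cannot be placed:

import ray

ray.init(address="auto")

print(ray.cluster_resources())  # total: 24.0 CPU across both nodes
for node in ray.nodes():        # per node: only 12 CPUs each
    print(node["NodeManagerAddress"], node["Resources"].get("CPU"))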
The documentation says that num_cpus is the number of CPUs to reserve for each parallel map worker, and that concurrency is the number of Ray workers to use concurrently. So I tried setting concurrency=2 and num_cpus=8, thinking that 2*8=16 CPUs might work. However, the error occurred again during inference.
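Concretely, that second attempt only changed the map_batches arguments; everything else is identical to the reproduction script below:

# Second attempt: 2 workers x 8 CPUs each, hoping to use 16 CPUs in total
preds = ds.map_batches(
    PredictCallable,
    fn_constructor_kwargs=dict(model_id=model_id),
    concurrency=2,
    num_cpus=8,
    batch_size=1,
    batch_format='pandas',
)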
So my question is: how can I make full use of the CPU resources in the cluster to execute a single inference task?
Versions / Dependencies
Ray 2.9.3, Python 3.10.12, WSL2
Reproduction script
model_id = "stabilityai/stable-diffusion-2-1"
prompt = "a photo of an astronaut riding a horse on mars"

import ray
import ray.data
import pandas as pd

ds = ray.data.from_pandas(pd.DataFrame([prompt], columns=['prompt']))

class PredictCallable:
    def __init__(self, model_id: str, revision: str = None):
        from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
        import torch

        self.pipe = StableDiffusionPipeline.from_pretrained(
            model_id, torch_dtype=torch.float
        )
        self.pipe.scheduler = DPMSolverMultistepScheduler.from_config(
            self.pipe.scheduler.config
        )
        self.pipe = self.pipe.to("cpu")  # run the pipeline on CPU

    def __call__(self, batch: pd.DataFrame) -> pd.DataFrame:
        import torch
        import numpy as np

        # Set a different seed for every image in batch
        self.pipe.generator = [
            torch.Generator(device="cpu").manual_seed(i) for i in range(len(batch))
        ]
        images = self.pipe(list(batch["prompt"])).images
        return {"images": np.array(images, dtype=object)}

preds = ds.map_batches(
    PredictCallable,
    fn_constructor_kwargs=dict(model_id=model_id),
    concurrency=1,
    num_cpus=16,
    batch_size=1,
    batch_format='pandas',
)
results = preds.take_all()
Issue Severity
High: It blocks me from completing my task.