HelloWorldBeginner
### Describe the bug

```
Traceback (most recent call last):
  File "train_controlnet_sdxl.py", line 1252, in <module>
    main(args)
  File "train_controlnet_sdxl.py", line 1013, in main
    train_dataset = train_dataset.map(compute_embeddings_fn, batched=True, new_fingerprint=new_fingerprint)
  File "/home/miniconda3/envs/mhh_df/lib/python3.8/site-packages/datasets/arrow_dataset.py", line...
```
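For context on the failing call: `Dataset.map` with `batched=True` hands the mapping function a dict of column-name to list-of-values, and `new_fingerprint` only overrides the cache key. A minimal sketch of that batched contract, assuming a `caption` column; the word-count computation is a hypothetical stand-in for the real SDXL text-encoder call:

```python
# Sketch of the batched-mapping contract used by datasets.Dataset.map:
# the function receives a dict mapping column names to lists of values
# and returns a dict with any new columns added.
def compute_embeddings_fn(batch):
    # Stand-in "embedding": word count per caption (the real script
    # would run the SDXL text encoders here).
    batch["prompt_embeds"] = [len(caption.split()) for caption in batch["caption"]]
    return batch

batch = {"caption": ["a photo of a cat", "a dog"]}
out = compute_embeddings_fn(batch)
```

In the real script this function is what `train_dataset.map(...)` calls once per batch, so any exception it raises surfaces inside `arrow_dataset.py` as in the traceback above.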
# What does this PR do? Adds support for SDXL fine-tuning on Ascend NPU and fixes a bug that caused a hang when saving models under the DeepSpeed distributed framework. DeepSpeed...
[NPU] Support Llava training and inference for Ascend NPU. I've modified some code to add NPU support, allowing LLaVA to perform both training and inference on the NPU. It works...
# What does this PR do? 1. Adds flash attention support for NPU, similar to #7816. 2. Fixes a bug related to saving the model when using DeepSpeed, also...
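The flash-attention hooks in diffusers work by swapping a module's pluggable attention "processor" object. The sketch below imitates that pattern with simplified stand-in classes so it runs anywhere; `AttnProcessor` and `AttnProcessorNPU` here are toy versions, not the real diffusers implementations:

```python
# Toy version of the diffusers attention-processor pattern: each
# attention module delegates to a swappable `processor` callable,
# so an NPU flash-attention variant can be plugged in at runtime.
class AttnProcessor:
    """Stand-in for the default (eager) attention path."""
    def __call__(self, hidden_states):
        return f"eager({hidden_states})"

class AttnProcessorNPU:
    """Stand-in for an NPU flash-attention path."""
    def __call__(self, hidden_states):
        return f"npu_fa({hidden_states})"

class Attention:
    def __init__(self):
        self.processor = AttnProcessor()

    def set_processor(self, processor):
        # Swapping the processor changes the attention backend
        # without touching the module's weights or call sites.
        self.processor = processor

    def __call__(self, hidden_states):
        return self.processor(hidden_states)

attn = Attention()
attn.set_processor(AttnProcessorNPU())
result = attn("hidden_states")
```

The design point is that callers of `attn(...)` never change; only the processor object does, which is why backend-specific attention (NPU, xFormers, etc.) can be added without modifying the model definition.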
I changed the train_step parameter in image_finetune.yaml to 2000 steps, which trains for multiple epochs, but the machine stalls for five minutes at the start of each epoch. ...
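One common cause of a stall at each epoch boundary is dataloader workers being torn down and re-spawned between epochs. A hedged config sketch, assuming the training config exposes dataloader options; only `train_step` appears in the original report, and the dataloader keys below are illustrative, not guaranteed to exist in image_finetune.yaml:

```yaml
# Illustrative fragment; dataloader keys are hypothetical.
train_step: 2000
dataloader:
  num_workers: 8
  persistent_workers: true   # keep workers alive across epochs
  pin_memory: true
```

If the underlying loader is a PyTorch `DataLoader`, `persistent_workers=True` avoids paying the worker start-up cost at every epoch boundary.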
I've modified some code to add support for Ascend NPU, allowing AnimateDiff to train and run inference on the NPU. It works fine on the NPU. (Screenshots: NPU training, NPU inference)
### Question LLaVA is great work. I have adapted LLaVA to Ascend NPU hardware, enabling pre-training, inference, and evaluation on the Ascend NPU. I'm wondering if NPU is also...
Adding **actor_rollout_ref.rollout.mode="async"** in recipe/dapo/run_dapo_qwen2.5_32b.sh produces this error:
```
(AsyncvLLMServer pid=361610) instance_id: 6f66dda9-3270-44cf-823e-6bbb7a51c151:Hotrws:1:0 initializes with external actors: ['HotrwsWorkerDict_0:0']
Error executing job with overrides: ['data.train_files=/home//0723/data/dapo-math-17k.parquet', 'data.val_files=/home//0723/data/aime-2024.parquet', 'data.prompt_key=prompt', 'data.truncation=left', 'data.max_prompt_length=2048', 'data.max_response_length=2048', 'data.gen_batch_size=6', 'data.train_batch_size=2', 'actor_rollout_ref.rollout.n=16', ...
```
### What does this PR do? We tested the partial-rollout feature using the DAPO algorithm with the qwen3-0.6B and qwen2.5-7B models. The blue curve represents the scenario...
# Motivation During reinforcement-learning training, as model performance improves, the output response sequences keep lengthening; especially in slow-thinking mode, the...