zfr00

9 comments by zfr00

Did you manage to get the dataset? I can't download it.

I couldn't find an official guide to the test set in the dataset.

Hello, have you solved this?

The dataset section is set to:

    ### dataset
    dataset: structured_training_data  # video: mllm_video_demo
    buffer_size: 128
    preprocessing_batch_size: 64
    streaming: true
    accelerator_config:
      dispatch_batches: false
    max_steps: 4000
    template: qwen2_vl
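For anyone copying this config: with `streaming: true` the dataset has no known length, so the trainer needs a positive `max_steps` to know when to stop. Below is a minimal sketch of that consistency check. The helper name `check_streaming_config` and the plain-dict form of the config are hypothetical, used only for illustration, and are not part of any library.

```python
def check_streaming_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a streaming training config.

    Hypothetical helper: it only checks that a streaming dataset
    comes with a positive max_steps value.
    """
    problems = []
    if cfg.get("streaming"):
        max_steps = cfg.get("max_steps")
        if not isinstance(max_steps, int) or max_steps <= 0:
            problems.append("streaming is enabled but max_steps is not a positive integer")
    return problems


# The values from the comment above, written as a plain dict.
cfg = {
    "dataset": "structured_training_data",
    "buffer_size": 128,
    "preprocessing_batch_size": 64,
    "streaming": True,
    "accelerator_config": {"dispatch_batches": False},
    "max_steps": 4000,
    "template": "qwen2_vl",
}

problems = check_streaming_config(cfg)
print(problems if problems else "config looks consistent")
```

The config as posted passes this check. Dropping `max_steps` while keeping `streaming: true` would be flagged.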

Have you managed to solve the problem?

Then how should the AUTO_SPLIT parameter be set? When I replace the original python command with torchrun --nproc_per_node=2, I get an OOM.

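My sh script looks like this:

    export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
    export AUTO_SPLIT=1
    torchrun --nproc_per_node=4 run.py --data "$data" --model Qwen2.5-VL-72B-Instruct --verbose --mode infer --reuse

It looks like only the first four GPUs are used, and the model is not split across GPUs the way it is when launching with python.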

Sorry, this version also fails with an error:

    export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
    export AUTO_SPLIT=1
    torchrun --nproc_per_node=2 run.py --data "$data" --model Qwen2.5-VL-72B-Instruct --verbose --mode infer --reuse

Can AUTO_SPLIT only be set to 1?
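To make the GPU arithmetic in these scripts concrete, here is a rough sketch of how the eight visible GPUs would be divided if each torchrun process were given an equal, contiguous share. This is an assumption for illustration only, not a description of what run.py or AUTO_SPLIT actually does. `split_gpus` is a hypothetical helper.

```python
def split_gpus(visible: str, nproc: int) -> list[list[str]]:
    """Divide a CUDA_VISIBLE_DEVICES string evenly among nproc processes.

    Hypothetical helper: assumes each local rank gets a contiguous,
    equally sized slice of the visible GPUs (leftover GPUs are unused).
    """
    gpus = [g.strip() for g in visible.split(",") if g.strip()]
    if nproc <= 0 or len(gpus) < nproc:
        raise ValueError("need at least one GPU per process")
    per_proc = len(gpus) // nproc
    return [gpus[rank * per_proc:(rank + 1) * per_proc] for rank in range(nproc)]


visible = "0,1,2,3,4,5,6,7"
for nproc in (4, 2):
    print(f"--nproc_per_node={nproc}:")
    for rank, share in enumerate(split_gpus(visible, nproc)):
        print(f"  local rank {rank} -> GPUs {','.join(share)}")
```

Under this assumption, `--nproc_per_node=4` leaves two GPUs per process and `--nproc_per_node=2` leaves four. If instead each process simply uses the GPU matching its local rank, four processes would occupy only GPUs 0 to 3, which would match the behavior described above. In either case each process presumably loads its own full copy of the model, so its share of GPUs has to hold the whole 72B model.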