Results: 11 comments of youngrok cha

@ArthurZucker Oh, you're right. Thanks.

I hope this feature gets added soon!

Maybe WHISPER__DEVICE_INDEX or WHISPER__NUM_WORKERS could work?
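(For anyone wondering how double-underscore variables like these usually work: a minimal sketch below, assuming the server reads its config with pydantic-settings and a `__` nested delimiter. The `WhisperConfig`/`Settings` names are illustrative, not the project's actual classes.)

```python
# Hypothetical sketch: how WHISPER__DEVICE_INDEX / WHISPER__NUM_WORKERS
# could map onto nested settings IF the server uses pydantic-settings.
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class WhisperConfig(BaseModel):   # illustrative name
    device_index: int = 0         # would be set via WHISPER__DEVICE_INDEX
    num_workers: int = 1          # would be set via WHISPER__NUM_WORKERS

class Settings(BaseSettings):     # illustrative name
    model_config = SettingsConfigDict(env_nested_delimiter="__")
    whisper: WhisperConfig = WhisperConfig()

# e.g. WHISPER__DEVICE_INDEX=1 WHISPER__NUM_WORKERS=2 python server.py
settings = Settings()
print(settings.whisper.device_index, settings.whisper.num_workers)
```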

It worked, but it looks like the model isn't tensor-parallelized; instead, a full copy of the model is loaded on each GPU. Am I right?
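(A quick way to sanity-check this, not from the original thread: with plain replication, every GPU holds the full weights, while tensor parallelism shards them across devices, so per-GPU allocated memory should be roughly total size divided by the number of GPUs. A rough probe:)

```python
# Rough heuristic: compare per-GPU allocated memory right after loading.
# If each GPU shows roughly the full model size, the model was replicated
# rather than tensor-parallelized (sharded).
import torch

for i in range(torch.cuda.device_count()):
    allocated_gib = torch.cuda.memory_allocated(i) / 1024**3
    print(f"GPU {i}: {allocated_gib:.2f} GiB allocated")
```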

Since this commit (https://github.com/deepspeedai/DeepSpeed/pull/4906), partitioned parameters are updated only when ds_secondary_partition_tensor is None. And ds_secondary_partition_tensor only becomes None after the optimizer.step function is called (that function contains the logic that invalidates the secondary...
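To illustrate the interaction I mean, here is a tiny self-contained sketch. This is my reading of the behavior, not the actual DeepSpeed source; `FakeZeroParam` and the helpers are made up for the example:

```python
# Simplified sketch (NOT actual DeepSpeed source) of the guard described
# above: after PR #4906, the partitioned parameter is only refreshed while
# ds_secondary_partition_tensor is None, and only optimizer.step() resets
# it to None -- so updates issued before step() are silently skipped.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeZeroParam:
    ds_tensor: float                                    # stand-in for the primary partition
    ds_secondary_partition_tensor: Optional[float] = None

def update_partitioned_param(p: FakeZeroParam, new_value: float) -> None:
    if p.ds_secondary_partition_tensor is None:         # the PR #4906 guard
        p.ds_tensor = new_value
    # else: the update is skipped and the partition stays stale

def optimizer_step(p: FakeZeroParam) -> None:
    p.ds_secondary_partition_tensor = None              # invalidation happens here

p = FakeZeroParam(ds_tensor=0.0, ds_secondary_partition_tensor=1.0)
update_partitioned_param(p, 42.0)   # skipped: secondary tensor still set
print(p.ds_tensor)                  # 0.0 -> stale value
optimizer_step(p)
update_partitioned_param(p, 42.0)   # now applied
print(p.ds_tensor)                  # 42.0
```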

I'm not sure because this logic is a bit complicated, but IMO, while HfArgumentParser.parse_args_into_dataclasses is executed, DeepSpeed ZeRO-3 is enabled by this (https://github.com/huggingface/transformers/blob/v4.51.2/src/transformers/training_args.py#L2046). And while loading the model with the from_pretrained method,...
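Roughly, the ordering I suspect is the one below (standard transformers APIs; the model name and run flags like `--deepspeed ds_zero3.json` are just examples):

```python
# Sketch of the suspected ordering: parsing the args is what flips on
# ZeRO-3 globally, so any from_pretrained call AFTER it runs under
# deepspeed.zero.Init and gets its weights partitioned at load time.
# run: python repro.py --output_dir out --deepspeed ds_zero3.json
from transformers import AutoModelForCausalLM, HfArgumentParser, TrainingArguments

parser = HfArgumentParser(TrainingArguments)
# parse_args_into_dataclasses() triggers TrainingArguments.__post_init__,
# which (via the training_args.py logic linked above) enables ZeRO-3 when
# the deepspeed config requests stage 3.
(training_args,) = parser.parse_args_into_dataclasses()

# Because ZeRO-3 is already enabled at this point, this load happens
# inside zero.Init and the parameters are partitioned immediately.
model = AutoModelForCausalLM.from_pretrained("gpt2")
```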

This is the most minimal reproducing code I could make. Running this code with the command at the bottom reproduces the issue I encountered :) # deepspeed_init.py ```python from...
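(Since the excerpt cuts off here, purely for readers of this page: below is a generic skeleton of what a standalone ZeRO-3 repro script usually looks like. This is NOT the original deepspeed_init.py, just a hypothetical illustration.)

```python
# Hypothetical skeleton of a standalone DeepSpeed ZeRO-3 repro script
# (not the original, truncated deepspeed_init.py).
import deepspeed
import torch

model = torch.nn.Linear(8, 8)
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config={
        "train_micro_batch_size_per_gpu": 1,
        "zero_optimization": {"stage": 3},
        "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    },
)
loss = engine(torch.randn(1, 8)).sum()
engine.backward(loss)
engine.step()
# Typical launch command (illustrative): deepspeed --num_gpus=2 deepspeed_init.py
```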