linyubupa
The following PyTorch Lightning code was used:

```python
trainer = Trainer(
    max_epochs=1,
    devices=args.num_devices,
    precision=16,
    strategy="deepspeed_stage_3",
    accelerator='gpu',
    num_nodes=args.num_nodes,
    limit_val_batches=0,
    # add plugins
    plugins=plugins,
    # add logger and profiler
    logger=lighting_logger,
    profiler=profiler,
    # add callbacks
    callbacks=callbacks,
    # disable the built-in progress bar
    enable_progress_bar=False
)
```
...
The amount of CPU memory used = gpu_number * 2 * model_size. For example, a 20 GB model on 8 GPUs would consume roughly 8 * 2 * 20 GB = 320 GB of host RAM.
> Hi @linyubupa, could you describe more details about reproducing this issue? Especially how you measured _cpu memory used_ and _model_size_

This is the code that I used:

...
I think the main cause of this result is `mp.spawn(main_worker, args=(sys.argv,), nprocs=gpu_count, join=True)`.
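A minimal sketch of the suspected mechanism (the `main_worker` body and layer sizes below are made up for illustration): `mp.spawn` launches one independent process per GPU, and each process builds its own full copy of the model in host RAM before DeepSpeed stage 3 has a chance to shard it, so the CPU footprint scales with the number of GPUs.

```python
import torch
import torch.multiprocessing as mp

def main_worker(rank, world_size):
    # Every one of the `world_size` spawned processes executes this line,
    # so the full model is materialized `world_size` times in CPU memory.
    model = torch.nn.Sequential(
        *[torch.nn.Linear(4096, 4096) for _ in range(8)]
    )
    print(f"rank {rank}: built a full model copy on the CPU")

if __name__ == "__main__":
    gpu_count = torch.cuda.device_count() or 2  # fall back to 2 processes on a CPU-only box
    mp.spawn(main_worker, args=(gpu_count,), nprocs=gpu_count, join=True)
```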
And this is the DeepSpeed config:

```json
{
  "train_batch_size": "auto",
  "fp16": {
    "enabled": true,
    "min_loss_scale": 1,
    "opt_level": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "contiguous_gradients": true,
    ...
```
> Hi @linyubupa, could you describe more details about reproducing this issue? Especially how you measured _cpu memory used_ and _model_size_

I measured CPU memory by using aistudio tools, which...
If you have multiple GPUs, the CPU memory cost = 2 * model_size * gpu_number.
> ```python
> def configure_sharded_model(self):
> ```

Sorry for the late reply. I build the model in `configure_sharded_model`, but it still consumes a large amount of CPU memory.
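For reference, a minimal sketch of building the model inside `configure_sharded_model` (the module class and layer sizes here are made up): with `strategy="deepspeed_stage_3"`, Lightning calls this hook after the strategy is set up, so parameters created inside it can be partitioned across ranks as they are instantiated instead of being fully materialized on every process's CPU first.

```python
import torch
import pytorch_lightning as pl

class ShardedModule(pl.LightningModule):
    def __init__(self, hidden_size=4096, num_layers=8):
        super().__init__()
        # Only store configuration here; do not allocate the large model yet.
        self.hidden_size = hidden_size
        self.num_layers = num_layers

    def configure_sharded_model(self):
        # Invoked once the DeepSpeed strategy is initialized; layers created
        # here are sharded as they are built rather than replicated per rank.
        self.model = torch.nn.Sequential(
            *[torch.nn.Linear(self.hidden_size, self.hidden_size)
              for _ in range(self.num_layers)]
        )

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)
```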
I solved this by using the `deepspeed` launcher with the transformers Trainer: https://huggingface.co/docs/transformers/main_classes/deepspeed

```bash
deepspeed --num_gpus 8 --num_nodes 2 --hostfile hostfile --master_addr hostname1 --master_port=9901 \
    your_program.py --deepspeed ds_config.json
```
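A minimal sketch of what `your_program.py` might look like (the tiny model and random dataset are placeholders, and `--output_dir` would also need to be passed on the command line): when launched through the `deepspeed` command above, `--deepspeed ds_config.json` is parsed into `TrainingArguments`, and the HF `Trainer` initializes DeepSpeed itself, so the model is sharded without every process holding a full CPU copy.

```python
# your_program.py -- hypothetical skeleton for the launch command above.
import torch
from transformers import HfArgumentParser, Trainer, TrainingArguments

class RandomDataset(torch.utils.data.Dataset):
    # Placeholder dataset; replace with real data.
    def __len__(self):
        return 512

    def __getitem__(self, idx):
        x = torch.randn(4096)
        return {"x": x, "labels": x}

class TinyModel(torch.nn.Module):
    # Placeholder model; replace with the real one.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4096, 4096)

    def forward(self, x, labels=None):
        out = self.linear(x)
        loss = torch.nn.functional.mse_loss(out, labels)
        return {"loss": loss, "logits": out}

if __name__ == "__main__":
    # --deepspeed ds_config.json (and --output_dir) are consumed by
    # TrainingArguments, which hands the config to DeepSpeed.
    parser = HfArgumentParser(TrainingArguments)
    (training_args,) = parser.parse_args_into_dataclasses()
    trainer = Trainer(model=TinyModel(),
                      args=training_args,
                      train_dataset=RandomDataset())
    trainer.train()
```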