[BUG] AttributeError: 'PipelineEngine' object has no attribute '__flops__'

Open yerimChoi opened this issue 2 years ago • 2 comments

I tried to use the Flops Profiler while training a model with Megatron-DeepSpeed (https://github.com/microsoft/Megatron-DeepSpeed), outside the DeepSpeed runtime, as shown in https://www.deepspeed.ai/tutorials/flops-profiler/#example-megatron-lm

I created the profiler as prof = FlopsProfiler(model[0]) and run the profiling with:

        if iteration == profile_step and mpu.get_data_parallel_rank() == 0:
            prof.start_profile()

        ... [training] ...

        if iteration == profile_step and mpu.get_data_parallel_rank() == 0:
            prof.stop_profile()
            flops = prof.get_total_flops()
            macs = prof.get_total_macs()
            params = prof.get_total_params()
            if print_profile:
                prof.print_model_profile(profile_step=profile_step)
            prof.end_profile()

At the profile step, the following error is raised:


Traceback (most recent call last):
  File "~/Megatron-DeepSpeed-main/script/../pretrain_gpt.py", line 326, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "~/Megatron-DeepSpeed-main/megatron/training.py", line 189, in pretrain
    iteration = train(forward_step_func,
  File "~/Megatron-DeepSpeed-main/megatron/training.py", line 1116, in train
    flops = prof.get_total_flops()
  File "~/Python-3.9.16_torch1.13.1/lib/python3.9/site-packages/deepspeed/profiling/flops_profiler/profiler.py", line 199, in get_total_flops
    total_flops = get_module_flops(self.model)
  File "~/Python-3.9.16_torch1.13.1/lib/python3.9/site-packages/deepspeed/profiling/flops_profiler/profiler.py", line 1140, in get_module_flops
    sum = module.__flops__
  File "~/Python-3.9.16_torch1.13.1/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 504, in __getattr__
    raise AttributeError(
AttributeError: 'PipelineEngine' object has no attribute '__flops__'

yerimChoi avatar May 15 '23 07:05 yerimChoi

@yerimChoi, can you please provide more details about your code and the software/hardware environment you're running on? Thanks.

xiexbing avatar May 30 '23 05:05 xiexbing

Hi @yerimChoi, the flops profiler does not support the DeepSpeed pipeline-parallel engine yet. If you use pp_size > 1, disable the flops profiler.

cli99 avatar Jun 13 '23 16:06 cli99
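
For anyone hitting the same error, the advice above could be applied as a guard around the profiling calls. The following is only a minimal sketch: `should_profile` and its `pipeline_parallel_size` argument are hypothetical names, not part of DeepSpeed or Megatron-DeepSpeed, and how the pipeline-parallel size is obtained in your training script is an assumption you would need to check.

```python
def should_profile(iteration, profile_step, data_parallel_rank, pipeline_parallel_size):
    """Decide whether to run the flops profiler on this iteration.

    Hypothetical helper (not a DeepSpeed API). It skips profiling whenever
    pipeline parallelism is enabled, since the flops profiler is reported
    not to support the pipeline-parallel engine.
    """
    if pipeline_parallel_size > 1:
        # Pipeline parallelism in use: never profile.
        return False
    # Same condition as in the original snippet: profile once, on DP rank 0.
    return iteration == profile_step and data_parallel_rank == 0


# Sketch of use inside the training loop (names from the original snippet;
# pp_size is assumed to hold the pipeline-parallel size of the run):
#
#     do_profile = should_profile(iteration, profile_step,
#                                 mpu.get_data_parallel_rank(), pp_size)
#     if do_profile:
#         prof.start_profile()
#     ... [training] ...
#     if do_profile:
#         prof.stop_profile()
#         flops = prof.get_total_flops()
#         prof.end_profile()
```

Computing the flag once per iteration keeps the start and stop calls paired, so the profiler is never stopped or queried without having been started.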