Error calling `_initialize_deepspeed_train`: attempted relative import beyond top-level package

Open griff4692 opened this issue 2 years ago • 1 comments

I am using pytorch-lightning with deepspeed and getting the following error. Any ideas on how to fix it? Thanks a lot!

Relevant versions: deepspeed==0.8.1, pytorch-lightning==1.9.4

ValueError: attempted relative import beyond top-level package
Traceback (most recent call last):
  File "main.py", line 127, in <module>
    run(args)
  File "main.py", line 91, in run
    trainer.fit(model, datamodule=datamodule, ckpt_path=args.ckpt_path)
  File "/home/griffin/bhc/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/griffin/bhc/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/griffin/bhc/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 88, in launch
    return function(*args, **kwargs)
  File "/home/griffin/bhc/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/griffin/bhc/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1093, in _run
    self.strategy.setup(self)
  File "/home/griffin/bhc/lib/python3.8/site-packages/pytorch_lightning/strategies/deepspeed.py", line 345, in setup
    self.init_deepspeed()
  File "/home/griffin/bhc/lib/python3.8/site-packages/pytorch_lightning/strategies/deepspeed.py", line 456, in init_deepspeed
    self._initialize_deepspeed_train(model)
  File "/home/griffin/bhc/lib/python3.8/site-packages/pytorch_lightning/strategies/deepspeed.py", line 493, in _initialize_deepspeed_train
    model, deepspeed_optimizer = self._setup_model_and_optimizer(model, optimizer, scheduler)
  File "/home/griffin/bhc/lib/python3.8/site-packages/pytorch_lightning/strategies/deepspeed.py", line 414, in _setup_model_and_optimizer
    deepspeed_engine, deepspeed_optimizer, _, _ = deepspeed.initialize(
  File "/home/griffin/bhc/lib/python3.8/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/griffin/bhc/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 336, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/griffin/bhc/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1292, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/home/griffin/bhc/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1542, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/home/griffin/bhc/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 165, in __init__
    util_ops = UtilsBuilder().load()
  File "/home/griffin/.local/lib/python3.8/site-packages/op_builder/builder.py", line 230, in load
    from ...git_version_info import installed_ops, torch_info
ValueError: attempted relative import beyond top-level package

griff4692, Mar 03 '23 20:03

Hi @griff4692, the layout of your deepspeed installation looks odd.

The line that throws the error is in /home/griffin/.local/lib/python3.8/site-packages/op_builder/builder.py (line 230, in `load`).

Shouldn't it be in /home/griffin/.local/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py instead?
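To illustrate why the location matters: `from ...git_version_info import ...` climbs three package levels, which works from `deepspeed/ops/op_builder/builder.py` (it resolves to `deepspeed.git_version_info`) but not from a top-level `op_builder` package. Here is a minimal sketch that reproduces the same failure without DeepSpeed. The package name `demo_op_builder` and the helper `reproduce_relative_import_error` are made up for this example. Note that Python 3.8 raises `ValueError` here, while newer versions raise `ImportError` with the same message.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reproduce_relative_import_error():
    """Import a throwaway top-level package whose module uses a 3-level
    relative import, and return the resulting error as a string."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a stray top-level `op_builder` package.
        pkg = Path(tmp) / "demo_op_builder"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "builder.py").write_text(
            "def load():\n"
            "    from ...git_version_info import installed_ops\n"
        )

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            builder = importlib.import_module("demo_op_builder.builder")
            try:
                builder.load()
            except (ValueError, ImportError) as exc:
                # ValueError on Python 3.8, ImportError on 3.9 and later.
                return f"{type(exc).__name__}: {exc}"
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_op_builder.builder", None)
            sys.modules.pop("demo_op_builder", None)
    return None


if __name__ == "__main__":
    print(reproduce_relative_import_error())
```

The module itself imports fine; the error only appears when `load()` runs, which matches the traceback above, where the failure surfaces inside `UtilsBuilder().load()`.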

My guess is that your installation is corrupted, possibly because an old installation was not removed.
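One way to check this guess is to ask Python where it would load `deepspeed` and `op_builder` from. In the traceback, `deepspeed` comes from /home/griffin/bhc/lib/python3.8/site-packages while `op_builder` comes from /home/griffin/.local/lib/python3.8/site-packages, so two installs may be mixed. The helper below is only a sketch; `locate` is a name invented for this example, and it relies solely on the standard-library `importlib.util.find_spec`.

```python
import importlib.util
import sys


def locate(module_name):
    """Return the path a top-level module would be imported from, or None
    if it cannot be found. Does not import the module itself."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    if spec.origin:
        return spec.origin
    # Namespace packages have no origin; report their search locations.
    locations = spec.submodule_search_locations
    return ", ".join(locations) if locations else None


if __name__ == "__main__":
    for name in ("deepspeed", "op_builder"):
        print(f"{name}: {locate(name)}")
    print("sys.path order:")
    for entry in sys.path:
        print("  ", entry)
```

If `op_builder` resolves to a different site-packages directory than `deepspeed`, a plausible fix (an assumption, not confirmed in this thread) is to run `pip uninstall deepspeed` in both the virtual environment and the user site until nothing is left, then reinstall it once in the environment you actually use.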

ShijieZZZZ, Mar 10 '23 02:03

Hi @griff4692, I will close this issue for now. Feel free to re-open if you're still seeing it.

ShijieZZZZ, Mar 17 '23 18:03