AttributeError: 'NoneType' object has no attribute 'fp16_enabled'

Open TrupsT opened this issue 1 year ago • 5 comments

Unable to train, getting the following error.

[2024-05-23 10:34:39,680] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] please install triton==1.0.0 if you want to use sparse attention
1222it [00:00, 62626.79it/s]
/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/kaggle/working/vall-e/vall_e/train.py", line 128, in <module>
    main()
  File "/kaggle/working/vall-e/vall_e/train.py", line 119, in main
    trainer.train(
  File "/kaggle/working/vall-e/vall_e/utils/trainer.py", line 125, in train
    engines = engines_loader()
  File "/kaggle/working/vall-e/vall_e/train.py", line 21, in load_engines
    model=trainer.Engine(
  File "/kaggle/working/vall-e/vall_e/utils/engines.py", line 22, in __init__
    super().__init__(None, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 240, in __init__
    self._do_sanity_check()
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1039, in _do_sanity_check
    if self.fp16_enabled() and not get_accelerator().is_fp16_supported():
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 796, in fp16_enabled
    return self._config.fp16_enabled
AttributeError: 'NoneType' object has no attribute 'fp16_enabled'

TrupsT avatar May 28 '24 10:05 TrupsT

Could be related to https://github.com/enhuiz/vall-e/issues/94

Which version of DeepSpeed do you have installed? Downgrading to deepspeed==0.8.3 seemed to work for some people in the linked issue.

ljuvela avatar Jun 19 '24 12:06 ljuvela

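It might be that the config is not being passed correctly to the DeepSpeedEngine base class in vall_e/utils/engines.py (line 22 in the traceback, the super().__init__(None, *args, **kwargs) call) after some change in the DeepSpeed API.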
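
To illustrate the suspected failure mode, here is a minimal sketch that uses stand-in classes only. FakeConfig, FakeBaseEngine, BrokenEngine and PatchedEngine are hypothetical names made up for this example; they are not the real DeepSpeed or vall-e classes, and the actual DeepSpeedEngine constructor takes different arguments. The sketch only assumes what the traceback shows: a base class stores a config, then a sanity check reads fp16_enabled from it.

```python
# Stand-in classes only: none of these are the real DeepSpeed or vall-e classes.


class FakeConfig:
    """Hypothetical config exposing the one attribute the traceback touches."""

    def __init__(self, fp16_enabled=False):
        self.fp16_enabled = fp16_enabled


class FakeBaseEngine:
    """Mimics the pattern in the traceback: store a config, then sanity-check it."""

    def __init__(self, args, model, config=None):
        self._config = config  # stays None if the caller never supplies a config
        self._do_sanity_check()

    def fp16_enabled(self):
        return self._config.fp16_enabled  # fails when self._config is None

    def _do_sanity_check(self):
        self.fp16_enabled()


class BrokenEngine(FakeBaseEngine):
    """Forwards arguments like the wrapper in the traceback, without a config."""

    def __init__(self, *args, **kwargs):
        super().__init__(None, *args, **kwargs)


class PatchedEngine(FakeBaseEngine):
    """Same wrapper, but guarantees that a config object reaches the base class."""

    def __init__(self, *args, config=None, **kwargs):
        if config is None:
            config = FakeConfig()
        super().__init__(None, *args, config=config, **kwargs)


def try_build(engine_cls):
    """Return 'ok' if construction succeeds, else the AttributeError message."""
    try:
        engine_cls(model=object())
        return "ok"
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print("BrokenEngine: ", try_build(BrokenEngine))
    print("PatchedEngine:", try_build(PatchedEngine))
```

If this is indeed what happens in the real code, the fix would be to build a proper config object and hand it to the base class, rather than to change the sanity check.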

ljuvela avatar Jun 19 '24 12:06 ljuvela

Oh there's even a (long) pending PR https://github.com/enhuiz/vall-e/pull/92 that might fix the config issue with newer DeepSpeed versions.

ljuvela avatar Jun 19 '24 12:06 ljuvela

> Oh there's even a (long) pending PR #92 that might fix the config issue with newer DeepSpeed versions.

Thank you! My workaround is to downgrade DeepSpeed to 0.8.3, as in #92, which is the easiest option for me. I also found that it is preferable to downgrade these packages first, before downgrading DeepSpeed:

    pip install pydantic==1.10.9
    pip install numpy==1.26.4
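
To confirm that an environment actually matches the versions mentioned above, a small check along these lines may help. This is a sketch using only the standard library; the PINS dictionary and the check_pins helper are names chosen for this example, and the pinned versions are simply the ones reported in this thread.

```python
from importlib import metadata

# Versions reported as working in this thread (not an official compatibility list).
PINS = {"deepspeed": "0.8.3", "pydantic": "1.10.9", "numpy": "1.26.4"}


def check_pins(pins, lookup=metadata.version):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_pins(PINS).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={found} wanted={PINS[name]} [{status}]")
```

The lookup parameter exists only so that the helper can be exercised without the packages installed; by default it queries the current environment.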

PussyCat0700 avatar Dec 27 '24 12:12 PussyCat0700

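Hello, I have received your email and will deal with it as soon as possible. Thank you! Hu Wei, Design Management Institute, China Construction Eighth Engineering Division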

hoowee avatar Dec 27 '24 12:12 hoowee