
[Question]: No module named 'megatron.data'

Open smallhackeryifa opened this issue 2 years ago • 3 comments

Description

warnings.warn(
[2023-07-05 15:12:54,815] [INFO] [logger.py:85:log_dist] [Rank -1] Unsupported bmtrain
[2023-07-05 15:12:55,201] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/root/FlagAI/examples/Aquila/aquila_pretrain.py", line 16, in <module>
    from flagai.data.dataset.indexed_dataset.build_index_mappings import _build_train_valid_test_datasets, build_train_valid_test_weighted_datasets
  File "/root/miniconda3/lib/python3.10/site-packages/flagai-1.7.3-py3.10.egg/flagai/data/dataset/indexed_dataset/build_index_mappings.py", line 25, in <module>
    from megatron.data.dataset_utils import get_train_valid_test_split
ModuleNotFoundError: No module named 'megatron.data'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 29695) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launch.py", line 196, in <module>
    main()
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launch.py", line 192, in main
    launch(args)
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launch.py", line 177, in launch
    run(args)
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
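The root failure in the traceback is the import of megatron.data.dataset_utils at line 25 of FlagAI's build_index_mappings.py; the torch.distributed ChildFailedError is only the launcher reporting that the worker died. A minimal diagnostic sketch, run with the same Python interpreter used for training, can tell apart "no megatron package at all" from "a megatron package is present but lacks the data submodule". The helper name diagnose is hypothetical (not part of FlagAI or Megatron-LM); it only uses the standard importlib module.

```python
import importlib
import importlib.util


def diagnose(module_name="megatron.data.dataset_utils", attr="get_train_valid_test_split"):
    """Hypothetical helper: report whether module_name imports and exposes attr.

    Returns a status string instead of raising, so it can be used as a quick check.
    """
    top = module_name.split(".")[0]
    # find_spec on a top-level name returns None if nothing by that name is installed.
    top_spec = importlib.util.find_spec(top)
    if top_spec is None:
        return f"'{top}' is not installed in this environment"

    # The top-level package may exist but still lack the submodule,
    # e.g. if a different package also uses the top-level name.
    location = top_spec.origin or list(top_spec.submodule_search_locations or [])
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return f"'{top}' found at {location}, but import failed: {exc}"

    if not hasattr(module, attr):
        return f"{module_name} imported from {module.__file__}, but has no attribute '{attr}'"
    return f"OK: {module_name}.{attr} is available ({module.__file__})"


if __name__ == "__main__":
    print(diagnose())
```

If the output says the package was found but the import failed, the megatron on the path is likely not the Megatron-LM checkout that FlagAI expects; if it says nothing is installed, the Megatron-LM checkout is not on that interpreter's path at all.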

Alternatives

No response

smallhackeryifa, Jul 05 '23 07:07

I tried many approaches, including git clone --branch fairseq_v3 https://github.com/ngoyal2707/Megatron-LM.git, but I still get this error.

smallhackeryifa, Jul 05 '23 07:07

> I tried many approaches, including git clone --branch fairseq_v3 https://github.com/ngoyal2707/Megatron-LM.git, but I still get this error.

You need to install this Megatron-LM.

ftgreat, Jul 09 '23 12:07

Resolved.

smallhackeryifa, Jul 09 '23 12:07

This issue has been closed. If you still have questions, feel free to reopen it.

Anhforth, Jul 11 '23 02:07