Description
warnings.warn(
[2023-07-05 15:12:54,815] [INFO] [logger.py:85:log_dist] [Rank -1] Unsupported bmtrain
[2023-07-05 15:12:55,201] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/root/FlagAI/examples/Aquila/aquila_pretrain.py", line 16, in
    from flagai.data.dataset.indexed_dataset.build_index_mappings import _build_train_valid_test_datasets, build_train_valid_test_weighted_datasets
  File "/root/miniconda3/lib/python3.10/site-packages/flagai-1.7.3-py3.10.egg/flagai/data/dataset/indexed_dataset/build_index_mappings.py", line 25, in
    from megatron.data.dataset_utils import get_train_valid_test_split
ModuleNotFoundError: No module named 'megatron.data'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 29695) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launch.py", line 196, in
    main()
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launch.py", line 192, in main
    launch(args)
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launch.py", line 177, in launch
    run(args)
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launcher/api.py", line 134, in call
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/lib/python3.10/site-packages/torch-2.0.1-py3.10-linux-x86_64.egg/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
Alternatives
No response
I tried many approaches, including git clone --branch fairseq_v3 https://github.com/ngoyal2707/Megatron-LM.git, but I still get this error.
This Megatron-LM needs to be installed.
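To confirm whether the installed package actually exposes the submodule named in the traceback, a small check like the one below may help. This is only a diagnostic sketch: the helper name `module_available` is hypothetical, and it says nothing about which Megatron-LM branch or version provides `megatron.data`. Run it with the same interpreter that launches the training script (here /root/miniconda3/bin/python).

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the dotted module `name` can be located.

    Hypothetical helper, not part of FlagAI or Megatron-LM.
    Note: for dotted names, find_spec imports the parent package first.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `megatron`) is missing
        # or is not a package at all.
        return False


if __name__ == "__main__":
    # Module names taken from the traceback in this issue.
    for name in ("megatron", "megatron.data", "megatron.data.dataset_utils"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

If `megatron` is reported as found but `megatron.data` is missing, the interpreter is probably picking up a Megatron install that does not contain that subpackage, or a different copy than the one that was cloned.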