Why am I getting a numpy error here?
(CogAgent) (.conda) (base) wpg@node7gpu:/workspace/kkkjr/Item/CogVLM/basic_demo$ torchrun --standalone --nnodes=1 --nproc-per-node=2 cli_demo_sat.py --from_pretrained cogagent-chat --version chat --bf16
/home/wpg/.local/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
W1129 05:48:09.364000 64664 torch/distributed/run.py:793]
W1129 05:48:09.364000 64664 torch/distributed/run.py:793] *****************************************
W1129 05:48:09.364000 64664 torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1129 05:48:09.364000 64664 torch/distributed/run.py:793] *****************************************
/home/wpg/.local/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
/home/wpg/.local/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Traceback (most recent call last):
Traceback (most recent call last):
File "/workspace/kkkjr/Item/CogVLM/basic_demo/cli_demo_sat.py", line 7, in <module>
File "/home/wpg/.local/lib/python3.10/site-packages/sat/__init__.py", line 1, in <module>
File "/home/wpg/.local/lib/python3.10/site-packages/sat/arguments.py", line 23, in <module>
ModuleNotFoundError: No module named 'numpy'
E1129 05:48:11.400000 64664 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 64829) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/home/wpg/.local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 919, in main
run(args)
File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
cli_demo_sat.py FAILED
Failures:
[1]:
time : 2024-11-29_05:48:11
host : node7gpu
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 64830)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2024-11-29_05:48:11
host : node7gpu
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 64829)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
I already have numpy installed, so I don't understand why this still happens.
Install numpy 1.26.3 at most. Newer versions cause all sorts of strange errors, so don't go higher than that.
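One more thing worth checking, going only by the log above: the workers are launched with /usr/bin/python3 and everything is imported from /home/wpg/.local/lib/python3.10/site-packages, not from the CogAgent conda env. If numpy was installed inside the conda env, that interpreter may simply not see it. A minimal sketch for checking this (the file name `check_env.py` and the helper `describe_module` are made up for illustration):

```python
import importlib.util
import sys


def describe_module(name):
    """Return (found, location) for a top-level module without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return False, None
    return True, spec.origin


if __name__ == "__main__":
    # Shows which Python binary is actually running this script.
    print("interpreter:", sys.executable)
    for name in ("numpy", "torch"):
        found, location = describe_module(name)
        if found:
            print(f"{name}: found at {location}")
        else:
            print(f"{name}: NOT importable from this interpreter")
```

Run it once as `/usr/bin/python3 check_env.py` and once as `python check_env.py` inside the env. If numpy only shows up in the second run, installing it with the same interpreter that runs the workers (for example `/usr/bin/python3 -m pip install "numpy<=1.26.3"`), or launching with `python -m torch.distributed.run` from inside the env instead of the `torchrun` in ~/.local/bin, would likely fix the import.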