Fine-tuning: RuntimeError: CUDA error: invalid device ordinal
(llm-embedder) root@autodl-container-4eab48a812-48d634d3:~/autodl-tmp/FlagEmbedding-master/FlagEmbedding/llm_embedder# CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=8 run_dense.py --output_dir data/outputs/nq --train_data llm-embedder:qa/train.json --eval_data llm-embedder:qa/test.json --corpus llm-embedder:qa/corpus.json --metrics nq --key_max_length 128 --query_max_length 32 --contrastive_weight 0 --stable_distill --eval_steps 2000 --save_steps 2000 --max_steps 2000 --data_root /data/llm-embedder
01/13/2024 19:10:17 - INFO - faiss.loader - Loading faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Loading faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Loading faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Loading faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Loading faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Loading faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Loading faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Loading faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
01/13/2024 19:10:17 - INFO - faiss.loader - Successfully loaded faiss with AVX2 support.
Traceback (most recent call last):
File "/root/autodl-tmp/FlagEmbedding-master/FlagEmbedding/llm_embedder/run_dense.py", line 157, in TORCH_USE_CUDA_DSA to enable device-side assertions.
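I am running this on a rented server with CUDA 11.8, and I only rented one GPU. I am planning to fine-tune on my own data. Could you help me see what went wrong?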
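Hi, try setting --nproc_per_node 1, since you are only using one GPU. With CUDA_VISIBLE_DEVICES=0 only one device is visible, but --nproc_per_node=8 makes torchrun start 8 processes, and the processes with local rank 1 to 7 most likely try to select a device index that does not exist, which is what "invalid device ordinal" means.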
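
To catch this kind of mismatch before launching, one could compare the requested process count against the devices listed in CUDA_VISIBLE_DEVICES. The following is only a minimal sketch, not part of FlagEmbedding or torchrun: the helper names visible_gpu_count and check_nproc are hypothetical, and the sketch only counts the entries in the variable; it does not verify that those GPUs physically exist.

```python
import os


def visible_gpu_count(env=None):
    """Count the entries listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, because in that case every
    GPU on the machine is visible and this sketch cannot know how many.
    (Hypothetical helper, for illustration only.)
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # "0" -> 1 device, "0,1,2" -> 3 devices, "" -> 0 devices
    return len([d for d in value.split(",") if d.strip()])


def check_nproc(nproc_per_node, env=None):
    """Raise ValueError if more processes are requested than GPUs are listed.

    (Hypothetical helper; mirrors the value passed to --nproc_per_node.)
    """
    count = visible_gpu_count(env)
    if count is not None and nproc_per_node > count:
        raise ValueError(
            f"--nproc_per_node={nproc_per_node} but only {count} GPU(s) "
            f"are listed in CUDA_VISIBLE_DEVICES"
        )
    return nproc_per_node


if __name__ == "__main__":
    # The setup from the question: one visible GPU, eight processes requested.
    env = {"CUDA_VISIBLE_DEVICES": "0"}
    try:
        check_nproc(8, env)
    except ValueError as err:
        print("would fail:", err)
    print("ok with nproc_per_node =", check_nproc(1, env))
```

With the settings from the question, the check rejects 8 processes and accepts 1, which matches the suggested fix of passing --nproc_per_node 1.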