
[bug] Datasets fail to load when training on two nodes with Slurm.

Open amulil opened this issue 2 years ago • 2 comments

# reproduce
srun -p debug --job-name=xtuner --nodes=2 --gres=gpu:8 --ntasks-per-node=8 --kill-on-bad-exit=1 xtuner train yi_34b_qlora_oasst1_e3_gpu16 --deepspeed deepspeed_zero2 --launcher slurm
# log info
File "/data/miniconda3/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2261, in broadcast_object_list
    object_tensor = torch.empty(  # type: ignore[call-overload]
TypeError: empty(): argument 'size' must be tuple of ints, but found element of type int at pos 1 
# env info
absl-py                       2.0.0
accelerate                    0.24.1
addict                        2.4.0
aiofiles                      23.1.0
aiohttp                       3.8.4
aiosignal                     1.3.1
aliyun-python-sdk-core        2.14.0
aliyun-python-sdk-kms         2.16.2
altair                        5.0.1
anyio                         3.7.0
appdirs                       1.4.4
argon2-cffi                   21.3.0
argon2-cffi-bindings          21.2.0
arrow                         1.2.3
asttokens                     2.2.1
async-lru                     2.0.2
async-timeout                 4.0.2
attrs                         23.1.0
Babel                         2.12.1
backcall                      0.2.0
beautifulsoup4                4.12.2
bitsandbytes                  0.41.1
bleach                        6.0.0
blinker                       1.6.2
brotlipy                      0.7.0
cachetools                    5.3.1
certifi                       2022.12.7
cffi                          1.15.1
charset-normalizer            2.0.4
click                         8.1.3
cmake                         3.26.3
comm                          0.1.3
conda                         23.1.0
conda-content-trust           0.1.3
conda-package-handling        2.0.2
conda_package_streaming       0.7.0
contourpy                     1.1.0
cpm-kernels                   1.0.11
crcmod                        1.7
cryptography                  38.0.4
cycler                        0.11.0
datasets                      2.12.0
debugpy                       1.6.7
decorator                     5.1.1
deepspeed                     0.12.3
defusedxml                    0.7.1
dill                          0.3.6
distro                        1.8.0
docker-pycreds                0.4.0
einops                        0.7.0
exceptiongroup                1.1.1
executing                     1.2.0
fastapi                       0.98.0
fastjsonschema                2.17.1
ffmpy                         0.3.0
filelock                      3.12.0
flash-attn                    2.3.3
Flask                         2.3.2
fonttools                     4.40.0
fqdn                          1.5.1
frozenlist                    1.3.3
fsspec                        2023.6.0
func-timeout                  4.3.5
future                        0.18.3
gast                          0.5.4
gitdb                         4.0.10
GitPython                     3.1.31
google-auth                   2.23.4
google-auth-oauthlib          1.1.0
gradio                        3.35.2
gradio_client                 0.2.7
grpcio                        1.59.2
h11                           0.14.0
hjson                         3.1.0
httpcore                      0.17.2
httpx                         0.24.1
huggingface-hub               0.17.3
idna                          3.4
importlib-metadata            6.7.0
ipykernel                     6.23.1
ipython                       8.14.0
isoduration                   20.11.0
itsdangerous                  2.1.2
jedi                          0.18.2
Jinja2                        3.1.2
jmespath                      0.10.0
json5                         0.9.14
jsonpointer                   2.3
jsonschema                    4.17.3
jupyter_client                8.2.0
jupyter_core                  5.3.0
jupyter-events                0.6.3
jupyter-lsp                   2.2.0
jupyter_server                2.6.0
jupyter_server_terminals      0.4.4
jupyterlab                    4.0.1
jupyterlab-pygments           0.2.2
jupyterlab_server             2.22.1
kiwisolver                    1.4.4
lagent                        0.1.2
latex2mathml                  3.76.0
linkify-it-py                 2.0.2
lit                           16.0.3
Markdown                      3.4.3
markdown-it-py                2.2.0
MarkupSafe                    2.1.2
matplotlib                    3.7.1
matplotlib-inline             0.1.6
mdit-py-plugins               0.3.3
mdtex2html                    1.2.0
mdurl                         0.1.2
mistune                       2.0.5
mmengine                      0.9.1
modelscope                    1.9.4
mpi4py-mpich                  3.1.2
mpmath                        1.3.0
multidict                     6.0.4
multiprocess                  0.70.14
nbclient                      0.8.0
nbconvert                     7.4.0
nbformat                      5.9.0
nest-asyncio                  1.5.6
networkx                      3.1
ninja                         1.11.1
notebook_shim                 0.2.3
numpy                         1.24.3
nvidia-cublas-cu11            11.10.3.66
nvidia-cuda-cupti-cu11        11.7.101
nvidia-cuda-nvrtc-cu11        11.7.99
nvidia-cuda-runtime-cu11      11.7.99
nvidia-cudnn-cu11             8.5.0.96
nvidia-cufft-cu11             10.9.0.58
nvidia-curand-cu11            10.2.10.91
nvidia-cusolver-cu11          11.4.0.1
nvidia-cusparse-cu11          11.7.4.91
nvidia-nccl-cu11              2.14.3
nvidia-nvtx-cu11              11.7.91
oauthlib                      3.2.2
opencv-python                 4.8.1.78
orjson                        3.9.1
oss2                          2.18.3
overrides                     7.3.1
packaging                     23.1
pandas                        2.0.1
pandocfilters                 1.5.0
parso                         0.8.3
pathtools                     0.1.2
peft                          0.6.0
pexpect                       4.8.0
pickleshare                   0.7.5
Pillow                        9.5.0
pip                           22.3.1
platformdirs                  3.5.1
pluggy                        1.0.0
prometheus-client             0.17.0
prompt-toolkit                3.0.38
protobuf                      3.20.3
psutil                        5.9.5
ptyprocess                    0.7.0
pure-eval                     0.2.2
py-cpuinfo                    9.0.0
pyarrow                       12.0.0
pyasn1                        0.5.0
pyasn1-modules                0.3.0
pycosat                       0.6.4
pycparser                     2.21
pycryptodome                  3.19.0
pydantic                      1.10.7
pydeck                        0.8.1b0
pydub                         0.25.1
Pygments                      2.15.1
Pympler                       1.0.1
pynvml                        11.5.0
pyOpenSSL                     22.0.0
pyparsing                     3.1.0
pyrsistent                    0.19.3
PySocks                       1.7.1
python-dateutil               2.8.2
python-json-logger            2.0.7
python-multipart              0.0.6
pytz                          2023.3
pytz-deprecation-shim         0.1.0.post0
PyYAML                        6.0
pyzmq                         25.1.0
regex                         2023.5.5
requests                      2.28.1
requests-oauthlib             1.3.1
responses                     0.18.0
rfc3339-validator             0.1.4
rfc3986-validator             0.1.1
rich                          13.4.2
rsa                           4.9
ruamel.yaml                   0.17.21
ruamel.yaml.clib              0.2.6
safetensors                   0.3.1
scipy                         1.11.3
semantic-version              2.10.0
Send2Trash                    1.8.2
sentencepiece                 0.1.99
sentry-sdk                    1.25.1
setproctitle                  1.3.2
setuptools                    65.6.3
simplejson                    3.19.2
six                           1.16.0
smmap                         5.0.0
sniffio                       1.3.0
sortedcontainers              2.4.0
soupsieve                     2.4.1
stack-data                    0.6.2
starlette                     0.27.0
streamlit                     1.24.0
streamlit-chat                0.1.1
sympy                         1.12
tenacity                      8.2.2
tensorboard                   2.15.1
tensorboard-data-server       0.7.2
termcolor                     2.3.0
terminado                     0.17.1
tiktoken                      0.5.1
tinycss2                      1.2.1
tokenizers                    0.14.1
toml                          0.10.2
tomli                         2.0.1
toolz                         0.12.0
torch                         2.0.1
tornado                       6.3.2
tqdm                          4.64.1
traitlets                     5.9.0
transformers                  4.34.0
transformers-stream-generator 0.0.4
triton                        2.0.0
typing_extensions             4.5.0
tzdata                        2023.3
tzlocal                       4.3.1
uc-micro-py                   1.0.2
uri-template                  1.2.0
urllib3                       1.26.14
uvicorn                       0.22.0
validators                    0.20.0
wandb                         0.15.4
watchdog                      3.0.0
wcwidth                       0.2.6
webcolors                     1.13
webencodings                  0.5.1
websocket-client              1.5.2
websockets                    11.0.3
Werkzeug                      2.3.6
wheel                         0.37.1
xtuner                        0.1.9
xxhash                        3.2.0
yapf                          0.40.2
yarl                          1.9.2
zipp                          3.15.0
zstandard                     0.18.0
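
The list above is the full `pip list` output. For triage, the packages that show up in the launch command and the traceback (torch, deepspeed, mmengine, datasets, transformers, xtuner) are probably the ones that matter most. A minimal sketch of collecting just those versions with the standard library follows; `RELEVANT` and `collect_versions` are hypothetical names chosen for illustration and are not part of xtuner, and the choice of packages is an assumption.

```python
from importlib import metadata

# Assumption: these are the packages most relevant to this report.
# The selection is illustrative, not an official xtuner checklist.
RELEVANT = ("torch", "deepspeed", "mmengine", "datasets", "transformers", "xtuner")


def collect_versions(packages=RELEVANT):
    """Return {package: version}; missing packages map to 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print one aligned "name version" line per package.
    for name, version in collect_versions().items():
        print(f"{name:<14}{version}")
```

Running this on each node would also show whether the two nodes have the same versions installed, which is worth ruling out for a multi-node failure.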

amulil, Nov 21 '23 07:11

Please provide the git commit ID so that we can reproduce the issue.

pppppM, Nov 21 '23 07:11

> Please provide the git commit ID so that we can reproduce the issue.

@pppppM

commit 6892d65ab93184024c561a4fdc0d5653a8f77299
Author: Zhihao Lin <[email protected]>
Date:   Mon Nov 20 14:46:45 2023 +0800

    [Fix] Fix bugs of llama dispatch (#229)
    
    * fix bugs
    
    * fix

amulil, Nov 21 '23 07:11