xtuner
[bug] The dataset fails to load when training on two nodes with Slurm.
# reproduce
srun -p debug --job-name=xtuner --nodes=2 --gres=gpu:8 --ntasks-per-node=8 --kill-on-bad-exit=1 xtuner train yi_34b_qlora_oasst1_e3_gpu16 --deepspeed deepspeed_zero2 --launcher slurm
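The failure only appears with two nodes, so a quick first check is whether each task sees the rank layout that `--launcher slurm` expects. The sketch below uses a hypothetical helper (`slurm_rank_info` and `SlurmRankInfo` are illustrative and are not part of xtuner or mmengine). It reads the standard variables that `srun` exports (`SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`, `SLURM_NODEID`) and assumes local ranks are derived as `SLURM_PROCID % gpus_per_node`. That assumption matches `--ntasks-per-node=8` with `--gres=gpu:8` in the command above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SlurmRankInfo:
    rank: int        # global rank, from SLURM_PROCID
    world_size: int  # total tasks across all nodes, from SLURM_NTASKS
    local_rank: int  # GPU index this task would bind to (assumed rank % gpus_per_node)
    node_id: int     # index of this node in the allocation


def slurm_rank_info(env: Optional[Mapping[str, str]] = None,
                    gpus_per_node: int = 8) -> SlurmRankInfo:
    """Derive the rank layout from the variables that srun exports.

    Assumes local_rank = global_rank % gpus_per_node, which only holds
    when every node runs exactly `gpus_per_node` tasks.
    """
    env = os.environ if env is None else env
    rank = int(env["SLURM_PROCID"])
    world_size = int(env["SLURM_NTASKS"])
    if not 0 <= rank < world_size:
        raise ValueError(f"SLURM_PROCID={rank} is outside [0, {world_size})")
    local_rank = rank % gpus_per_node
    # Cross-check against Slurm's own per-node task index when it is set.
    if "SLURM_LOCALID" in env and int(env["SLURM_LOCALID"]) != local_rank:
        raise ValueError(
            f"SLURM_LOCALID={env['SLURM_LOCALID']} != rank % gpus_per_node "
            f"= {local_rank}; the task-to-GPU mapping is inconsistent")
    # Fall back to the assumed block layout if SLURM_NODEID is missing.
    node_id = int(env.get("SLURM_NODEID", rank // gpus_per_node))
    return SlurmRankInfo(rank, world_size, local_rank, node_id)


if __name__ == "__main__" and "SLURM_PROCID" in os.environ:
    # Launch with the same layout as the failing job, e.g.
    #   srun -p debug --nodes=2 --ntasks-per-node=8 python check_ranks.py
    print(slurm_rank_info())
```

With the two-node command above, each of the 16 tasks should print a distinct `rank` in 0..15 with `world_size=16`. A `ValueError` or a duplicated rank would point at the job layout rather than at the dataset code.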
# log info
File "/data/miniconda3/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2261, in broadcast_object_list
object_tensor = torch.empty( # type: ignore[call-overload]
TypeError: empty(): argument 'size' must be tuple of ints, but found element of type int at pos 1
# env info
absl-py 2.0.0
accelerate 0.24.1
addict 2.4.0
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
aliyun-python-sdk-core 2.14.0
aliyun-python-sdk-kms 2.16.2
altair 5.0.1
anyio 3.7.0
appdirs 1.4.4
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
asttokens 2.2.1
async-lru 2.0.2
async-timeout 4.0.2
attrs 23.1.0
Babel 2.12.1
backcall 0.2.0
beautifulsoup4 4.12.2
bitsandbytes 0.41.1
bleach 6.0.0
blinker 1.6.2
brotlipy 0.7.0
cachetools 5.3.1
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.0.4
click 8.1.3
cmake 3.26.3
comm 0.1.3
conda 23.1.0
conda-content-trust 0.1.3
conda-package-handling 2.0.2
conda_package_streaming 0.7.0
contourpy 1.1.0
cpm-kernels 1.0.11
crcmod 1.7
cryptography 38.0.4
cycler 0.11.0
datasets 2.12.0
debugpy 1.6.7
decorator 5.1.1
deepspeed 0.12.3
defusedxml 0.7.1
dill 0.3.6
distro 1.8.0
docker-pycreds 0.4.0
einops 0.7.0
exceptiongroup 1.1.1
executing 1.2.0
fastapi 0.98.0
fastjsonschema 2.17.1
ffmpy 0.3.0
filelock 3.12.0
flash-attn 2.3.3
Flask 2.3.2
fonttools 4.40.0
fqdn 1.5.1
frozenlist 1.3.3
fsspec 2023.6.0
func-timeout 4.3.5
future 0.18.3
gast 0.5.4
gitdb 4.0.10
GitPython 3.1.31
google-auth 2.23.4
google-auth-oauthlib 1.1.0
gradio 3.35.2
gradio_client 0.2.7
grpcio 1.59.2
h11 0.14.0
hjson 3.1.0
httpcore 0.17.2
httpx 0.24.1
huggingface-hub 0.17.3
idna 3.4
importlib-metadata 6.7.0
ipykernel 6.23.1
ipython 8.14.0
isoduration 20.11.0
itsdangerous 2.1.2
jedi 0.18.2
Jinja2 3.1.2
jmespath 0.10.0
json5 0.9.14
jsonpointer 2.3
jsonschema 4.17.3
jupyter_client 8.2.0
jupyter_core 5.3.0
jupyter-events 0.6.3
jupyter-lsp 2.2.0
jupyter_server 2.6.0
jupyter_server_terminals 0.4.4
jupyterlab 4.0.1
jupyterlab-pygments 0.2.2
jupyterlab_server 2.22.1
kiwisolver 1.4.4
lagent 0.1.2
latex2mathml 3.76.0
linkify-it-py 2.0.2
lit 16.0.3
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.3
mdtex2html 1.2.0
mdurl 0.1.2
mistune 2.0.5
mmengine 0.9.1
modelscope 1.9.4
mpi4py-mpich 3.1.2
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
nbclient 0.8.0
nbconvert 7.4.0
nbformat 5.9.0
nest-asyncio 1.5.6
networkx 3.1
ninja 1.11.1
notebook_shim 0.2.3
numpy 1.24.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
opencv-python 4.8.1.78
orjson 3.9.1
oss2 2.18.3
overrides 7.3.1
packaging 23.1
pandas 2.0.1
pandocfilters 1.5.0
parso 0.8.3
pathtools 0.1.2
peft 0.6.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 22.3.1
platformdirs 3.5.1
pluggy 1.0.0
prometheus-client 0.17.0
prompt-toolkit 3.0.38
protobuf 3.20.3
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 12.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycosat 0.6.4
pycparser 2.21
pycryptodome 3.19.0
pydantic 1.10.7
pydeck 0.8.1b0
pydub 0.25.1
Pygments 2.15.1
Pympler 1.0.1
pynvml 11.5.0
pyOpenSSL 22.0.0
pyparsing 3.1.0
pyrsistent 0.19.3
PySocks 1.7.1
python-dateutil 2.8.2
python-json-logger 2.0.7
python-multipart 0.0.6
pytz 2023.3
pytz-deprecation-shim 0.1.0.post0
PyYAML 6.0
pyzmq 25.1.0
regex 2023.5.5
requests 2.28.1
requests-oauthlib 1.3.1
responses 0.18.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.4.2
rsa 4.9
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.6
safetensors 0.3.1
scipy 1.11.3
semantic-version 2.10.0
Send2Trash 1.8.2
sentencepiece 0.1.99
sentry-sdk 1.25.1
setproctitle 1.3.2
setuptools 65.6.3
simplejson 3.19.2
six 1.16.0
smmap 5.0.0
sniffio 1.3.0
sortedcontainers 2.4.0
soupsieve 2.4.1
stack-data 0.6.2
starlette 0.27.0
streamlit 1.24.0
streamlit-chat 0.1.1
sympy 1.12
tenacity 8.2.2
tensorboard 2.15.1
tensorboard-data-server 0.7.2
termcolor 2.3.0
terminado 0.17.1
tiktoken 0.5.1
tinycss2 1.2.1
tokenizers 0.14.1
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
torch 2.0.1
tornado 6.3.2
tqdm 4.64.1
traitlets 5.9.0
transformers 4.34.0
transformers-stream-generator 0.0.4
triton 2.0.0
typing_extensions 4.5.0
tzdata 2023.3
tzlocal 4.3.1
uc-micro-py 1.0.2
uri-template 1.2.0
urllib3 1.26.14
uvicorn 0.22.0
validators 0.20.0
wandb 0.15.4
watchdog 3.0.0
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.5.2
websockets 11.0.3
Werkzeug 2.3.6
wheel 0.37.1
xtuner 0.1.9
xxhash 3.2.0
yapf 0.40.2
yarl 1.9.2
zipp 3.15.0
zstandard 0.18.0
Please provide the Git commit ID so that we can reproduce it.
@pppppM
commit 6892d65ab93184024c561a4fdc0d5653a8f77299
Author: Zhihao Lin <[email protected]>
Date: Mon Nov 20 14:46:45 2023 +0800
[Fix] Fix bugs of llama dispatch (#229)
* fix bugs
* fix