alignment-handbook
Not able to run Zephyr 7B Gemma with 4 80GB A100s
I'm not able to run Zephyr 7B Gemma with 4 80GB A100s. I get the following error:
RuntimeError: The size of tensor a (0) must match the size of tensor b (24576) at non-singleton dimension 1
After running:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/zephyr-7b-gemma/sft/config_full.yaml
As shown in the accelerate config below, I only changed num_processes, and I also tried zero3_init_flag: false:
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
I've seen the related issue (#57), but none of the solutions there work.
Hope we find a solution soon for the members of the 4 GPU cluster club! 🤗
I've just found out that it works if you install the dependencies as described in point 1 of this post. I ran the following to set up the environment:
pip install "torch==2.1.2" tensorboard
python -m pip install .
pip uninstall transformer-engine # I got errors, I'm working with A100s
pip install --upgrade \
"transformers==4.38.2" \
"datasets==2.16.1" \
"accelerate==0.26.1" \
"evaluate==0.4.1" \
"bitsandbytes==0.42.0" \
"trl==0.7.11" \
"peft==0.8.2"
pip install ninja packaging
MAX_JOBS=4 pip install flash-attn --no-build-isolation --upgrade
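Since the fix came down to package versions, it may help to confirm that the pins above actually took effect before launching a run. Below is a minimal sketch that uses only the standard library; the helper names `installed_version` and `find_mismatches` are made up for illustration, and the pins are copied from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above.
PINS = {
    "torch": "2.1.2",
    "transformers": "4.38.2",
    "datasets": "2.16.1",
    "accelerate": "0.26.1",
    "evaluate": "0.4.1",
    "bitsandbytes": "0.42.0",
    "trl": "0.7.11",
    "peft": "0.8.2",
}


def installed_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # A local build tag such as "2.1.2+cu121" still counts as a match.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

If the script prints nothing, every pinned package is installed at the expected version. Any line it does print names a package worth reinstalling before retrying the accelerate launch command.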
And the complete list of dependencies:
absl-py 2.0.0
accelerate 0.26.1
aiohttp 3.8.5
aiosignal 1.3.1
alignment-handbook 0.4.0.dev0
annotated-types 0.5.0
apex 0.1
appdirs 1.4.4
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
asttokens 2.4.0
astunparse 1.6.3
async-timeout 4.0.3
attrs 23.1.0
audioread 3.0.1
backcall 0.2.0
beautifulsoup4 4.12.2
bitsandbytes 0.42.0
bleach 6.0.0
blis 0.7.11
cachetools 5.3.1
catalogue 2.0.10
certifi 2023.7.22
cffi 1.16.0
charset-normalizer 3.2.0
click 8.1.6
cloudpathlib 0.15.1
cloudpickle 2.2.1
cmake 3.27.6
comm 0.1.4
confection 0.1.3
contourpy 1.1.1
cubinlinker 0.3.0+2.gce0680b
cuda-python 12.2.0rc5+5.g84845d1
cudf 23.8.0
cugraph 23.8.0
cugraph-dgl 23.8.0
cugraph-service-client 23.8.0
cugraph-service-server 23.8.0
cuml 23.8.0
cupy-cuda12x 12.1.0
cycler 0.12.1
cymem 2.0.8
Cython 3.0.3
dask 2023.7.1
dask-cuda 23.8.0
dask-cudf 23.8.0
datasets 2.16.1
debugpy 1.8.0
decorator 5.1.1
deepspeed 0.12.2
defusedxml 0.7.1
dill 0.3.7
distributed 2023.7.1
dm-tree 0.1.8
docker-pycreds 0.4.0
docstring-parser 0.15
einops 0.7.0
evaluate 0.4.1
exceptiongroup 1.1.3
execnet 2.0.2
executing 2.0.0
expecttest 0.1.3
fastjsonschema 2.18.1
fastrlock 0.8.1
filelock 3.12.4
flash-attn 2.5.6
fonttools 4.43.1
frozenlist 1.4.0
fsspec 2023.6.0
gast 0.5.4
gitdb 4.0.11
GitPython 3.1.40
google-auth 2.23.2
google-auth-oauthlib 0.4.6
graphsurgeon 0.4.6
grpcio 1.59.0
hf_transfer 0.1.6
hjson 3.1.0
huggingface-hub 0.21.4
hypothesis 5.35.1
idna 3.4
importlib-metadata 6.8.0
iniconfig 2.0.0
intel-openmp 2021.4.0
ipykernel 6.25.2
ipython 8.16.1
ipython-genutils 0.2.0
ipywidgets 8.1.1
jedi 0.19.1
Jinja2 3.1.2
joblib 1.3.2
json5 0.9.14
jsonschema 4.19.1
jsonschema-specifications 2023.7.1
jupyter 1.0.0
jupyter_client 8.3.1
jupyter-console 6.6.3
jupyter_core 5.3.2
jupyter-tensorboard 0.2.0
jupyterlab 2.3.2
jupyterlab-pygments 0.2.2
jupyterlab-server 1.2.0
jupyterlab-widgets 3.0.9
jupytext 1.15.2
kiwisolver 1.4.5
langcodes 3.3.0
librosa 0.9.2
lit 17.0.6
llvmlite 0.40.1
locket 1.0.0
Markdown 3.4.4
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.0
matplotlib-inline 0.1.6
mdit-py-plugins 0.4.0
mdurl 0.1.2
mistune 3.0.2
mkl 2021.1.1
mkl-devel 2021.1.1
mkl-include 2021.1.1
mock 5.1.0
mpmath 1.3.0
msgpack 1.0.5
multidict 6.0.4
multiprocess 0.70.15
munch 4.0.0
murmurhash 1.0.10
nbclient 0.8.0
nbconvert 7.9.2
nbformat 5.9.2
nest-asyncio 1.5.8
networkx 2.6.3
ninja 1.11.1.1
notebook 6.4.10
numba 0.57.1+1.g5fba9aa8f
numpy 1.26.4
nvfuser 0.0.20+gitunknown
nvidia-cublas-cu11 11.10.3.66
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu11 8.5.0.96
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu11 10.9.0.58
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu11 10.2.10.91
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu11 11.7.4.91
nvidia-cusparse-cu12 12.1.0.106
nvidia-dali-cuda120 1.30.0
nvidia-nccl-cu11 2.14.3
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu11 11.7.91
nvidia-nvtx-cu12 12.1.105
nvidia-pyindex 1.0.9
nvtx 0.2.5
oauthlib 3.2.2
onnx 1.14.0
opencv 4.7.0
packaging 23.1
pandas 1.5.3
pandocfilters 1.5.0
parso 0.8.3
partd 1.4.0
pathy 0.10.2
peft 0.8.2
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.2.0
pip 23.3.2
platformdirs 3.11.0
pluggy 1.3.0
ply 3.11
polygraphy 0.49.0
pooch 1.7.0
preshed 3.0.9
prettytable 3.9.0
prometheus-client 0.17.1
prompt-toolkit 3.0.39
protobuf 3.20.2
psutil 5.9.4
ptxcompiler 0.8.1+1.g2cb1b35
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 11.0.0
pyarrow-hotfix 0.6
pyasn1 0.5.0
pyasn1-modules 0.3.0
pybind11 2.11.1
pybind11-global 2.11.1
pycocotools 2.0+nv0.7.3
pycparser 2.21
pydantic 1.10.13
pydantic_core 2.10.1
Pygments 2.16.1
pylibcugraph 23.8.0
pylibcugraphops 23.8.0
pylibraft 23.8.0
pynvml 11.4.1
pyparsing 3.1.1
pytest 7.4.2
pytest-flakefinder 1.1.0
pytest-rerunfailures 12.0
pytest-shard 0.1.2
pytest-xdist 3.3.1
python-dateutil 2.8.2
python-hostlist 1.23.0
pytorch-quantization 2.1.2
pytz 2023.3
PyYAML 6.0.1
pyzmq 25.1.1
qtconsole 5.5.1
QtPy 2.4.1
raft-dask 23.8.0
referencing 0.30.2
regex 2023.10.3
requests 2.31.0
requests-oauthlib 1.3.1
resampy 0.4.2
responses 0.18.0
rich 13.7.1
rmm 23.8.0
rpds-py 0.10.4
rsa 4.9
safetensors 0.4.2
scikit-learn 1.2.0
scipy 1.11.1
seaborn 0.13.1
Send2Trash 1.8.2
sentencepiece 0.1.99
sentry-sdk 1.39.1
setproctitle 1.3.3
setuptools 69.0.3
shtab 1.7.1
six 1.16.0
smart-open 6.4.0
smmap 5.0.1
sortedcontainers 2.4.0
soundfile 0.12.1
soupsieve 2.5
spacy 3.7.1
spacy-legacy 3.0.12
spacy-loggers 1.0.5
sphinx-glpi-theme 0.3
srsly 2.4.8
stack-data 0.6.3
sympy 1.12
tabulate 0.9.0
tbb 2021.10.0
tblib 2.0.0
tensorboard 2.9.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorrt 8.6.1
terminado 0.17.1
thinc 8.2.1
threadpoolctl 3.2.0
thriftpy2 0.4.16
tinycss2 1.2.1
tokenizers 0.15.2
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
torch 2.1.2
tornado 6.3.3
tqdm 4.66.1
traitlets 5.9.0
transformers 4.38.2
treelite 3.2.0
treelite-runtime 3.2.0
triton 2.1.0
trl 0.7.11
typer 0.9.0
types-dataclasses 0.6.6
typing_extensions 4.7.1
tyro 0.7.3
ucx-py 0.33.0
uff 0.6.9
urllib3 1.26.16
wandb 0.16.1
wasabi 1.1.2
wcwidth 0.2.8
weasel 0.3.2
webencodings 0.5.1
Werkzeug 3.0.0
wheel 0.41.2
widgetsnbextension 4.0.9
xdoctest 1.0.2
xgboost 1.7.5
xxhash 3.4.1
yarl 1.9.2
zict 3.0.0
zipp 3.16.2
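If your environment still fails, one way to use the list above is to diff it against your own `pip list` output. The sketch below is only an illustration: `parse_pip_list` and `diff_environments` are hypothetical helper names, and the parser assumes each line has the form "name version", as in the list above. It lowercases names but does not otherwise normalize them, so `jupyter_client` and `jupyter-client` would be treated as different packages.

```python
def parse_pip_list(text):
    """Parse `pip list`-style lines ("name version") into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, and the dashed separator row.
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def diff_environments(working, other):
    """Return {name: (working_version, other_version)} for packages that differ."""
    names = sorted(set(working) | set(other))
    return {
        name: (working.get(name), other.get(name))
        for name in names
        if working.get(name) != other.get(name)
    }
```

To use it, save the list above and your own `pip list` output to two text files, read each with `open(path).read()`, pass both through `parse_pip_list`, and hand the results to `diff_environments`. A value of `None` on either side means the package is missing from that environment.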