
DeepSpeed just doesn't install properly on Databricks

vikram71198 opened this issue 1 year ago · 4 comments

I've been trying for a while to set up DeepSpeed correctly on my Databricks cluster, but have been largely unsuccessful.

Platform Specifications

absl-py==1.0.0 accelerate==0.29.3 aiohttp==3.9.1 aiosignal==1.3.1 anyio==3.5.0 appdirs==1.4.4 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 astor==0.8.1 asttokens==2.0.5 astunparse==1.6.3 async-timeout==4.0.3 attrs==22.1.0 audioread==3.0.1 azure-core==1.29.1 azure-cosmos==4.3.1 azure-storage-blob==12.19.0 azure-storage-file-datalake==12.14.0 backcall==0.2.0 bcrypt==3.2.0 beautifulsoup4==4.11.1 black==22.6.0 bleach==4.1.0 blinker==1.4 blis==0.7.11 boto3==1.24.28 botocore==1.27.96 cachetools==5.3.2 catalogue==2.0.10 category-encoders==2.6.3 certifi==2022.12.7 cffi==1.15.1 chardet==4.0.0 charset-normalizer==2.0.4 click==8.0.4 cloudpathlib==0.16.0 cloudpickle==2.0.0 cmake==3.28.1 cmdstanpy==1.2.0 comm==0.1.2 confection==0.1.4 configparser==5.2.0 contourpy==1.0.5 cryptography==39.0.1 cycler==0.11.0 cymem==2.0.8 Cython==0.29.32 dacite==1.8.1 databricks-automl-runtime==0.2.20 databricks-cli==0.18.0 databricks-feature-engineering==0.2.1 databricks-sdk==0.1.6 dataclasses-json==0.6.3 datasets==2.15.0 dbl-tempo==0.1.26 dbus-python==1.2.18 debugpy==1.6.7 decorator==5.1.1 deepspeed==0.14.2 defusedxml==0.7.1 dill==0.3.6 diskcache==5.6.3 distlib==0.3.7 distro==1.7.0 distro-info==1.1+ubuntu0.2 docstring-to-markdown==0.11 docstring_parser==0.16 einops==0.7.0 entrypoints==0.4 evaluate==0.4.1 executing==0.8.3 facets-overview==1.1.1 fastjsonschema==2.19.1 fasttext==0.9.2 filelock==3.9.0 flash-attn==2.5.7 Flask==2.2.5 flatbuffers==23.5.26 fonttools==4.25.0 frozenlist==1.4.1 fsspec==2023.6.0 future==0.18.3 gast==0.4.0 gensim==4.3.2 gitdb==4.0.11 GitPython==3.1.27 google-api-core==2.15.0 google-auth==2.21.0 google-auth-oauthlib==1.0.0 google-cloud-core==2.4.1 google-cloud-storage==2.11.0 google-crc32c==1.5.0 google-pasta==0.2.0 google-resumable-media==2.7.0 googleapis-common-protos==1.62.0 greenlet==2.0.1 grpcio==1.48.2 grpcio-status==1.48.1 gunicorn==20.1.0 gviz-api==1.10.0 h5py==3.7.0 hf_transfer==0.1.6 hjson==3.1.0 holidays==0.38 horovod==0.28.1 htmlmin==0.1.12 httplib2==0.20.2 
huggingface-hub==0.21.3 idna==3.4 ImageHash==4.3.1 imbalanced-learn==0.11.0 importlib-metadata==4.11.3 importlib-resources==6.1.1 ipykernel==6.25.0 ipython==8.14.0 ipython-genutils==0.2.0 ipywidgets==7.7.2 isodate==0.6.1 itsdangerous==2.0.1 jedi==0.18.1 jeepney==0.7.1 Jinja2==3.1.2 jmespath==0.10.0 joblib==1.2.0 joblibspark==0.5.1 jsonpatch==1.33 jsonpointer==2.4 jsonschema==4.17.3 jupyter-client==7.3.4 jupyter-server==1.23.4 jupyter_core==5.2.0 jupyterlab-pygments==0.1.2 jupyterlab-widgets==1.0.0 keras==2.14.0 keyring==23.5.0 kiwisolver==1.4.4 langchain==0.0.348 langchain-core==0.0.13 langcodes==3.3.0 langsmith==0.0.79 launchpadlib==1.10.16 lazr.restfulclient==0.14.4 lazr.uri==1.0.6 lazy_loader==0.3 libclang==15.0.6.1 librosa==0.10.1 lightgbm==4.1.0 lit==17.0.6 llvmlite==0.39.1 lxml==4.9.1 Mako==1.2.0 Markdown==3.4.1 markdown-it-py==3.0.0 MarkupSafe==2.1.1 marshmallow==3.20.2 matplotlib==3.7.0 matplotlib-inline==0.1.6 mccabe==0.7.0 mdurl==0.1.2 mistune==0.8.4 ml-dtypes==0.2.0 mlflow-skinny==2.9.2 more-itertools==8.10.0 mpmath==1.2.1 msgpack==1.0.7 multidict==6.0.4 multimethod==1.10 multiprocess==0.70.14 murmurhash==1.0.10 mypy-extensions==0.4.3 nbclassic==0.5.2 nbclient==0.5.13 nbconvert==6.5.4 nbformat==5.7.0 nest-asyncio==1.5.6 networkx==2.8.4 ninja==1.11.1.1 nltk==3.7 nodeenv==1.8.0 notebook==6.5.2 notebook_shim==0.2.2 numba==0.56.4 numpy==1.23.5 nvidia-cublas-cu11==11.11.3.6 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu11==11.8.87 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu11==11.8.89 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu11==11.8.89 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu11==8.7.0.84 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu11==10.9.0.58 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu11==10.3.0.86 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu11==11.4.1.48 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu11==11.7.5.86 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu11==2.19.3 nvidia-nccl-cu12==2.19.3 
nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu11==11.8.86 nvidia-nvtx-cu12==12.1.105 oauthlib==3.2.0 openai==0.28.1 opt-einsum==3.3.0 packaging==23.2 pandas==1.5.3 pandocfilters==1.5.0 paramiko==2.9.2 parso==0.8.3 pathspec==0.10.3 patsy==0.5.3 peft==0.10.0 petastorm==0.12.1 pexpect==4.8.0 phik==0.12.4 pickleshare==0.7.5 Pillow==9.4.0 platformdirs==2.5.2 plotly==5.9.0 pluggy==1.0.0 pmdarima==2.0.4 pooch==1.4.0 preshed==3.0.9 prompt-toolkit==3.0.36 prophet==1.1.5 protobuf==4.24.0 psutil==5.9.0 psycopg2==2.9.3 ptyprocess==0.7.0 pure-eval==0.2.2 py-cpuinfo==9.0.0 pyarrow==8.0.0 pyarrow-hotfix==0.5 pyasn1==0.4.8 pyasn1-modules==0.2.8 pybind11==2.11.1 pycparser==2.21 pydantic==1.10.6 pyflakes==3.1.0 Pygments==2.17.2 PyGObject==3.42.1 PyJWT==2.3.0 PyNaCl==1.5.0 pynvml==11.5.0 pyodbc==4.0.32 pyparsing==3.0.9 pyright==1.1.294 pyrsistent==0.18.0 pytesseract==0.3.10 python-apt==2.4.0+ubuntu3 python-dateutil==2.8.2 python-editor==1.0.4 python-lsp-jsonrpc==1.1.1 python-lsp-server==1.8.0 pytoolconfig==1.2.5 pytz==2022.7 PyWavelets==1.4.1 PyYAML==6.0 pyzmq==23.2.0 regex==2022.7.9 requests==2.28.1 requests-oauthlib==1.3.1 responses==0.18.0 rich==13.7.1 rope==1.7.0 rsa==4.9 s3transfer==0.6.2 safetensors==0.4.1 scikit-learn==1.1.1 scipy==1.10.0 seaborn==0.12.2 SecretStorage==3.3.1 Send2Trash==1.8.0 sentence-transformers==2.2.2 sentencepiece==0.1.99 shap==0.44.0 shtab==1.7.1 simplejson==3.17.6 six==1.16.0 slicer==0.0.7 smart-open==5.2.1 smmap==5.0.0 sniffio==1.2.0 soundfile==0.12.1 soupsieve==2.3.2.post1 soxr==0.3.7 spacy==3.7.2 spacy-legacy==3.0.12 spacy-loggers==1.0.5 spark-tensorflow-distributor==1.0.0 SQLAlchemy==1.4.39 sqlparse==0.4.2 srsly==2.4.8 ssh-import-id==5.11 stack-data==0.2.0 stanio==0.3.0 statsmodels==0.13.5 sympy==1.11.1 tabulate==0.8.10 tangled-up-in-unicode==0.2.0 tenacity==8.1.0 tensorboard==2.14.1 tensorboard-data-server==0.7.2 tensorboard-plugin-profile==2.14.0 tensorflow==2.14.1 tensorflow-estimator==2.14.0 tensorflow-io-gcs-filesystem==0.35.0 termcolor==2.4.0 
terminado==0.17.1 thinc==8.2.2 threadpoolctl==2.2.0 tiktoken==0.5.2 tinycss2==1.2.1 tokenize-rt==4.2.1 tokenizers==0.19.1 tomli==2.0.1 torch==2.2.1+cu118 torchaudio==2.2.1+cu118 torchvision==0.17.1+cu118 tornado==6.1 tqdm==4.64.1 traitlets==5.7.1 transformers==4.40.1 triton==2.2.0 trl==0.8.6 typeguard==2.13.3 typer==0.9.0 typing-inspect==0.9.0 typing_extensions==4.11.0 tyro==0.8.3 ujson==5.4.0 unattended-upgrades==0.1 urllib3==1.26.14 virtualenv==20.16.7 visions==0.7.5 wadllib==1.3.6 wasabi==1.1.2 wcwidth==0.2.5 weasel==0.3.4 webencodings==0.5.1 websocket-client==0.58.0 Werkzeug==2.2.2 whatthepatch==1.0.2 widgetsnbextension==3.6.1 wordcloud==1.9.3 wrapt==1.14.1 xgboost==1.7.6 xxhash==3.4.1 yapf==0.33.0 yarl==1.9.4 ydata-profiling==4.2.0 zipp==3.11.0

I'm also using Databricks Runtime Version 14.3 LTS ML, which ships with CUDA 11.8.

This is the output of ds_report:

ds_report

```
[2024-04-29 17:15:09,427] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
```

DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.

```
JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
```

```
DeepSpeed general environment info:
torch install path ............... ['/databricks/python3/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/databricks/python3/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.2, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 560.90 GB
```

As you can see, most of the ops here have NOT been installed.

I've seen a lot of environment variables with which I'm supposedly meant to install DeepSpeed, like DS_BUILD_OPS=1, DS_BUILD_SPARSE_ATTN=0, DS_BUILD_AIO=1, etc., but I'm really not sure which ones to use.

I'm entirely new to DeepSpeed, so I'd really appreciate your help, thanks!
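For context, this is the kind of pre-compile install command I've seen suggested elsewhere (a sketch only, not something I've verified on Databricks; it assumes an Ubuntu image with apt, standard build tools, and a CUDA toolkit matching the torch wheel):

```shell
# Sketch only: pre-compile DeepSpeed ops at install time instead of
# relying on JIT compilation.
sudo apt-get install -y libaio-dev   # needed by the async_io (AIO) op

# DS_BUILD_OPS=1 tries to build every op; individual DS_BUILD_* flags
# then opt specific ops in or out. sparse_attn is disabled here because
# it does not support torch >= 2.0 (per the ds_report warnings above).
DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 DS_BUILD_AIO=1 \
  pip install deepspeed --no-cache-dir
```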

vikram71198 avatar Apr 29 '24 17:04 vikram71198

Hi @vikram71198 - it looks like DeepSpeed is installed; what you are seeing is that you have not pre-compiled any ops. That's fine: the ops can be JIT compiled at runtime, so you don't need to pre-compile. You can read more about that here and decide whether you need to. If you do, determine which ops you will need and pre-compile just those. Some ops have additional dependencies (async_io, the cutlass kernels, etc.), which is why you see some installs with those ops disabled via their environment variables.
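If it helps, the op table in the `ds_report` output can also be checked programmatically. Here is a rough, hypothetical helper (not part of DeepSpeed itself) that parses lines of that report into `(op_name, installed, compatible)` tuples:

```python
import re

# Matches ds_report op lines of the form:
# "fused_adam ............. [NO] ....... [OKAY]"
LINE = re.compile(r"^(\w+)\s*\.+\s*\[(\w+)\]\s*\.+\s*\[(\w+)\]$")

def parse_op_line(line: str):
    """Parse one ds_report op line into (name, installed, compatible).

    Returns None for lines that are not op rows (warnings, headers, etc.).
    """
    m = LINE.match(line.strip())
    if not m:
        return None
    name, installed, compatible = m.groups()
    return name, installed == "YES", compatible == "OKAY"

report = [
    "fused_adam ............. [NO] ....... [OKAY]",
    "sparse_attn ............ [NO] ....... [NO]",
]
for line in report:
    print(parse_op_line(line))
# ('fused_adam', False, True)
# ('sparse_attn', False, False)
```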

loadams avatar Apr 29 '24 17:04 loadams

Gotcha. I explicitly pip install torch==2.2.1+cu118 (torch==2.2.2+cu121 is the default torch, which I attempt to override), so another part of the ds_report output that I find confounding is this:

```
DeepSpeed general environment info:
torch install path ............... ['/databricks/python3/lib/python3.10/site-packages/torch']
torch version .................... 2.2.2+cu121
deepspeed install path ........... ['/databricks/python3/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.2, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.2, cuda 12.1
shared memory (/dev/shm) size .... 560.90 GB
```

Why do "torch version", "torch cuda version" & "deepspeed wheel compiled w." all indicate torch==2.2.2+cu121 and not 2.2.1+cu118?

The Databricks Cluster Runtime I'm currently using has CUDA == 11.8.

And yes, I run the torch installation before the DeepSpeed installation.
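For anyone comparing these strings: the `+cuXXX` suffix in a torch version is the wheel's CUDA build tag (a PEP 440 local version identifier). A small sketch (hypothetical helpers, stdlib only) showing how that tag maps to a toolkit version like 11.8:

```python
def cuda_tag(version: str):
    """Extract the CUDA build tag from a version like '2.2.2+cu121'.

    Returns None if the version has no '+cu...' local segment.
    """
    if "+" not in version:
        return None
    local = version.split("+", 1)[1]
    return local if local.startswith("cu") else None

def matches_toolkit(torch_version: str, toolkit: str) -> bool:
    """True if the wheel's CUDA tag matches a toolkit version like '11.8'."""
    tag = cuda_tag(torch_version)
    return tag == "cu" + toolkit.replace(".", "")

print(cuda_tag("2.2.2+cu121"))                  # cu121
print(matches_toolkit("2.2.1+cu118", "11.8"))   # True
print(matches_toolkit("2.2.2+cu121", "11.8"))   # False
```

So a report showing `2.2.2+cu121` really does mean the cu121 wheel is the one on the import path, regardless of what was installed later.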

vikram71198 avatar Apr 29 '24 17:04 vikram71198

Okay, I fixed this myself. Nvm.

vikram71198 avatar Apr 29 '24 18:04 vikram71198

So we can close this issue?

loadams avatar Apr 29 '24 19:04 loadams

Hi @vikram71198 - I assume we can close this issue. If not, please comment and we can re-open.

loadams avatar Apr 30 '24 21:04 loadams