Issues in building from source
Configuration:
- Operating System: Linux Ubuntu 16.04
- Python version: 3.5.2
- Tensorflow version: 1.12.0
- Cuda version: 9.0
- GPU: TITAN X (Pascal)
Command to Reproduce
make compile
Problem: The build command fails with the following errors:
ptxas /tmp/tmpxft_00007d26_00000000-5_blocksparse_hgemm_cn_op_gpu.compute_70.ptx, line 252; error : Illegal modifier '.m8n32k16' for instruction 'wmma.mma'
ptxas /tmp/tmpxft_00007d26_00000000-5_blocksparse_hgemm_cn_op_gpu.compute_70.ptx, line 713; error : Illegal modifier '.m8n32k16' for instruction 'wmma.mma'
ptxas /tmp/tmpxft_00007d26_00000000-5_blocksparse_hgemm_cn_op_gpu.compute_70.ptx, line 1186; error : Illegal modifier '.m8n32k16' for instruction 'wmma.mma'
ptxas /tmp/tmpxft_00007d26_00000000-5_blocksparse_hgemm_cn_op_gpu.compute_70.ptx, line 1661; error : Illegal modifier '.m8n32k16' for instruction 'wmma.mma'
ptxas fatal : Ptx assembly aborted due to errors
Makefile:106: recipe for target 'build/blocksparse_hgemm_cn_op_gpu.cu.o' failed
make: *** [build/blocksparse_hgemm_cn_op_gpu.cu.o] Error 255
pip install blocksparse fails too, resulting in the error reported in https://github.com/openai/blocksparse/issues/7
Hi Divyam,
First, let me ask for some details. Are you working in a virtualenv or a conda env? If so, activate that environment before compiling, then run a make clean followed by a make compile.
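For example (a minimal sketch; the ~/envs/blocksparse path is just a placeholder for your own environment):
source ~/envs/blocksparse/bin/activate   # or: conda activate <your-env>
make clean
make compile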
I can confirm that in many cases the pip install doesn't work. It seems to be highly dependent on your specific setup. So compiling yourself seems to be necessary in most cases.
Now for some general advice. The readme is currently outdated; there are some extra requirements you need to install which are not mentioned there (install commands are sketched right after the lists below).
Python requirements:
- networkx
System requirements:
- mpich: apt-get install mpich
- Nvidia cudnn
- Nvidia nccl
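As a rough sketch of the install commands on Ubuntu (cuDNN and NCCL come from NVIDIA's repositories or developer downloads, so the exact package names depend on your CUDA version):
pip install networkx
sudo apt-get install mpich
# cuDNN / NCCL: install the variants matching your CUDA toolkit, e.g.
# sudo apt-get install libcudnn7 libcudnn7-dev libnccl2 libnccl-dev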
Your issue seems to be related to the fact that you are using compute_70 with CUDA 9. For some reason this didn't work in my case either. If you comment out the following lines, your issue might be solved:
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_70,code=compute_70
Good luck
Hi @ThomasHagebols
Thank you for your valuable comment; I agree that the README is outdated. To answer your question, I am working in a virtualenv. I already had the system requirements installed, and commenting out those lines did let the build succeed.
However, if I run test/blocksparse_matmul_test.py after the build, I am back to https://github.com/openai/blocksparse/issues/7, which from the discussion there seems to have been fixed in the source, but it looks like it still exists?
I'll make a pull request for the Readme and update the requirements in the setup file.
I have the same issue with failing tests. Unfortunately I don't have the expertise to fix those issues.
The m8n32k16 error is just a matter of not having CUDA >= 9.2. All the breaking PTX changes NVIDIA has been making lately are kind of annoying; they could have easily been avoided with a small amount of foresight.
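If you want to check which toolkit your build is actually picking up, the nvcc used for the build should report 9.2 or later:
nvcc --version   # look for "release 9.2" (or newer) in the last line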
Anyway, we have a paper going out soon covering the blocksparse transformer ops. I plan to clean things up and fully document everything prior to that. I'll also have some new conv kernels. We're pushing hard now on learned sparsity in a variety of architectures, so this code is changing quickly internally. Though I should warn you that a lot of the new development is targeting tensorcore-capable hardware.
Hi @scott-gray
Thank you for your comment, looking forward to the changes!
Though the build succeeds with the changes in the Makefile, even then the import fails due to tensorflow.python.framework.errors_impl.NotFoundError: ...../blocksparse_ops.so: undefined symbol: _ZN3MPI8Datatype4FreeEv, as also raised in https://github.com/openai/blocksparse/issues/7. It seems you committed a fix for this before, but the problem still persists, which is why the pip install also fails to produce a working package.
No idea what that error could be. Something must be off with your build env. I put some comments at the bottom of the Makefile showing the env I use:
https://github.com/openai/blocksparse/blob/master/Makefile
You no longer need to patch tensorflow to support batched matmul in fp16, but you will still likely need to build from source to get CUDA >= 9.2 support.
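One sanity check worth doing (assuming the undefined-symbol errors come from compiling against a different tensorflow than the one loading the op at runtime): confirm that the tensorflow visible in your build shell is the same install you run the tests with, e.g.
python -c "import tensorflow as tf; print(tf.__version__, tf.sysconfig.get_lib())"
python -c "import tensorflow as tf; print(tf.sysconfig.get_compile_flags())"
If those don't match between the environment you compile in and the one you test in, the .so will reference symbols that don't exist at load time.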
Same configuration as above.
I have exactly the same problem.
When I comment out
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_70,code=compute_70
I can finish make compile, but when I try test/blocksparse_matmul_test.py it fails with:
(spinningup) ruiwang@ubuntu-ruiwang:~/blocksparse$ python test/blocksparse_matmul_test.py
Traceback (most recent call last):
File "test/blocksparse_matmul_test.py", line 12, in <module>
from blocksparse.matmul import BlocksparseMatMul, SparseProj, group_param_grads
File "/home/ruiwang/anaconda3/envs/spinningup/lib/python3.6/site-packages/blocksparse/matmul.py", line 13, in <module>
import blocksparse.ewops as ew
File "/home/ruiwang/anaconda3/envs/spinningup/lib/python3.6/site-packages/blocksparse/ewops.py", line 17, in <module>
_op_module = tf.load_op_library(os.path.join(data_files_path, 'blocksparse_ops.so'))
File "/home/ruiwang/anaconda3/envs/spinningup/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/ruiwang/anaconda3/envs/spinningup/lib/python3.6/site-packages/blocksparse/blocksparse_ops.so: undefined symbol: _ZN10tensorflow15OpKernelContext10input_listENS_11StringPieceEPNS_11OpInputListE
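Demangling the missing symbol (for example with binutils' c++filt) shows it is a core TensorFlow symbol, which I take to mean the installed blocksparse_ops.so was built against a different tensorflow version than the one in this conda env:
echo _ZN10tensorflow15OpKernelContext10input_listENS_11StringPieceEPNS_11OpInputListE | c++filt
# tensorflow::OpKernelContext::input_list(tensorflow::StringPiece, tensorflow::OpInputList*)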
@scott-gray @divyam3897 have you ever got any chance to resolve this? Thanks!
Same issue as @ruiwang2uber @divyam3897, any updates? Would appreciate it. Thx!
In case this helps anyone, I created the following Dockerfile and instructions that worked for me:
Dockerfile (place this in the root of the blocksparse repo):
FROM tensorflow/tensorflow:1.15.2-gpu-py3
RUN pip install --upgrade pip
RUN pip3 install tensorflow-gpu==1.13.1
# Need this to run the tests
RUN pip3 install networkx==2.5
ENV NCCL_VERSION=2.4.8-1+cuda10.0
RUN apt-get update && apt-get install -y --no-install-recommends \
mpich \
libmpich-dev \
libnccl2=${NCCL_VERSION} \
libnccl-dev=${NCCL_VERSION} \
curl
# Make sure the linker knows where to look for things
ENV LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
Instructions (you might need to run these commands with sudo):
NOTE:
- commands prefixed by $ should be run in a shell on the host machine
- commands prefixed by # should be run in an interactive shell in the docker container
- Build image
$ docker image build -f Dockerfile --rm -t blocksparse:local .
- Start a docker container with an interactive terminal; choose the relevant CPU/GPU option below
CPU
- the tests below will fail if you try to run them without GPU support
- the ln command should be run inside the docker container
$ docker run -it --privileged -w /working_dir -v ${PWD}:/working_dir --rm blocksparse:local
# ln -s /usr/local/cuda/compat/libcuda.so /usr/lib/libcuda.so
GPU
$ docker run -it --gpus all --privileged -w /working_dir -v ${PWD}:/working_dir --rm blocksparse:local
- Compile (inside the docker container)
# make compile
- Install compiled version (inside the docker container)
# pip3 install dist/*.whl
- Test compiled version (inside the docker container)
# test/blocksparse_matmul_test.py
# test/blocksparse_conv_test.py
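If the test scripts are not marked executable in your checkout, running them through the interpreter works the same way:
# python3 test/blocksparse_matmul_test.py
# python3 test/blocksparse_conv_test.py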