Issues in building from source
Configuration:
- Operating System: Linux Ubuntu 16.04
- Python version: 3.5.2
- Tensorflow version: 1.12.0
- Cuda version: 9.0
- GPU: TITAN X (Pascal)
Command to Reproduce
make compile
Problem: The build command fails with the following errors:
ptxas /tmp/tmpxft_00007d26_00000000-5_blocksparse_hgemm_cn_op_gpu.compute_70.ptx, line 252; error : Illegal modifier '.m8n32k16' for instruction 'wmma.mma'
ptxas /tmp/tmpxft_00007d26_00000000-5_blocksparse_hgemm_cn_op_gpu.compute_70.ptx, line 713; error : Illegal modifier '.m8n32k16' for instruction 'wmma.mma'
ptxas /tmp/tmpxft_00007d26_00000000-5_blocksparse_hgemm_cn_op_gpu.compute_70.ptx, line 1186; error : Illegal modifier '.m8n32k16' for instruction 'wmma.mma'
ptxas /tmp/tmpxft_00007d26_00000000-5_blocksparse_hgemm_cn_op_gpu.compute_70.ptx, line 1661; error : Illegal modifier '.m8n32k16' for instruction 'wmma.mma'
ptxas fatal : Ptx assembly aborted due to errors
Makefile:106: recipe for target 'build/blocksparse_hgemm_cn_op_gpu.cu.o' failed
make: *** [build/blocksparse_hgemm_cn_op_gpu.cu.o] Error 255
pip install blocksparse fails too, resulting in the error reported in https://github.com/openai/blocksparse/issues/7
Hi Divyam,
First, let me ask for some details. Are you working in a virtualenv or a conda env? If so, activate that environment before compiling, then run a make clean followed by a make compile.
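For example (a minimal sketch; the ~/envs/blocksparse path is just a placeholder for your own environment):
source ~/envs/blocksparse/bin/activate   # or: conda activate <your-env>
make clean
make compile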
I can confirm that in many cases the pip install doesn't work. It seems to be highly dependent on your specific setup. So compiling yourself seems to be necessary in most cases.
Now for some general advice. The readme is currently outdated; there are some extra requirements you need to install which are not mentioned there (install commands are sketched right after the lists below).
Python requirements:
- networkx
System requirements:
- mpich: apt-get install mpich
- Nvidia cudnn
- Nvidia nccl
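As a rough sketch of the install commands on Ubuntu (cuDNN and NCCL come from NVIDIA's repositories or developer downloads, so the exact package names depend on your CUDA version):
pip install networkx
sudo apt-get install mpich
# cuDNN / NCCL: install the variants matching your CUDA toolkit, e.g.
# sudo apt-get install libcudnn7 libcudnn7-dev libnccl2 libnccl-dev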
Your issue seems to be related to the fact that you are using compute_70 with CUDA 9. For some reason this didn't work in my case either. If you comment out the following lines, your issue might be solved:
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_70,code=compute_70
Good luck
Hi @ThomasHagebols
Thank you for your valuable comment; I agree that the README is outdated. To answer your question, I am working in a virtualenv. I already had the system requirements installed, and commenting out those lines did let the build succeed.
However, if I run test/blocksparse_matmul_test.py after the build, I am back to https://github.com/openai/blocksparse/issues/7, which from the discussion there seems to have been fixed in the source, but it looks like it still exists?
I'll make a pull request for the Readme and update the requirements in the setup file.
I have the same issue with failing tests. Unfortunately I don't have the expertise to fix those issues.
The m8n32k16 error is just a matter of not having CUDA >= 9.2. All the breaking PTX changes NVIDIA has been making lately are kind of annoying; they could have easily been avoided with a small amount of foresight.
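If you want to check which toolkit your build is actually picking up, the nvcc used for the build should report 9.2 or later:
nvcc --version   # look for "release 9.2" (or newer) in the last line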
Anyway, we have a paper going out soon covering the blocksparse transformer ops. I plan to clean things up and fully document everything prior to that. I'll also have some new conv kernels. We're pushing hard now on learned sparsity in a variety of architectures, so this code is changing quickly internally. Though I should warn you that a lot of the new development is targeting tensorcore-capable hardware.
Hi @scott-gray
Thank you for your comment, looking forward to the changes!
Though the build succeeds with the changes in the Makefile, even then the import fails due to tensorflow.python.framework.errors_impl.NotFoundError: ...../blocksparse_ops.so: undefined symbol: _ZN3MPI8Datatype4FreeEv, as also raised in https://github.com/openai/blocksparse/issues/7. It seems you committed a fix for this before, but the problem still persists, which is why the pip install also fails to produce a working package.
No idea what that error could be. Something must be off with your build env. I put some comments at the bottom of the Makefile showing the env I use:
https://github.com/openai/blocksparse/blob/master/Makefile
You no longer need to patch tensorflow to support batched matmul in fp16, but you will still likely need to build from source to get CUDA >= 9.2 support.
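One sanity check worth doing (assuming the undefined-symbol errors come from compiling against a different tensorflow than the one loading the op at runtime): confirm that the tensorflow visible in your build shell is the same install you run the tests with, e.g.
python -c "import tensorflow as tf; print(tf.__version__, tf.sysconfig.get_lib())"
python -c "import tensorflow as tf; print(tf.sysconfig.get_compile_flags())"
If those don't match between the environment you compile in and the one you test in, the .so will reference symbols that don't exist at load time.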
Same configuration as above.
I have exactly the same problem.
When I comment out
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_70,code=compute_70
I can finish make compile, but when I try test/blocksparse_matmul_test.py it fails with:
(spinningup) ruiwang@ubuntu-ruiwang:~/blocksparse$ python test/blocksparse_matmul_test.py
Traceback (most recent call last):
File "test/blocksparse_matmul_test.py", line 12, in <module>
from blocksparse.matmul import BlocksparseMatMul, SparseProj, group_param_grads
File "/home/ruiwang/anaconda3/envs/spinningup/lib/python3.6/site-packages/blocksparse/matmul.py", line 13, in <module>
import blocksparse.ewops as ew
File "/home/ruiwang/anaconda3/envs/spinningup/lib/python3.6/site-packages/blocksparse/ewops.py", line 17, in <module>
_op_module = tf.load_op_library(os.path.join(data_files_path, 'blocksparse_ops.so'))
File "/home/ruiwang/anaconda3/envs/spinningup/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/ruiwang/anaconda3/envs/spinningup/lib/python3.6/site-packages/blocksparse/blocksparse_ops.so: undefined symbol: _ZN10tensorflow15OpKernelContext10input_listENS_11StringPieceEPNS_11OpInputListE
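Demangling the missing symbol (for example with binutils' c++filt) shows it is a core TensorFlow symbol, which I take to mean the installed blocksparse_ops.so was built against a different tensorflow version than the one in this conda env:
echo _ZN10tensorflow15OpKernelContext10input_listENS_11StringPieceEPNS_11OpInputListE | c++filt
# tensorflow::OpKernelContext::input_list(tensorflow::StringPiece, tensorflow::OpInputList*)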
@scott-gray @divyam3897 have you ever got any chance to resolve this? Thanks!
Same issue as @ruiwang2uber @divyam3897, any updates? Would appreciate it. Thx!
In case this helps anyone, I created the following Dockerfile and instructions that worked for me:
Dockerfile (place this in the root of the blocksparse repo):
FROM tensorflow/tensorflow:1.15.2-gpu-py3
RUN pip install --upgrade pip
RUN pip3 install tensorflow-gpu==1.13.1
# Need this to run the tests
RUN pip3 install networkx==2.5
ENV NCCL_VERSION=2.4.8-1+cuda10.0
RUN apt-get update && apt-get install -y --no-install-recommends \
mpich \
libmpich-dev \
libnccl2=${NCCL_VERSION} \
libnccl-dev=${NCCL_VERSION} \
curl
# Make sure the linker knows where to look for things
ENV LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
Instructions (you might need to run these commands with sudo):
NOTE:
- commands prefixed by $ should be run in a shell on the host machine
- commands prefixed by # should be run in an interactive shell in the docker container
- Build image
$ docker image build -f Dockerfile --rm -t blocksparse:local .
- Start a docker container with an interactive terminal; choose the relevant CPU/GPU option below
CPU
- the tests below will fail if you try to run them without GPU support
- the ln command should be run inside the docker container
$ docker run -it --privileged -w /working_dir -v ${PWD}:/working_dir --rm blocksparse:local
# ln -s /usr/local/cuda/compat/libcuda.so /usr/lib/libcuda.so
GPU
$ docker run -it --gpus all --privileged -w /working_dir -v ${PWD}:/working_dir --rm blocksparse:local
- Compile (inside the docker container)
# make compile
- Install compiled version (inside the docker container)
# pip3 install dist/*.whl
- Test compiled version (inside the docker container)
# test/blocksparse_matmul_test.py
# test/blocksparse_conv_test.py
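If the test scripts are not marked executable in your checkout, running them through the interpreter works the same way:
# python3 test/blocksparse_matmul_test.py
# python3 test/blocksparse_conv_test.py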