Has anyone successfully built blocksparse from source and run `test/blocksparse_matmul_test.py`?
Could you please share your exact setup and the steps you followed?
I have been struggling to resolve the error below since compiling blocksparse from source.
If you managed to get `pip install` working, please share that as well.
tensorflow.python.framework.errors_impl.NotFoundError: /home/ruiwang/anaconda3/envs/spinningup/lib/python3.6/site-packages/blocksparse/blocksparse_ops.so: undefined symbol: _ZNK10tensorflow8OpKernel4nameB5cxx11Ev
Could the authors consider releasing a Dockerfile?
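Here is a hedged way to check whether this is a C++ ABI mismatch. It is a diagnostic, not something confirmed in this thread. The `B5cxx11` fragment in the missing symbol is how the `[abi:cxx11]` tag is mangled, and libstdc++ attaches that tag to functions returning `std::string` under its newer C++11 ABI. If the pip-installed TensorFlow was built with `-D_GLIBCXX_USE_CXX11_ABI=0`, an extension compiled with the newer ABI looks for a symbol that TensorFlow does not export. The sketch below compares the two. The helper names are hypothetical, and it assumes the installed TensorFlow provides `tf.sysconfig.get_compile_flags()`.

```python
import sys

# Missing symbol copied from the error message above.
MISSING = "_ZNK10tensorflow8OpKernel4nameB5cxx11Ev"


def symbol_uses_cxx11_abi(mangled: str) -> bool:
    """True if a mangled C++ symbol carries the [abi:cxx11] tag ("B5cxx11")."""
    return "B5cxx11" in mangled


def tf_uses_cxx11_abi(compile_flags) -> bool:
    """Infer TensorFlow's libstdc++ ABI from its compile flags (hypothetical helper).

    Looks for -D_GLIBCXX_USE_CXX11_ABI=<0|1>. If the flag is absent, this
    assumes the compiler default, which is the new ABI for g++ >= 5.
    """
    for flag in compile_flags:
        if flag.startswith("-D_GLIBCXX_USE_CXX11_ABI="):
            return flag.split("=", 1)[1] == "1"
    return True


def main() -> None:
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not importable in this environment", file=sys.stderr)
        return
    flags = tf.sysconfig.get_compile_flags()
    tf_new = tf_uses_cxx11_abi(flags)
    ext_new = symbol_uses_cxx11_abi(MISSING)
    print("TensorFlow compile flags:", flags)
    print("TensorFlow built with C++11 ABI:", tf_new)
    print("blocksparse_ops.so expects C++11 ABI:", ext_new)
    if tf_new != ext_new:
        print("Likely ABI mismatch: blocksparse and TensorFlow disagree on "
              "_GLIBCXX_USE_CXX11_ABI")


if __name__ == "__main__":
    main()
```

If the script reports a mismatch, the fix is to rebuild the extension with the same `_GLIBCXX_USE_CXX11_ABI` value that TensorFlow reports.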
OK, after many trials, this is what worked for me on Ubuntu 18.04 with CUDA 10 and Anaconda.
First, install g++ 5, because the tensorflow-gpu package installed via pip is compiled with g++ 5.4. On Ubuntu 18.04, you can install g++ 5 as follows:
$ sudo apt-get install g++-5
Clone the blocksparse repo and, in the Makefile, change "g++" to "g++-5" (without quotes). Then create a new conda virtual environment with Python 3.6 (use 3.6, not 3.7). Activate this environment and install tensorflow-gpu using pip. You can now compile blocksparse in this environment.
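The Makefile edit and the Python 3.6 requirement can be scripted. This is a minimal sketch. It assumes the compiler appears as a bare `g++` token in the blocksparse `Makefile`, which is not verified here. `patch_makefile_text` and `check_python` are hypothetical helpers. The script only writes to the file whose path you pass on the command line.

```python
import re
import shutil
import sys
from pathlib import Path

# A bare "g++" token: not preceded by a word char or "-" (so "x86_64-linux-gnu-g++"
# is left alone) and not followed by one (so an existing "g++-5" is not re-patched).
GXX = re.compile(r"(?<![\w-])g\+\+(?![\w-])")


def patch_makefile_text(text: str, compiler: str = "g++-5") -> str:
    """Replace every bare g++ token with the given compiler name."""
    return GXX.sub(lambda _m: compiler, text)


def check_python() -> bool:
    """The thread reports success with Python 3.6 (not 3.7)."""
    return sys.version_info[:2] == (3, 6)


def main(argv) -> None:
    if not check_python():
        print("Warning: Python %d.%d detected; the thread recommends 3.6"
              % sys.version_info[:2])
    if shutil.which("g++-5") is None:
        print("g++-5 not found on PATH; install it with apt-get first")
    if len(argv) < 2:
        print("usage: patch_makefile.py path/to/Makefile")
        return
    makefile = Path(argv[1])
    makefile.write_text(patch_makefile_text(makefile.read_text()))
    print("Patched", makefile)


if __name__ == "__main__":
    main(sys.argv)
```

Save it under a hypothetical name such as `patch_makefile.py` and run `python patch_makefile.py Makefile` from the repo root. The lookarounds in the pattern make the edit idempotent, so running the script twice leaves `g++-5` unchanged.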
In case this helps anyone, I created the following Dockerfile and instructions that worked for me:
Dockerfile (place this in the root of the blocksparse repo):
```dockerfile
FROM tensorflow/tensorflow:1.15.2-gpu-py3
RUN pip install --upgrade pip
RUN pip3 install tensorflow-gpu==1.13.1
# Need this to run the tests
RUN pip3 install networkx==2.5
ENV NCCL_VERSION=2.4.8-1+cuda10.0
RUN apt-get update && apt-get install -y --no-install-recommends \
    mpich \
    libmpich-dev \
    libnccl2=${NCCL_VERSION} \
    libnccl-dev=${NCCL_VERSION} \
    curl
# Make sure the linker knows where to look for things
ENV LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
```
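To confirm inside the container that the dynamic loader can find the libraries this Dockerfile installs, a small Python check can try to `dlopen()` each one through `ctypes`. This is a sketch under assumptions. The library file names in `LIBS` are my guesses for the NCCL, MPICH and CUDA driver packages and are not taken from the thread.

```python
import ctypes
import os

# Hypothetical list: sonames expected from libnccl2 and libmpich-dev, plus
# libcuda.so.1, which normally comes from the host driver via --gpus all.
LIBS = ["libnccl.so.2", "libmpich.so", "libcuda.so.1"]


def ld_path_entries(env=None):
    """Split LD_LIBRARY_PATH into its non-empty directory entries."""
    env = os.environ if env is None else env
    return [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]


def try_load(name: str) -> bool:
    """Return True if the dynamic loader can dlopen() the named library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


def main() -> None:
    entries = ld_path_entries()
    print("LD_LIBRARY_PATH:", entries)
    if "/usr/local/lib" not in entries:
        print("Note: /usr/local/lib is not on LD_LIBRARY_PATH")
    for lib in LIBS:
        print("%-16s %s" % (lib, "ok" if try_load(lib) else "NOT FOUND"))


if __name__ == "__main__":
    main()
```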
Instructions (you might need to run these commands with sudo):
NOTE:
- commands prefixed by `$` should be run in a shell on the host machine
- commands prefixed by `#` should be run in an interactive shell inside the docker container
- Build the image
```
$ docker image build -f Dockerfile --rm -t blocksparse:local .
```
- Start a docker container with an interactive terminal. Choose the relevant CPU/GPU option below.

CPU
- the tests below will fail if you try to run them without GPU support
- the `ln` command should be run inside the docker container
```
$ docker run -it --privileged -w /working_dir -v ${PWD}:/working_dir --rm blocksparse:local
# ln -s /usr/local/cuda/compat/libcuda.so /usr/lib/libcuda.so
```
GPU
```
$ docker run -it --gpus all --privileged -w /working_dir -v ${PWD}:/working_dir --rm blocksparse:local
```
- Compile (inside the docker container)
```
# make compile
```
- Install the compiled version (inside the docker container)
```
# pip3 install dist/*.whl
```
- Test the compiled version (inside the docker container)
```
# test/blocksparse_matmul_test.py
# test/blocksparse_conv_test.py
```
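The compile, install and test steps can also be chained so the run stops at the first failing step. This is a minimal sketch, and it assumes you run it inside the container from `/working_dir`. Unlike the steps above, it invokes the test files through the current Python interpreter instead of executing them directly. It only runs the commands when passed `--run`; otherwise it just prints them.

```python
import subprocess
import sys

# The compile/install/test steps above, in order. Assumption: the test files are
# run with the current interpreter instead of being executed directly.
STEPS = [
    ["make", "compile"],
    ["sh", "-c", "pip3 install dist/*.whl"],  # a shell expands the wheel glob
    [sys.executable, "test/blocksparse_matmul_test.py"],
    [sys.executable, "test/blocksparse_conv_test.py"],
]


def run_steps(steps, runner=subprocess.call) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in steps:
        print("+", " ".join(cmd))
        code = runner(cmd)
        if code != 0:
            print("step failed with exit code", code)
            return code
    return 0


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        sys.exit(run_steps(STEPS))
    # Dry run by default: only show what would be executed.
    for cmd in STEPS:
        print("would run:", " ".join(cmd))
```

The `runner` parameter defaults to `subprocess.call`, but you can swap in a stub, which makes the stop-on-first-failure logic easy to check without a GPU.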
At the "Choose the relevant CPU/GPU option" step, the GPU option gave me the error `Error response from daemon: could not select device driver "" with capabilities: [[gpu]].` Any thoughts on this?
If I choose the CPU version instead, I get `InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'FloatCast' used by node FloatCast (defined at <string>:5598) with these attrs: [TX=DT_FLOAT, dx_dtype=DT_FLOAT, TY=DT_HALF]`. How can I fix this?