SIGILL error when trying to use packmol in Docker container
I have a Docker container which contains Packmol and when I attempt to use it, Packmol boots me out with a SIGILL error.
To reproduce: I have uploaded the image to Docker Hub (https://hub.docker.com/r/alexhroom/packmol-bug). When I run an example script in a shell session (e.g. the mixture of water and urea example), the following occurs:
################################################################################
PACKMOL - Packing optimization for the automated generation of
starting configurations for molecular dynamics simulations.
Version 20.14.2
################################################################################
Packmol must be run with: packmol < inputfile.inp
Userguide at: http://m3g.iqm.unicamp.br/packmol
Reading input file... (Control-C aborts)
Program received signal SIGILL: Illegal instruction.
Backtrace for this error:
#0 0x7f2b1eea58b0 in ???
#1 0x7f2b1eea4ae3 in ???
#2 0x7f2b1eb2096f in ???
#3 0x556644cfd278 in ???
#4 0x556644d1d3be in ???
#5 0x556644cd52fe in ???
#6 0x7f2b1eb0d09a in ???
#7 0x556644cd5349 in ???
#8 0xffffffffffffffff in ???
Illegal instruction (core dumped)
It produces a core dump, which I can share if required (although I don't know where to put it).
Can you provide some instruction on how to run that, to try to reproduce the issue?
(the docker link starts asking for a username and password, is that required?)
As a long shot, try removing --fast-math from the compiler options in the Makefile, to see if that solves the issue.
@lmiq link fixed sorry. if you have docker installed, run the following:
docker pull alexhroom/packmol-bug
docker run -it alexhroom/packmol-bug /bin/sh
wget https://m3g.github.io/packmol/examples/mixture.inp
wget https://m3g.github.io/packmol/examples/water.pdb
wget https://m3g.github.io/packmol/examples/urea.pdb
packmol < mixture.inp
Recompiling without --fast-math fixes it, but weirdly so does recompiling with --fast-math. This matches what we have seen in the software I've been working on, where the bug appears intermittently between container rebuilds. It seems that sometimes the Packmol build just doesn't come out right?
I have seen issues associated with the use of --fast-math (and those are supposedly solved). Otherwise I don't know of any other compilation issue, and we have been cross-compiling for all platforms using the Julia BinaryBuilder, so that's strange.
I'll try to run the example and see if I can find any clue about what's going on.
You can also try compiling with cmake:
cd packmol
cmake .
make
instead of using the provided Makefile; that may be somewhat more robust.
I'm getting this:
% docker pull alexhroom/packmol-bug
Using default tag: latest
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/images/create?fromImage=alexhroom%2Fpackmol-bug&tag=latest": dial unix /var/run/docker.sock: connect: permission denied
you should either be root or add your user to the docker group :)
Ok, I can reproduce the bug here, but I have no idea what to do with it. If compiling with cmake does not help, maybe you could create a Docker image with Packmol compiled with:
make devel
which will turn on all possible warning and error flags. Then, if the error appears in the docker, we may be able to identify which instruction is causing the failure.
thanks for the help, will give that a go. i'm not really sure how to go about fixing it either, but i also don't know how to debug via core dumps - in any case it's good that it's not (necessarily) hardware-dependent and can be consistently reproduced there
Alternatively, or if possible, give the instructions on how that gets compiled there. I never used a docker before, so I'm lost here.
sure: essentially, Docker starts with a base image (usually just an OS installation) and then runs shell scripts to install and configure packages. for packmol, our image uses
RUN mkdir /opt/other
RUN mkdir /opt/other/gfortran
WORKDIR /opt/other/gfortran
RUN wget https://gfortran.meteodat.ch/download/x86_64/releases/gcc-12.2.0.tar.xz
RUN tar -xJf gcc-12.2.0.tar.xz
RUN LD_LIBRARY_PATH="/opt/other/gfortran/gcc-12.2.0/lib64:$LD_LIBRARY_PATH"
RUN export LD_LIBRARY_PATH
# Get packmol
RUN mkdir /opt/other/packmol
WORKDIR /opt/other/packmol
RUN wget https://github.com/m3g/packmol/archive/refs/tags/v20.14.2.tar.gz
RUN tar -xzvf v20.14.2.tar.gz
RUN rm v20.14.2.tar.gz
# Build Packmol
WORKDIR /opt/other/packmol/packmol-20.14.2
RUN ./configure /opt/other/gfortran/gcc-12.2.0/
RUN make
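A side note on the snippet above, as a sketch rather than a diagnosis: Docker executes each RUN instruction in its own shell, so a variable set or exported by one RUN line is not visible to later RUN lines. The two LD_LIBRARY_PATH lines therefore probably have no effect on the later build step; the Dockerfile ENV instruction is the usual way to make such a setting persist. Whether this plays any part in the SIGILL is not established in this thread. The demo below imitates two RUN lines with two separate sh processes; DEMO_LIB_PATH is a made-up variable name used only for illustration.

```shell
#!/bin/sh
# Each `sh -c` below stands in for one Dockerfile RUN line: a separate shell process.
# DEMO_LIB_PATH is a hypothetical name, chosen so the demo touches nothing real.
unset DEMO_LIB_PATH

# "RUN 1": set and export the variable inside the first shell.
sh -c 'DEMO_LIB_PATH="/opt/other/gfortran/gcc-12.2.0/lib64"; export DEMO_LIB_PATH'

# "RUN 2": a fresh shell; the variable from the first shell no longer exists.
second_run_sees=$(sh -c 'echo "${DEMO_LIB_PATH:-unset}"')
echo "second shell sees: $second_run_sees"
```

The second shell reports the variable as unset, which mirrors what a later RUN line would see in the image build.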
I'm sorry, but I need some more step-by-step instructions. What do I do with those instructions? Are they input files for some docker command? (I really can't go through the docker manual now to understand what I need here.)
@lmiq Apologies for not explaining in enough detail. Essentially, Docker containers are a type of virtual machine with an 'image' as their base machine. This image is created by a Dockerfile, which contains instructions for what shell commands to run to add software to the image and set it up for use. This is a snippet of the Dockerfile which contains our setup and compilation for Packmol. The equivalent shell script would be:
mkdir /opt/other
mkdir /opt/other/gfortran
cd /opt/other/gfortran
wget https://gfortran.meteodat.ch/download/x86_64/releases/gcc-12.2.0.tar.xz
tar -xJf gcc-12.2.0.tar.xz
LD_LIBRARY_PATH="/opt/other/gfortran/gcc-12.2.0/lib64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH
# Get packmol
mkdir /opt/other/packmol
cd /opt/other/packmol
wget https://github.com/m3g/packmol/archive/refs/tags/v20.14.2.tar.gz
tar -xzvf v20.14.2.tar.gz
rm v20.14.2.tar.gz
# Build Packmol
cd /opt/other/packmol/packmol-20.14.2
./configure /opt/other/gfortran/gcc-12.2.0/
make
Sorry, my fault: I didn't express myself precisely.
I know those are commands I could use in a bash shell, in my distribution. My question is about how to build the docker image you built, and in which you see the error.
So I take that script with the RUN directives, and do what with it?
aha, understood. here's the snippet turned into a full Dockerfile which should create a debian-buster Docker image with packmol installed the way we install it. to use it, download the Dockerfile and run docker build . in the directory containing the file. if you want to make any changes, just edit the Dockerfile and run docker build . again.
Hi @alexhroom. You ever resolve this? I'm getting the same issue when running on slurm.
hi @mshuaibii afraid not, but glad to hear it's not just me!
On my side, I was never able to reproduce the issue. If you need help, I'll really need a step-by-step guide on how to run that. I am not a user of docker or slurm, so I hit a wall very early in whatever I try. For instance, there are two packages in Ubuntu that provide a docker command:
sudo apt install podman-docker # version 4.9.3+ds1-1ubuntu0.1, or
sudo apt install docker.io # version 24.0.7-0ubuntu4.1
I cannot try to reproduce the bug by trying these alternatives at random.
Anyway, these are the compilation options in the Makefile:
FLAGS= -O3 --fast-math -march=native -funroll-loops
In any case, I suggest removing all of them, leaving only -O, and trying again. That might solve the issue. Packmol will be slightly slower, but that is probably irrelevant in most cases.
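A minimal sketch of that change, assuming the Makefile defines FLAGS on a single line that starts at the beginning of the line, as quoted above. It works on a temporary copy so nothing real is modified; the file and variable names are made up for the demo.

```shell
#!/bin/sh
# Work on a throwaway file so no real Makefile is touched.
demo_makefile=$(mktemp)
# The FLAGS line as quoted in this thread.
printf 'FLAGS= -O3 --fast-math -march=native -funroll-loops\n' > "$demo_makefile"

# Replace the whole FLAGS line with a plain -O build. Writing to a new file
# instead of editing in place keeps the command portable across sed variants.
sed 's/^FLAGS *=.*/FLAGS= -O/' "$demo_makefile" > "$demo_makefile.new"

new_flags=$(grep '^FLAGS' "$demo_makefile.new")
echo "$new_flags"

rm -f "$demo_makefile" "$demo_makefile.new"
```

To apply it for real, the same sed expression could be run against the Makefile in the Packmol source directory before calling make. If the real file defines FLAGS differently (for example, indented inside a target), the pattern would need adjusting.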
hi @lmiq, I no longer work on the project that was having this issue so I can no longer test it! Here are details if you still want to try and reproduce it (i'm assuming you've already done all the things described above of adding yourself to the docker group etc.):
- docker.io is the correct package!
- Download the Dockerfile from this gist: https://gist.github.com/alexhroom/2a63ef74979ff019fa8d807a3288aa2c and save it as a file named Dockerfile in a new empty directory.
- Move to that directory and run the command:
docker build -t packmol-bug .
- Wait for the container to build.
- Now run:
docker run -it packmol-bug /bin/sh
- This will run the container and you will be in a shell environment inside the container.
- Now run the packmol example to try and reproduce the bug:
wget https://m3g.github.io/packmol/examples/mixture.inp
wget https://m3g.github.io/packmol/examples/water.pdb
wget https://m3g.github.io/packmol/examples/urea.pdb
packmol < mixture.inp
If you want to try packmol built differently, you can tweak the packmol install commands in the Dockerfile file. As I said, it seems to happen intermittently - Docker containers don't have to be updated often so we repeatedly rebuild until we get a working build then try to run with that as long as possible. I do apologise for how obscure a bug it is!
And these are the flags used by the devel build (make devel):
FLAGS = -Wall -fcheck=bounds -g -fbacktrace -ffpe-trap=zero,overflow,underflow
By using them one normally obtains clearer information about these strange errors. Or the error might not manifest at all, as the illegal instruction sounds like something related to the use of --fast-math.
For what it's worth, I have just executed all these commands and the bug is not reproducible anymore (I tried both with packmol 20.14.2 and the latest 20.15.1). So I guess it was a Docker bug?
I'm closing this now, and please report a separate issue (with details), @mshuaibii, if you are experiencing something similar in another context.
> Anyway, these are the compilation options in the Makefile:
> FLAGS= -O3 --fast-math -march=native -funroll-loops
> In any case, I suggest removing all of them, leaving only -O and try. That might solve that issue. Packmol will be slightly slower, but that is probably irrelevant in most cases.
This actually worked for me, only leaving -O. Thank you @lmiq!
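One possible reading of why dropping the flags helped, not confirmed anywhere in this thread: GCC's -march=native generates code for the CPU of the machine doing the compile. A binary built on one host can then contain instructions that another host (a different Docker host, or a Slurm compute node) does not support, and running it there would end in SIGILL. That would also fit the bug coming and going between rebuilds. A rough sketch for comparing hosts, assuming Linux with a readable /proc/cpuinfo; the helper name cpu_flags_fingerprint is made up for illustration.

```shell
#!/bin/sh
# Print a short fingerprint of this machine's CPU feature flags.
# Run it on the build host and on the run host: if the values differ,
# a -march=native binary built on one may not be safe to run on the other.
cpu_flags_fingerprint() {
    if [ -r /proc/cpuinfo ]; then
        # x86 lists CPU features under "flags", ARM under "Features".
        grep -m1 -i -E '^(flags|features)' /proc/cpuinfo | cksum | cut -d' ' -f1
    else
        echo "unknown"
    fi
}

fingerprint=$(cpu_flags_fingerprint)
echo "CPU flags fingerprint: $fingerprint"
```

A matching fingerprint does not prove the hosts are interchangeable, and a differing one only hints at a mismatch; comparing the full flag lists shows which extensions actually differ.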