DLKcat on Mac with Apple Silicon (arm64 architecture)
Hi, I’m using a Mac to run GECKO and encountered the following error when executing runDLKcat():
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
To address this, I modified the Docker command in runDLKcat.m to:
status = system(['docker run --platform linux/amd64 --rm -v "' fullfile(params.path,'/data') '":/data ghcr.io/sysbiochalmers/dlkcat-gecko:0.1 /bin/bash -c "python DLKcat.py /data/DLKcat.tsv /data/DLKcatOutput.tsv"']);
However, this results in an extremely long runtime—more than 20 hours for processing a single sample. Are there any recommendations or solutions to improve the performance? Thanks!
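The warning says the image was built for linux/amd64 while the host reports linux/arm64/v8. Forcing `--platform linux/amd64` makes Docker run the x86_64 image under emulation on an arm64 host, which is a plausible (not confirmed here) reason for the very long runtime. As a minimal Python sketch, the host architecture can be mapped to the Docker platform string that would run natively; the helper `docker_platform_for` and its mapping table are illustrative assumptions, not part of GECKO.

```python
import platform

# Illustrative mapping from platform.machine() values to Docker platform strings.
# Assumption: only the two architectures discussed in this thread are covered.
_MACHINE_TO_DOCKER = {
    "x86_64": "linux/amd64",       # Intel Macs and most Linux PCs
    "amd64": "linux/amd64",        # Windows reports "AMD64"
    "arm64": "linux/arm64/v8",     # Apple Silicon under macOS
    "aarch64": "linux/arm64/v8",   # arm64 Linux
}


def docker_platform_for(machine: str) -> str:
    """Return the Docker platform string that runs natively on `machine`."""
    try:
        return _MACHINE_TO_DOCKER[machine.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine!r}") from None


if __name__ == "__main__":
    machine = platform.machine()
    try:
        native = docker_platform_for(machine)
        print(f"{machine}: native Docker platform is {native}")
        if native != "linux/amd64":
            # An amd64-only image would be emulated on this host.
            print("an amd64-only image would run under emulation here")
    except ValueError as err:
        print(err)
```

On an Apple Silicon Mac this should report `linux/arm64/v8`, meaning an image published only for linux/amd64 cannot run natively there.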
I don't have access to a Mac with Apple Silicon CPU, but I imagine that this might be where the problem lies. @simas232 have you run DLKcat on Apple Silicon CPU?
You can use ServBay instead of Docker, which works better than Docker on Mac.
I had a quick glance at that; it seems ServBay is meant for a different purpose.
That being said, the root cause has been correctly identified: the Docker image would need to be updated for wider support.
I gave this a go by running docker buildx build --platform linux/amd64,linux/arm64/v8 . and encountered:
=> ERROR [linux/arm64 3/3] RUN pip install --no-cache-dir -r requirements.txt torch@https://download.pytorch.org/whl/cpu/torch-1.9.1%2Bcpu-cp39-cp39-linux_x86_64.whl
There is an error while installing the dependencies from requirements.txt for the arm64 platform, so this needs more investigation and won't be a 5-minute job. Anyone want to give this a go?
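The failing step pins torch to a wheel whose filename ends in `linux_x86_64.whl`. Wheel filenames encode their target platform as the last dash-separated tag, so this particular wheel can only be installed on x86_64 Linux, which would explain why only the arm64 stage fails; the arm64 stage would need a torch build tagged for aarch64. A small Python sketch for reading that tag (the helper name `wheel_platform_tag` is hypothetical):

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def wheel_platform_tag(url: str) -> str:
    """Return the platform tag of a wheel, given its URL or bare filename.

    Wheel filenames follow name-version[-build]-python-abi-platform.whl,
    so the platform tag is the last dash-separated field.
    """
    # Decode %2B -> '+' and keep only the final path component.
    filename = PurePosixPath(unquote(urlparse(url).path)).name
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    return filename[: -len(".whl")].split("-")[-1]


if __name__ == "__main__":
    url = ("https://download.pytorch.org/whl/cpu/"
           "torch-1.9.1%2Bcpu-cp39-cp39-linux_x86_64.whl")
    print(wheel_platform_tag(url))
```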
DLKcat works fine on MacBook M1
You could otherwise try to make your own Docker image with the following code. In src/dlkcat-gecko/, change the Dockerfile to:
FROM python:3.9-slim
LABEL org.opencontainers.image.source=https://github.com/sysbiochalmers/gecko
LABEL version="0.2-arm"
LABEL description="Custom Docker image of SysBioChalmers/DLKcat adapted for SysBioChalmers/GECKO version 3"
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install torch==1.9.0 --index-url https://download.pytorch.org/whl/cpu
and change requirements.txt to:
scikit-learn>=0.23.2
Biopython==1.78
rdkit-pypi
pandas
SciPy
NumPy<2
Then, while in dlkcat-gecko, run the following line in your Terminal:
docker buildx build -t ghcr.io/sysbiochalmers/dlkcat-gecko:0.2-arm --platform linux/arm64/v8 .
Finally, in runDLKcat.m line 52, refer to the new image by the tag given with -t in the build command (note 0.2-arm; here it matches the LABEL version= string in the Dockerfile, but Docker resolves images by name and tag, not by label):
status = system(['docker run --rm -v "' fullfile(params.path,'/data') '":/data ghcr.io/sysbiochalmers/dlkcat-gecko:0.2-arm /bin/bash -c "python DLKcat.py /data/tempDLKcat.tsv /data/tempDLKcatOutput.tsv"']);
When I tried this on a Windows 10 PC, the resulting image was less than 300 MB, substantially smaller than the 1.9 GB of the current Docker image, so I strongly doubt that my attempt worked. But maybe you will have more success running this directly on an M1 to M4 chip.
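Docker resolves `ghcr.io/sysbiochalmers/dlkcat-gecko:0.2-arm` by the name and tag given with `-t` at build time, so the tag in runDLKcat.m has to match that exactly. A hedged Python sketch of the same `docker run` call as an argument list, with the tag (and an optional platform) as parameters; `DLKCAT_REPO`, `dlkcat_run_args` and the `platform` parameter are illustrative assumptions, and the file names follow the runDLKcat.m call above.

```python
import os
from typing import List, Optional

# Hypothetical constant: must match the repository part of the `-t` tag used at build time.
DLKCAT_REPO = "ghcr.io/sysbiochalmers/dlkcat-gecko"


def dlkcat_run_args(data_dir: str, tag: str = "0.2-arm",
                    platform: Optional[str] = None) -> List[str]:
    """Return the `docker run` argument list mirroring the call in runDLKcat.m."""
    args = ["docker", "run", "--rm"]
    if platform is not None:
        # e.g. "linux/amd64" forces an amd64 image, which is emulated on arm64 hosts
        args += ["--platform", platform]
    args += [
        "-v", f"{os.path.abspath(data_dir)}:/data",  # mount the data folder at /data
        f"{DLKCAT_REPO}:{tag}",                      # image is looked up by name:tag
        "/bin/bash", "-c",
        "python DLKcat.py /data/tempDLKcat.tsv /data/tempDLKcatOutput.tsv",
    ]
    return args
```

For example, `subprocess.run(dlkcat_run_args("/path/to/GECKO/data"), check=True)` would start the container; passing a list instead of one shell string avoids quoting problems when the data path contains spaces.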
Hi, many thanks for your reply. I made the modifications accordingly but encountered the following error when running runDLKcat():
Running DLKcat prediction, this may take many minutes, especially the first time.
Traceback (most recent call last):
File "//DLKcat.py", line 26, in <module>
fingerprint_dict = load_pickle('input/fingerprint_dict.pickle')
File "//DLKcat.py", line 24, in load_pickle
return pickle.load(f)
_pickle.UnpicklingError: invalid load key, 'v'.
Error using runDLKcat
DLKcat encountered an error or it did not create any output file.
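`invalid load key, 'v'` means the first byte of `input/fingerprint_dict.pickle` is the character `v`, so the file inside the image is not a valid pickle. One hypothesis, not confirmed in this thread, is that the file is a Git LFS pointer: such pointers are small text files whose first line is `version https://git-lfs.github.com/spec/v1`, which would also fit the much smaller image size reported above. A minimal Python check; the helper names are hypothetical and not part of DLKcat.py.

```python
import pickle

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/"


def is_lfs_pointer(path: str) -> bool:
    """Return True if `path` holds a Git LFS pointer instead of the real content."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def load_pickle_checked(path: str):
    """Load a pickle, raising a clearer error if the file is only an LFS pointer."""
    if is_lfs_pointer(path):
        raise RuntimeError(
            f"{path} is a Git LFS pointer; fetch the real file "
            "(e.g. with `git lfs pull`) before building the image"
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

If `is_lfs_pointer("input/fingerprint_dict.pickle")` returns True on the build machine, the image was likely built from pointer files rather than the actual data.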
As I don't have a Mac with Apple Silicon CPU, I unfortunately cannot give further support. But from this earlier comment it appears that the original Docker image should work on at least M1 CPUs. Maybe you want to look into OrbStack? Note that you would then also want to change the runDLKcat function to use OrbStack instead. Again, I can give no support for this.
Thanks for the reply! I’ll give it a try and see how it goes.
Thanks for trying this out @edkerk. I, too, find the size difference very surprising. In any case, your approach has triggered my curiosity, so I've managed to push a multiarch version of the image built for arm64 and amd64.
@Mengzhen-Li-sw it would be great if you could give it a try https://github.com/SysBioChalmers/GECKO/pkgs/container/dlkcat-gecko/291906062?tag=0.1-multiarch
Note: I haven't tested this at all - my M3 machine is not set up with Matlab.
@mihai-sysbio It doesn't work on PC, but this is because it uses NumPy 2. Please see my suggested changes to requirements.txt and Dockerfile; these are also required for amd64.
Fantastic that you caught that. I've applied the fix, see #395. In the meantime, I've deleted the published 0.1-multiarch and a new one is being uploaded.
edit: upload finished, please test again
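Since the rebuilt image depends on the NumPy<2 pin in requirements.txt taking effect, it may be worth confirming the installed NumPy version with the container's own Python interpreter before testing DLKcat itself. A minimal sketch using the standard library's `importlib.metadata`; the helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as "1.26.4"."""
    return int(version_string.split(".")[0])


def numpy_is_pre_2() -> bool:
    """True if NumPy is installed and satisfies the NumPy<2 pin."""
    try:
        return major_version(version("numpy")) < 2
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("NumPy<2 satisfied:", numpy_is_pre_2())
```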
It didn't work, I get the error message:
Traceback (most recent call last):
File "//DLKcat.py", line 26, in <module>
fingerprint_dict = load_pickle('input/fingerprint_dict.pickle')
File "//DLKcat.py", line 24, in load_pickle
return pickle.load(f)
_pickle.UnpicklingError: invalid load key, 'v'.
When I run with 0.1 instead of 0.1-multiarch there is no problem, so the solution I first thought I had found (#396) does not resolve this.
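Because `0.1` works and `0.1-multiarch` does not, one way to narrow this down is to copy `input/fingerprint_dict.pickle` out of each image (for example with `docker create` followed by `docker cp`; given the `//DLKcat.py` path in the traceback, the file presumably sits at `/input/fingerprint_dict.pickle`) and compare the two copies. A short Python sketch of the comparison; the local file names passed on the command line are placeholders.

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_content(path_a: str, path_b: str) -> bool:
    """True if both files have identical bytes (same SHA-256 digest)."""
    return sha256_of(path_a) == sha256_of(path_b)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Placeholder usage: python compare.py fingerprint_0.1.pickle fingerprint_multiarch.pickle
    a, b = sys.argv[1], sys.argv[2]
    print(a, sha256_of(a))
    print(b, sha256_of(b))
    print("identical" if same_file_content(a, b) else "different")
```

If the digests differ, the problem lies in how the input files ended up in the new image rather than in the Python packages.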
I also tried on Apple Silicon and encountered the same error.
Thanks @SilentWaveSW for confirming - could you please follow in #396?
The "solution" in #396 does not work. In that Issue there is a comment:
I'm wondering if it could have something to do with the new packages that are used in obtaining the new image.
I can test this by again making a 0.1-amd64-only container, but if these changes are made to the Dockerfile and requirements.txt it should work.