FFCV installation issues

We're trying to replicate the results from the FFCV paper and are having difficulty setting up a working environment. The suggested conda install command appears to hang (no progress after 2 hours, 100% CPU usage), even with a fresh conda installation (as suggested by #85). The suggested troubleshooting tips for conda installs made no apparent difference.

We were able to build and run the provided conda-less Dockerfile; however, we're still unable to use FFCV, as seen below.

root@ac2fad055eeb:/workspace# python
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ffcv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/ffcv/__init__.py", line 1, in <module>
    from .loader import Loader
  File "/usr/local/lib/python3.8/dist-packages/ffcv/loader/__init__.py", line 1, in <module>
    from .loader import Loader, OrderOption
  File "/usr/local/lib/python3.8/dist-packages/ffcv/loader/loader.py", line 14, in <module>
    from ffcv.fields.base import Field
  File "/usr/local/lib/python3.8/dist-packages/ffcv/fields/__init__.py", line 3, in <module>
    from .rgb_image import RGBImageField
  File "/usr/local/lib/python3.8/dist-packages/ffcv/fields/rgb_image.py", line 5, in <module>
    import cv2
  File "/usr/local/lib/python3.8/dist-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/usr/local/lib/python3.8/dist-packages/cv2/__init__.py", line 175, in bootstrap
    if __load_extra_py_code_for_module("cv2", submodule, DEBUG):
  File "/usr/local/lib/python3.8/dist-packages/cv2/__init__.py", line 28, in __load_extra_py_code_for_module
    py_module = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.8/dist-packages/cv2/typing/__init__.py", line 169, in <module>
    LayerId = cv2.dnn.DictValue
AttributeError: module 'cv2.dnn' has no attribute 'DictValue'
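The traceback ends inside cv2/typing/__init__.py, reached from FFCV's `import cv2`, so the failure is in the OpenCV install itself rather than in FFCV code; a bare `import cv2` should reproduce it. One possibility reported for this particular `DictValue` error is a conflict between several OpenCV wheels that all provide `cv2`; treat that as an assumption, not a diagnosis confirmed in this thread. A minimal sketch that lists the installed OpenCV wheels through `importlib.metadata`, which works even though `import cv2` fails (the helper names `installed_opencv_wheels` and `report` are hypothetical):

```python
from importlib import metadata

# Assumption: the PyPI wheels that all install the same top-level `cv2` module.
OPENCV_DISTS = {
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
}


def installed_opencv_wheels():
    """Return {distribution name: version} for every installed OpenCV wheel.

    Reads package metadata only, so it never imports cv2.
    """
    found = {}
    for dist in metadata.distributions():
        # Normalize "opencv_python" style names to "opencv-python".
        name = (dist.metadata.get("Name") or "").lower().replace("_", "-")
        if name in OPENCV_DISTS:
            found[name] = dist.version
    return found


def report(found):
    """Summarize the result as one human-readable line."""
    if not found:
        return "no pip-installed OpenCV wheel found (cv2 may come from conda or a source build)"
    if len(found) > 1:
        pins = ", ".join(f"{n}=={v}" for n, v in sorted(found.items()))
        return "multiple OpenCV wheels installed: " + pins
    (name, version), = found.items()
    return f"single OpenCV wheel: {name}=={version}"


if __name__ == "__main__":
    print(report(installed_opencv_wheels()))
```

If more than one wheel shows up, uninstalling all of them and reinstalling a single one is the usual next step to try.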

System details:

Ubuntu 20.04
kernel 5.4.0-91-generic
conda 23.7.2
Intel Xeon Gold 6126 CPU @ 2.60GHz (2 sockets)
Nvidia RTX6000 (CUDA 11.2)
192GB RAM

Any suggestions?

Aug 23 '23 23:08 gustrain

Hi @gustrain ! What conda command are you using to install (the one that hangs?)

Aug 23 '23 23:08 andrewilyas

Hi @andrewilyas -- thanks for the quick reply!

I'm running conda create -y -n ffcv python=3.9 cupy pkg-config libjpeg-turbo opencv pytorch torchvision cudatoolkit=11.3 numba -c pytorch -c conda-forge as suggested in the FFCV readme.

Aug 23 '23 23:08 gustrain

Interesting: that command works for me, and the only difference is that I'm on CUDA 11.6. I'm not 100% sure, but there might be a compatibility issue between PyTorch 2.0 and CUDA 11.2. Can you try updating CUDA to 11.6 and see if the issue persists?
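To check a suspected PyTorch/CUDA mismatch, it helps to compare the CUDA version PyTorch was built against with what the machine can actually use. A minimal sketch, assuming PyTorch may or may not be importable in the environment being checked (the helper `torch_cuda_report` is hypothetical): `torch.version.cuda` is the CUDA version of the installed PyTorch build, and `torch.cuda.is_available()` reports whether a GPU is usable from this process. The highest CUDA version the driver supports is shown in the `nvidia-smi` header.

```python
import importlib.util
import platform


def torch_cuda_report():
    """Collect the version info relevant to a PyTorch/CUDA mismatch.

    Torch-related fields stay None when torch is not installed.
    """
    info = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
    }
    if importlib.util.find_spec("torch") is None:
        return info
    import torch

    info["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    # True only if the driver and a GPU are actually usable from this process.
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

With conda installs, PyTorch usually pulls in its own CUDA runtime (the cudatoolkit package in the command above), so the driver version tends to matter more than the system-wide CUDA toolkit.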

Aug 24 '23 00:08 andrewilyas

I updated CUDA, but unfortunately this didn't seem to make any difference. I'll see whether it just needs more time, but right now it's still spinning on "Solving environment," as it was before.

How long should the installation take when successful?

Aug 24 '23 00:08 gustrain

It should terminate within a day or so at worst, but when things are working properly it usually takes about 30 minutes. The very long installation is something we experienced a few versions ago, but it should have been fixed a while back. What version of CUDA are you on now? If it's not too much trouble, can you try separating out the steps? That is, first install PyTorch using the instructions from pytorch.org, and then run conda install cupy pkg-config libjpeg-turbo opencv numba -c conda-forge.
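The staged install suggested above can be written out as a small helper that only prints the commands in order; nothing is executed. This is a sketch under assumptions: the helper `staged_install_commands`, the `FFCV_DEPS` list, and the env name `ffcv` are hypothetical, and the PyTorch step is a placeholder because pytorch.org generates a different command per CUDA version and release.

```python
import shlex

# Non-PyTorch dependencies, taken from the second step suggested above.
FFCV_DEPS = ["cupy", "pkg-config", "libjpeg-turbo", "opencv", "numba"]


def staged_install_commands(pytorch_cmd, env_name="ffcv", python_version="3.9"):
    """Return the staged install as shell command strings (nothing is run)."""
    return [
        # Step 0: an environment containing only the pinned Python version.
        shlex.join(["conda", "create", "-y", "-n", env_name, f"python={python_version}"]),
        shlex.join(["conda", "activate", env_name]),
        # Step 1: PyTorch alone, so the solver only has to resolve the PyTorch/CUDA stack.
        pytorch_cmd.strip(),
        # Step 2: the remaining FFCV dependencies from conda-forge.
        shlex.join(["conda", "install", *FFCV_DEPS, "-c", "conda-forge"]),
    ]


if __name__ == "__main__":
    # Placeholder: replace with the exact command pytorch.org generates for your setup.
    for line in staged_install_commands("<command from pytorch.org>"):
        print(line)
```

Splitting the steps this way also shows which stage the solver stalls on, which narrows down the conflicting package.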

Aug 24 '23 00:08 andrewilyas

@andrewilyas thanks for the package! Is FFCV now compatible with newer versions of Python (3.11) and PyTorch 2.0?

Dec 10 '23 10:12 aegonwolf