device_dir is hard-coded to /dev/dri/by-path/
Hi developers, I found that the variable device_dir is hard-coded to /dev/dri/by-path/ (see code here). In most cases this is not a problem, but in some cases it does not work.
Take my case as an example. I set up the environment in a Docker container, which I start with the following command:
docker run --device=/dev/dri ...
After launching training, I hit this error:
RuntimeError: oneCCL: ze_fd_manager.cpp:143 init_device_fds: EXCEPTION: opendir failed: could not open device directory
This happens because device_dir is hard-coded to /dev/dri/by-path/, but the Docker container only maps /dev/dri from the host machine, without the by-path subfolder. As a result, /dev/dri/by-path/ does not exist in the container, and opendir fails.
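To check this diagnosis from inside the container, a small Python sketch like the one below can report whether /dev/dri and its by-path subfolder exist and what they contain. The helper name inspect_dri is hypothetical and not part of oneCCL; it only uses the standard library and makes no assumption about how oneCCL itself scans the directory.

```python
import os


def inspect_dri(root="/dev/dri"):
    """Report whether `root` and `root`/by-path exist, and list their entries.

    Hypothetical diagnostic helper; it is not part of oneCCL.
    """
    by_path = os.path.join(root, "by-path")
    root_exists = os.path.isdir(root)
    by_path_exists = os.path.isdir(by_path)
    return {
        "root_exists": root_exists,
        "by_path_exists": by_path_exists,
        # Entries directly under the root, e.g. card and render nodes.
        "root_entries": sorted(os.listdir(root)) if root_exists else [],
        # Entries under by-path; empty when the subfolder is missing.
        "by_path_entries": sorted(os.listdir(by_path)) if by_path_exists else [],
    }


if __name__ == "__main__":
    for key, value in inspect_dri().items():
        print(f"{key}: {value}")
```

If by_path_exists is reported as False while root_exists is True, the container matches the situation described in this issue.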
I am not sure whether I have explained this clearly. Could you please share your thoughts on the problem?
Hello @zhouyu5, thanks. Your case makes sense to us. We would like to add an environment variable through which you can specify the path for device_dir; by default it will remain the standard path, /dev/dri/by-path/. This should address your issue.
Thanks, sounds great.
Hello @zhouyu5, we've tried to address your concern. Could you try these variables to work around your issue and give us feedback? Thank you.
CCL_DRMFD_DEV_RENDER_DIR_PATH
Set the directory path for DRM render devices.
This environment variable specifies the directory path where DRM render devices are located.
Example value: '/custom/path/to/devices/'
Default: '/dev/dri/by-path/'
CCL_DRMFD_DEV_RENDER_SUFFIX
Set the suffix for DRM render device names.
This environment variable specifies the suffix to be used when searching for DRM render device names.
Example value: '-customsuffix'
Default: '-render'
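As a rough illustration of how these two variables might be applied, the sketch below builds a process environment with them set, so it can be passed to whatever launches the training job. The helper name ccl_drm_env is hypothetical, the values are just the placeholder examples from the descriptions above, and it assumes the variables are read when oneCCL initializes, so they need to be in place before the training process starts.

```python
import os


def ccl_drm_env(render_dir, render_suffix=None, base_env=None):
    """Return a copy of the environment with the oneCCL DRM variables set.

    Hypothetical helper: only the two variable names come from this thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CCL_DRMFD_DEV_RENDER_DIR_PATH"] = render_dir
    # Leave the suffix unset unless requested, so the default applies.
    if render_suffix is not None:
        env["CCL_DRMFD_DEV_RENDER_SUFFIX"] = render_suffix
    return env


if __name__ == "__main__":
    # Placeholder values; replace them with a directory and suffix that
    # match the device entries actually visible in your container.
    env = ccl_drm_env("/custom/path/to/devices/", "-customsuffix")
    for name in ("CCL_DRMFD_DEV_RENDER_DIR_PATH", "CCL_DRMFD_DEV_RENDER_SUFFIX"):
        print(f"{name}={env[name]}")
```

The returned dictionary can be handed to subprocess.run(..., env=env) when starting the training command; exporting the same variables in the shell or passing them to the container at startup should have an equivalent effect.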
@zhouyu5 Hello, friendly ping :)
I am closing this issue since the request has been addressed and the feature is now available for use.