oneCCL

device_dir is hard-coded to /dev/dri/by-path/

Open zhouyu5 opened this issue 1 year ago • 2 comments

Hi developers, I found that the variable device_dir is hard-coded to /dev/dri/by-path/ (see code here). In most cases this is not a problem, but in some cases it does not work.

Take my case as an example. I set up the environment in a Docker container, which I start with the following command: docker run --device=/dev/dri .... After launching the training, I hit this error: RuntimeError: oneCCL: ze_fd_manager.cpp:143 init_device_fds: EXCEPTION: opendir failed: could not open device directory. Because device_dir is hard-coded to /dev/dri/by-path/ and the container only maps /dev/dri from the host without the by-path subfolder, /dev/dri/by-path/ does not exist in the container, which causes the failure.
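To confirm that a missing directory is the cause, the check below can be run inside the container. It is a minimal diagnostic sketch, not part of oneCCL: the helper name check_device_dir is hypothetical, and it only reports whether a directory exists and how many entries it has.

```python
import os


def check_device_dir(device_dir="/dev/dri/by-path/"):
    """Return a short diagnostic string for a device directory.

    Hypothetical helper for illustration only; oneCCL does not ship it.
    """
    if not os.path.isdir(device_dir):
        # This is the situation in which opendir() would fail.
        return f"missing: {device_dir}"
    entries = sorted(os.listdir(device_dir))
    if not entries:
        return f"empty: {device_dir}"
    return f"ok: {device_dir} has {len(entries)} entries"


if __name__ == "__main__":
    # Compare the parent directory with the by-path subfolder.
    print(check_device_dir("/dev/dri/"))
    print(check_device_dir("/dev/dri/by-path/"))
```

If the first line reports entries while the second reports the directory as missing, the container matches the situation described above.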

I am not sure if I explained it clearly. Could you please share your thoughts on the problem?

zhouyu5 avatar Jun 03 '24 01:06 zhouyu5

Hello @zhouyu5, thanks. Your case makes sense to us. We would like to add an environment variable that lets you specify the path for device_dir; by default it will remain the standard path, /dev/dri/by-path/. This should address your issue.

nikitaxgusev avatar Jun 11 '24 16:06 nikitaxgusev

Thanks, sounds great.

zhouyu5 avatar Jun 12 '24 00:06 zhouyu5

Hello @zhouyu5, we've tried to address your concern. Could you try these variables to work around your issue and give us feedback? Thank you.

CCL_DRMFD_DEV_RENDER_DIR_PATH

Set the directory path for DRM render devices.
This environment variable specifies the directory path where DRM render devices are located.

Example value: '/custom/path/to/devices/'

By default: '/dev/dri/by-path/'

CCL_DRMFD_DEV_RENDER_SUFFIX

Set the suffix for DRM render device names.
This environment variable specifies the suffix to be used when searching for DRM render device names.

Example value: '-customsuffix'

By default: '-render'
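As a rough illustration of how the two variables could combine, here is a minimal Python sketch. It is not oneCCL's implementation: the function find_render_devices is hypothetical, and matching entries by "name ends with the suffix" is an assumption based on the descriptions above. Only the variable names and their defaults come from this thread.

```python
import os

# Defaults as documented in this thread.
DEFAULT_DIR = "/dev/dri/by-path/"
DEFAULT_SUFFIX = "-render"


def find_render_devices(env=None):
    """List entries in the device directory whose names end with the suffix.

    Hypothetical helper: the ends-with match is an assumption, not
    confirmed oneCCL behavior.
    """
    env = os.environ if env is None else env
    device_dir = env.get("CCL_DRMFD_DEV_RENDER_DIR_PATH", DEFAULT_DIR)
    suffix = env.get("CCL_DRMFD_DEV_RENDER_SUFFIX", DEFAULT_SUFFIX)
    # os.listdir raises FileNotFoundError when the directory is absent,
    # which is analogous to the "opendir failed" error reported above.
    names = sorted(n for n in os.listdir(device_dir) if n.endswith(suffix))
    return [os.path.join(device_dir, n) for n in names]
```

Under this reading, overriding only the directory is enough when its entries still end in '-render'; if they follow a different naming scheme, the suffix would need to be overridden as well. Both variables would be set in the environment before launching the job.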

nikitaxgusev avatar Aug 09 '24 13:08 nikitaxgusev

@zhouyu5 Hello, friendly ping :)

nikitaxgusev avatar Aug 20 '24 07:08 nikitaxgusev

I am closing this issue since the request has been addressed and the variables are now available for use.

nikitaxgusev avatar Sep 19 '24 10:09 nikitaxgusev