
Using multiprocessing causes an error

segalinc opened this issue 1 year ago

I am trying to use the library on a large dataset, so I am setting up a multiprocessing Pool to speed up the processing. However, for functions such as detect_censors I get this error:

RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 100: no CUDA-capable device is detected ; GPU=-1905859077 ; hostname=2edfd084-8003-4bed-a5e6-d03d1198eede ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=236 ; expr=cudaSetDevice(info_.device_id);
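For reference, here is a minimal sketch of the kind of parallel setup described above; the image paths, pool size, and the import path for detect_censors are assumptions rather than the actual code from this report:

from multiprocessing import Pool

from imgutils.detect import detect_censors


def process_image(path):
    # run censor detection in a worker process and return the result with its path
    return path, detect_censors(path)


if __name__ == '__main__':
    image_paths = ['img_0001.png', 'img_0002.png']  # placeholder paths
    with Pool(processes=4) as pool:
        results = pool.map(process_image, image_paths)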

Any idea?

segalinc avatar Mar 06 '24 21:03 segalinc

@segalinc I'm not able to reproduce this error:

from concurrent.futures import ProcessPoolExecutor

from imgutils.tagging import get_wd14_tags
from test.testings import get_testfile


def f(i):
    # each task loads the cached model and tags the same test image
    print(f'start {i}')
    rating, tags, chars = get_wd14_tags(get_testfile('nude_girl.png'), drop_overlap=True)
    print(f'end {i}')


if __name__ == '__main__':
    ex = ProcessPoolExecutor(max_workers=4)
    for i in range(8):
        ex.submit(f, i)

    ex.shutdown()

This works fine on onnxruntime-gpu 1.17.1. Can you provide your parallel code?

narugo1992 avatar Mar 15 '24 08:03 narugo1992

This has now been fixed with the new ts_lru_cache function.
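For context, ts_lru_cache behaves roughly like a thread-safe functools.lru_cache: the cached call is guarded by a lock, so concurrent callers cannot construct the same cached ONNX session twice. A minimal sketch of that pattern, as an illustration rather than the library's actual implementation:

from functools import lru_cache, wraps
from threading import Lock


def ts_lru_cache(**cache_kwargs):
    # sketch of a thread-safe LRU cache decorator: one lock per decorated function
    def decorator(func):
        cached = lru_cache(**cache_kwargs)(func)
        lock = Lock()

        @wraps(func)
        def wrapper(*args, **kwargs):
            # serialize access so only one caller builds the cached value at a time
            with lock:
                return cached(*args, **kwargs)

        return wrapper
    return decorator


@ts_lru_cache()
def _open_model(model_name):
    # hypothetical usage: load an ONNX session once and reuse it on later calls
    ...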

However, running the YOLO models in parallel will still cause a CUDA error.

I think that is an issue on the CUDA side.

narugo1992 avatar Mar 01 '25 08:03 narugo1992