Transformers-Tutorials

DetrObjectDetector training leads to `OS Error[24]: Too many open files` when training with custom dataset

Open: nasheedyasin opened this issue 3 years ago • 1 comment

When I train DetrObjectDetector (following the notebook here) on a custom dataset, training always fails with `OS Error[24]: Too many open files` after about 7-8 epochs.

My system information:

- OS: Windows 10 21H2 build
- GPU: NVIDIA RTX A4000 (16 GB GDDR6)
- RAM: 64 GB
- CPU: Intel i7 11700K

Lib information:

- CUDA version: 11.3
- PyTorch version: 1.11+cu113
- PyTorch Lightning version: 1.6
- Transformers version: 4.19.4

My DataLoader runs in the default single-process mode (`num_workers=0`), so no worker processes are spawned for data loading.

PS: I initially thought the error was caused by multi-threaded data loading, so I followed this StackOverflow suggestion, to no avail. I have also tried the suggestions here and here.
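For anyone trying to narrow this down, it can help to confirm that the failure really is errno 24 and to watch whether the number of open handles grows from epoch to epoch. The snippet below is only a rough sketch: `is_too_many_open_files` and `count_open_handles` are hypothetical helper names, and the handle count relies on the third-party `psutil` package being installed (it returns `None` otherwise).

```python
import errno
from typing import Optional


def is_too_many_open_files(exc: BaseException) -> bool:
    """Return True if exc is an OSError carrying errno 24 (EMFILE)."""
    return isinstance(exc, OSError) and exc.errno == errno.EMFILE


def count_open_handles() -> Optional[int]:
    """Best-effort count of handles held by this process.

    Assumes psutil is available; returns None if it is not installed.
    """
    try:
        import psutil  # third-party, optional
    except ImportError:
        return None
    proc = psutil.Process()
    if hasattr(proc, "num_handles"):
        # Windows: number of handles currently used by the process
        return proc.num_handles()
    # Unix: number of open file descriptors
    return proc.num_fds()
```

Printing `count_open_handles()` at the end of every epoch should show a steadily rising number if something in the training loop is leaking files.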

nasheedyasin, Jul 01 '22 10:07

So far, the problem seems to be with the TensorBoard logger (pytorch_lightning.loggers.tensorboard). Removing it worked for me as a temporary workaround, though I think this is resolved in the latest version (1.6.4) of pytorch_lightning.
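A minimal sketch of that workaround, assuming the notebook builds its trainer with `pytorch_lightning.Trainer`: passing `logger=False` disables the default TensorBoard logger. The helper names `trainer_kwargs` and `build_trainer`, the `max_epochs` value, and the `lightning_logs` directory are placeholders of mine, not taken from the notebook.

```python
from typing import Any, Dict


def trainer_kwargs(max_epochs: int, use_tensorboard: bool = False) -> Dict[str, Any]:
    """Build Trainer arguments, with the TensorBoard logger off by default."""
    kwargs: Dict[str, Any] = {"max_epochs": max_epochs}
    if use_tensorboard:
        # Imported lazily so the helper also works without Lightning installed.
        from pytorch_lightning.loggers import TensorBoardLogger
        kwargs["logger"] = TensorBoardLogger(save_dir="lightning_logs")
    else:
        # logger=False tells the Trainer not to create any logger.
        kwargs["logger"] = False
    return kwargs


def build_trainer(max_epochs: int = 10):
    """Create a Trainer without the TensorBoard logger (the workaround)."""
    import pytorch_lightning as pl
    return pl.Trainer(**trainer_kwargs(max_epochs))
```

With the logger disabled, nothing is written to TensorBoard, so this only makes sense as a stopgap until pytorch_lightning can be upgraded.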

nasheedyasin, Jul 05 '22 10:07