Retinanet-Tutorial

Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.

Open lucysumi opened this issue 1 year ago • 2 comments

Hi, I've been stuck for days trying to figure out this problem, but couldn't.

The training stops after about 15-17 epochs (I'm trying to run 300 epochs for my work) and displays this error message:

Epoch 14: ReduceLROnPlateau reducing learning rate to 9.999999747378752e-07.
50/50 - 32s - loss: 2.0636 - regression_loss: 1.8629 - classification_loss: 0.2007 - mAP: 0.6838 - lr: 1.0000e-05 - 32s/epoch - 649ms/step
2024-05-01 12:26:14.969075: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]

#7 and #9 don't help. I did try the --steps suggestion you mentioned, but it didn't do much: at most, I was able to run an additional 3-4 epochs. I tried --steps 100, 50 and 10.

Please help? @jaspereb

lucysumi, May 01 '24 12:05

Hi @lucysumi, I've not maintained this repo in a long time, and the versioning was not done very well to begin with, so it might be an environment thing.

Are you using compatible and up to date versions of all the key libraries?

Have you checked you're not running out of RAM/VRAM?

Are you running it on the provided data or your own?

Have you made any code changes?
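One way to answer the VRAM question is to poll `nvidia-smi` from a second terminal while training runs and watch whether used memory climbs from epoch to epoch. The following is a minimal sketch, not part of this repo: the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration, and it assumes `nvidia-smi` is on the PATH and reports plain integer MiB values.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' lines (in MiB) as printed by nvidia-smi in CSV mode."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory() -> Optional[List[Tuple[int, int]]]:
    """Return (used_MiB, total_MiB) per GPU, or None if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools found on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    try:
        return parse_gpu_memory(result.stdout)
    except ValueError:
        return None  # e.g. the driver reported a non-numeric value


if __name__ == "__main__":
    print(query_gpu_memory())
```

Running this every few minutes during training gives a rough picture. If used memory is close to the total just before the run dies, running out of VRAM becomes a plausible cause.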

jaspereb, May 02 '24 01:05

Hi, thanks for the quick reply.

So here's my configuration:

NVIDIA RTX 2020Ti
CUDA 11.2.0
CUDNN 8.1
Python 3.8
Tensorflow-gpu 2.10

I have 64GB of RAM, which should be enough. Yes, I'm training on my custom dataset. And no, I haven't changed any code.
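To confirm what is actually installed in the active environment, rather than what was intended, the versions can be read straight from the package metadata. This is a minimal sketch using only the standard library (Python 3.8+). The function name `installed_versions` is hypothetical, and the package names in the example are assumptions about what this setup uses; adjust them to match the environment.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Assumed package names; edit to match your environment.
    wanted = ["tensorflow", "tensorflow-gpu", "keras", "numpy", "keras-retinanet"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Posting this output alongside the CUDA and cuDNN versions makes it easier for others to spot a mismatched combination.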

I followed this tutorial exactly to implement Retinanet: https://www.youtube.com/watch?v=mr8Y_Nuxciw
This is the error I'm getting (screenshot attached).

Should I be upgrading or downgrading tensorflow-gpu? I tried the batch-size and steps tips, but they didn't work for me. Could you please tell me what I've missed here? :-( Or do you have any alternative tutorials to suggest?

lucysumi, May 02 '24 04:05