training stops proceeding

Open ftnabil97 opened this issue 2 years ago • 2 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [y] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • [y] I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • [y] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py

2. Describe the bug

I used the ssd_resnet50_v1_fpn_640x640_coco17_tpu model for training, but the training stops making progress after printing the following output:

Use fn_output_signature instead
I0604 08:48:51.263917 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0604 08:49:02.712822 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0604 08:49:10.045338 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0604 08:49:20.309416 140179947910912 api.py:459] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]

3. Steps to reproduce

  1. installing tensorflow
  2. cloning github model
  3. protoc
  4. cocoapi
  5. compile
  6. object detection api steps
  7. training model extract
  8. .pbtxt, .record file creation
  9. pipeline.config update
  10. run training

4. Expected behavior

Training runs to completion and produces a successfully trained model.

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows
  • Mobile device name if the issue happens on a mobile device:
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below): 2.12.0
  • Python version: 3
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory: no GPU; ran it from Colab

ftnabil97 Jun 05 '23 06:06

I'm facing this problem too; apparently it's a RAM limitation. In my case I'm using the Colab free tier, and it only started working when I lowered batch_size to 32 in the model's config file.

matheusschreiber Jun 09 '23 00:06

Hey, yeah, I had to drop my batch_size from 64 to 4 for my model to start training! I am probably going to upgrade to Colab Pro soon!

M1Z8N Oct 09 '23 03:10
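
For anyone landing here with the same hang, the workaround from the comments (lowering batch_size in pipeline.config) can be scripted. The snippet below is only a rough sketch, not part of the Object Detection API: `set_train_batch_size` is a hypothetical helper, it edits the config as plain text with a regex, and it assumes `batch_size` appears inside the `train_config` block before any other `batch_size` entry (such as the one in `eval_config`). The sample config text is made up for illustration.

```python
import re


def set_train_batch_size(config_text: str, batch_size: int) -> str:
    """Return config_text with the train_config batch_size replaced.

    Hypothetical helper: treats pipeline.config as plain text and rewrites
    the first batch_size that follows the opening of train_config.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # Capture everything from "train_config {" up to "batch_size: ",
    # then swap only the number that follows.
    pattern = re.compile(
        r"(train_config\s*:?\s*\{.*?\bbatch_size\s*:\s*)\d+", re.DOTALL
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(batch_size), config_text, count=1
    )
    if count == 0:
        raise ValueError("no batch_size found after train_config")
    return new_text


# Illustrative config fragment (not copied from a real pipeline.config).
sample = """train_config: {
  batch_size: 64
  num_steps: 25000
}
eval_config: {
  batch_size: 1
}
"""

updated = set_train_batch_size(sample, 4)
print(updated)
```

To use it on a real file, read pipeline.config into a string, pass it through the helper, and write the result back before starting training. Which batch size fits depends on the RAM available in your runtime; the values reported in this thread were 32 and 4.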