Object Detection API: different results for inference on same input

Open michielva opened this issue 3 years ago • 0 comments

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/model_lib_v2.py
https://github.com/tensorflow/models/blob/master/research/object_detection/exporter_lib_v2.py

2. Describe the bug

We use the Object Detection API to train a model to find certain objects in images. In this case we are using EfficientDet-D0 and EfficientDet-D1, for which the weights were downloaded from the TensorFlow Model Zoo. The model is exported as a TF SavedModel, which is served from tensorflow-serving containers.

The model works well, but we notice odd behaviour during inference. If the model is shown the same input 10 times in a row, the results differ on each of the 10 runs. There seems to be a random factor involved, but we cannot find what it is. At first we did not notice it, because on images the model predicts well the differences are small (confidence scores for a given object of 0.995, 0.993, 0.996, ...). On images where it is less sure, the differences are much larger (confidence scores for a given object of 0.681, 0.394, 0.512, ...).
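To put numbers on the two cases, here is a small sketch that computes the spread of the example scores quoted above. The helper name `score_spread` is made up for illustration, and the two lists contain only the three values quoted for each case, not the full 10 runs.

```python
from statistics import pstdev


def score_spread(scores):
    """Return (min, max, range, population std-dev) for repeated scores."""
    lo, hi = min(scores), max(scores)
    return lo, hi, hi - lo, pstdev(scores)


# Example values quoted in the report (three runs each, same input).
confident = [0.995, 0.993, 0.996]
uncertain = [0.681, 0.394, 0.512]

for name, scores in (("confident", confident), ("uncertain", uncertain)):
    lo, hi, rng, sd = score_spread(scores)
    print(f"{name}: min={lo:.3f} max={hi:.3f} range={rng:.3f} std={sd:.3f}")
```

On the quoted values, the range is roughly 0.003 for the confident detection and roughly 0.287 for the uncertain one.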

We have tried certain things:

Running the model on GPU and on CPU (in case a GPU-specific optimization was responsible)
Running the model in a tensorflow-serving container and in a local script (load model + predict)

In every case the results differ for the same input. Other models (configured and trained with Keras) do not show this behaviour; their results are exactly the same on every run.

I have searched the TensorFlow issues page. In the one similar entry, the cause was that dropout was also applied during inference. At first sight, however, this is not the case for us.

3. Steps to reproduce

I can easily run inference tests in the different scenarios (GPU/CPU, tf-serving container/local script) on the same input. For simplicity we test with an image with only one object.
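The local-script scenario can be sketched roughly as follows. This is a minimal sketch, not the exact script we run: the helper names (`load_detect_fn`, `repeat_inference`, `summarize`) are made up for illustration, and it assumes the model was exported with the default `image_tensor` input type, so that it accepts a uint8 batch of shape [1, H, W, 3] and returns a dict containing `detection_scores`.

```python
def load_detect_fn(saved_model_dir):
    """Load an exported SavedModel; return a function image -> list of scores.

    Assumption: the export used input_type=image_tensor, so the model takes a
    uint8 batch of shape [1, H, W, 3] and returns "detection_scores".
    """
    import numpy as np
    import tensorflow as tf  # imported lazily; the helpers below need no TF

    model = tf.saved_model.load(saved_model_dir)

    def detect(image):
        # Add the batch dimension and run a single forward pass.
        batch = tf.convert_to_tensor(np.asarray(image, dtype=np.uint8)[None, ...])
        outputs = model(batch)
        return outputs["detection_scores"][0].numpy().tolist()

    return detect


def repeat_inference(detect, image, n_runs=10):
    """Run the same image through `detect` n_runs times."""
    return [detect(image) for _ in range(n_runs)]


def summarize(runs):
    """Return (all runs identical?, largest score difference at any rank)."""
    identical = all(run == runs[0] for run in runs)
    # zip(*runs) groups the scores at the same rank across all runs.
    max_diff = max((max(col) - min(col) for col in zip(*runs)), default=0.0)
    return identical, max_diff
```

With a real export this would be used as `detect = load_detect_fn("path/to/saved_model")` (a placeholder path), followed by `summarize(repeat_inference(detect, image))`. For a deterministic model we would expect `(True, 0.0)`.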

4. Expected behavior

As the model is trained and weights are fixed, we would expect to have the exact same result each time.

5. Additional context

pipeline.config

model {
  ssd {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 640
        max_dimension: 640
        pad_to_max_dimension: true
      }
    }
    feature_extractor {
      type: "ssd_efficientnet-b1_bifpn_keras"
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 4e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.03
          }
        }
        activation: SWISH
        batch_norm {
          decay: 0.99
          scale: true
          epsilon: 0.001
        }
        force_use_bias: true
      }
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 4
        num_filters: 88
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 1.0
        x_scale: 1.0
        height_scale: 1.0
        width_scale: 1.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 4e-05
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.01
            }
          }
          activation: SWISH
          batch_norm {
            decay: 0.99
            scale: true
            epsilon: 0.001
          }
          force_use_bias: true
        }
        depth: 88
        num_layers_before_predictor: 3
        kernel_size: 3
        class_prediction_bias_init: -4.6
        use_depthwise: true
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 3
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-08
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 1.5
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    add_background_class: false
  }
}
train_config {
  batch_size: {{ train_batch_size }}
  sync_replicas: true
  optimizer {
    adam_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.0001
          total_steps: 60000
          warmup_learning_rate: 1e-05
          warmup_steps: 5000
          hold_base_rate_steps: 3000
        }
      }
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "{{ pretrained_model_dir }}/checkpoint/ckpt-0"
  num_steps: {{ train_num_steps }}
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 1
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  retain_original_images: true
  use_bfloat16: false
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "{{ labelmap_path }}"
  tf_record_input_reader {
    input_path: "{{ tfrecord_train_path }}"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}
eval_input_reader {
  label_map_path: "{{ labelmap_path }}"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "{{ tfrecord_val_path }}"
  }
}

6. System information

OS Platform and Distribution: Linux Ubuntu 20.04 / Windows 10
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 2.10.0
Python version: 3.8
CUDA/cuDNN version: CUDA v11.4
GPU model and memory: NVIDIA Quadro RTX 6000

Dec 20 '22 15:12 michielva