Object Detection API: different results for inference on same input
1. The entire URL of the file you are using
https://github.com/tensorflow/models/blob/master/research/object_detection/model_lib_v2.py
https://github.com/tensorflow/models/blob/master/research/object_detection/exporter_lib_v2.py
2. Describe the bug
We use the Object Detection API to train a model that finds certain objects in images. In this case we are using EfficientDet-D0 and EfficientDet-D1, with pretrained weights downloaded from the TensorFlow Model Zoo. The model is exported as a TF SavedModel, which is served from tensorflow-serving containers.
The model works well, but we notice some odd behaviour during inference. If the model is given the same input 10 times in a row, the results differ on each of the 10 runs. There seems to be some random factor involved, but we cannot find what it is. At first we did not notice it, because on images the model predicts well the differences are small (confidence scores for a given object of 0.995, 0.993, 0.996, ...). On images where it is less sure, the differences are much bigger (confidence scores for a given object of 0.681, 0.394, 0.512, ...).
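To put a number on the variation described above, here is a minimal sketch that computes the spread (maximum minus minimum) of the confidence scores quoted in this report. The helper name `score_spread` is hypothetical and used only for illustration; the inputs are the three scores listed for each case, not the full set of 10 runs.

```python
def score_spread(scores):
    """Return the max-min range of the scores one object received across repeated runs."""
    return max(scores) - min(scores)


# Scores quoted in the report (only three of the 10 runs are listed there).
confident_runs = [0.995, 0.993, 0.996]
uncertain_runs = [0.681, 0.394, 0.512]

confident_spread = score_spread(confident_runs)
uncertain_spread = score_spread(uncertain_runs)

print(f"confident image spread: {confident_spread:.3f}")
print(f"uncertain image spread: {uncertain_spread:.3f}")
```

For the quoted values this gives a spread of about 0.003 on the confident image and about 0.287 on the uncertain one, which is large enough to move a detection across a typical score threshold.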
We have tried the following:
- running the model on GPU and on CPU (in case some GPU-specific optimization was responsible)
- running the model in a tensorflow-serving container and in a plain local script (load model + predict)
In each case we see the same behaviour: results differ for the same input. With other models (configured and trained with Keras) we do not see this behaviour, and results are exactly the same.
I have searched the TensorFlow issues page. In the one similar entry, the cause was that dropout was also applied during inference. However, at first sight this is not the case for us.
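As a rough way to back up the "at first sight" check for dropout, the following sketch scans a pipeline config given as plain text for any line that mentions dropout. The function name `find_dropout_settings` is hypothetical, and the sample text is an excerpt of the box predictor section from the pipeline.config in this report. Note the limits of this check: it only inspects the text of the config, so a setting that is left at its default, or behaviour inside the exported graph, would not show up here.

```python
import re


def find_dropout_settings(config_text):
    """Return (line_number, stripped_line) for every config line that mentions dropout."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if re.search(r"dropout", line, flags=re.IGNORECASE):
            hits.append((number, line.strip()))
    return hits


# Excerpt of the box predictor section from the pipeline.config in this report.
sample_config = """
box_predictor {
  weight_shared_convolutional_box_predictor {
    depth: 88
    num_layers_before_predictor: 3
    kernel_size: 3
    class_prediction_bias_init: -4.6
    use_depthwise: true
  }
}
"""

hits = find_dropout_settings(sample_config)
print(hits if hits else "no dropout-related keys found in this excerpt")
```

In practice the whole pipeline.config file would be read and passed in instead of the excerpt.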
3. Steps to reproduce
I can easily run inference tests in the different scenarios (GPU/CPU, tf-serving container/local script) on the same input. For simplicity, we test with an image containing only one object.
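The repeated-inference test can be sketched roughly as follows. This is not the script we actually run: `compare_repeated_runs` and `fake_predict` are hypothetical names, and `predict_fn` stands in for whatever performs one inference (a call on the loaded SavedModel in the local script, or a request to the tensorflow-serving container). The stand-in below is deterministic so that the harness itself can be run without a model.

```python
def compare_repeated_runs(predict_fn, model_input, num_runs=10):
    """Call predict_fn on the same input num_runs times and summarise the variation.

    predict_fn is assumed to return a sequence of confidence scores,
    in the same order on every call.
    """
    runs = [list(predict_fn(model_input)) for _ in range(num_runs)]
    reference = runs[0]
    max_abs_diff = 0.0
    for scores in runs[1:]:
        if len(scores) != len(reference):
            raise ValueError("runs returned a different number of scores")
        for ref, cur in zip(reference, scores):
            max_abs_diff = max(max_abs_diff, abs(ref - cur))
    return {
        "identical": all(scores == reference for scores in runs),
        "max_abs_diff": max_abs_diff,
        "runs": runs,
    }


def fake_predict(image):
    """Deterministic stand-in for the real model: one score derived from the pixel values."""
    return [sum(image) / (len(image) * 255.0)]


report = compare_repeated_runs(fake_predict, [10, 200, 45], num_runs=10)
print(report["identical"], report["max_abs_diff"])
```

With a deterministic `predict_fn` the report shows identical runs and a maximum difference of 0.0, which is what we expect from the exported model as well. With our EfficientDet models, the equivalent check reports differing scores on every run.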
4. Expected behavior
As the model is trained and the weights are fixed, we would expect exactly the same result each time.
5. Additional context
pipeline.config
model {
  ssd {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 640
        max_dimension: 640
        pad_to_max_dimension: true
      }
    }
    feature_extractor {
      type: "ssd_efficientnet-b1_bifpn_keras"
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 4e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.03
          }
        }
        activation: SWISH
        batch_norm {
          decay: 0.99
          scale: true
          epsilon: 0.001
        }
        force_use_bias: true
      }
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 4
        num_filters: 88
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 1.0
        x_scale: 1.0
        height_scale: 1.0
        width_scale: 1.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 4e-05
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.01
            }
          }
          activation: SWISH
          batch_norm {
            decay: 0.99
            scale: true
            epsilon: 0.001
          }
          force_use_bias: true
        }
        depth: 88
        num_layers_before_predictor: 3
        kernel_size: 3
        class_prediction_bias_init: -4.6
        use_depthwise: true
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 3
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-08
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 1.5
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    add_background_class: false
  }
}
train_config {
  batch_size: {{ train_batch_size }}
  sync_replicas: true
  optimizer {
    adam_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.0001
          total_steps: 60000
          warmup_learning_rate: 1e-05
          warmup_steps: 5000
          hold_base_rate_steps: 3000
        }
      }
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "{{ pretrained_model_dir }}/checkpoint/ckpt-0"
  num_steps: {{ train_num_steps }}
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 1
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  retain_original_images: true
  use_bfloat16: false
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "{{ labelmap_path }}"
  tf_record_input_reader {
    input_path: "{{ tfrecord_train_path }}"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}
eval_input_reader {
  label_map_path: "{{ labelmap_path }}"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "{{ tfrecord_val_path }}"
  }
}
6. System information
- OS Platform and Distribution: Linux Ubuntu 20.04 / Windows 10
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 2.10.0
- Python version: 3.8
- CUDA/cuDNN version: CUDA v11.4
- GPU model and memory: NVIDIA Quadro RTX 6000