Loss jumps at the start of each training stage
Training - Stage 1
print("Training network heads")
model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE, epochs=40, layers='heads', augmentation=augmentation)
Training - Stage 2
model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE/10, epochs=120, layers='4+', augmentation=augmentation)
Fine tune all layers
Training - Stage 3
print("Fine tune all layers")
model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE / 10, epochs=160, layers='all', augmentation=augmentation)
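For reference, here is a rough sketch of how I understand the schedule above. It assumes `epochs` is the cumulative end epoch of each stage (as it is in the matterport implementation), not the number of epochs run in that stage, and it uses a placeholder `BASE_LR = 0.001` in place of `config.LEARNING_RATE`. The `Stage` class and `summarize` helper are hypothetical names made up for illustration; they are not part of the library.

```python
from dataclasses import dataclass

# Assumed placeholder for config.LEARNING_RATE; substitute the real value.
BASE_LR = 0.001


@dataclass
class Stage:
    name: str
    layers: str           # value passed as `layers`
    learning_rate: float  # value passed as `learning_rate`
    end_epoch: int        # value passed as `epochs` (assumed cumulative)


def summarize(stages):
    """Return (name, layers, lr, start_epoch, epochs_run) for each stage.

    Treats each stage's end_epoch as a cumulative total, so a stage
    starts where the previous one ended.
    """
    rows = []
    start = 0
    for stage in stages:
        rows.append((stage.name, stage.layers, stage.learning_rate,
                     start, stage.end_epoch - start))
        start = stage.end_epoch
    return rows


# The three stages from the calls above.
STAGES = [
    Stage("Stage 1", "heads", BASE_LR, 40),
    Stage("Stage 2", "4+", BASE_LR / 10, 120),
    Stage("Stage 3", "all", BASE_LR / 10, 160),
]

if __name__ == "__main__":
    for name, layers, lr, start, n in summarize(STAGES):
        print(f"{name}: layers={layers!r}, lr={lr:g}, "
              f"starts at epoch {start}, runs {n} epochs")
```

Under that assumption, the stage boundaries where the loss jumps would be epochs 40 and 120. At epoch 40 both the learning rate and the set of trainable layers change, while at epoch 120 only the set of trainable layers changes.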
The image size is 512x512.
2 or more classes
1 GPU (2080 Ti)
Tried both ImageNet and COCO weight files, because the detections are small.

At the start of each new training stage, the loss suddenly increases. I did not see this in matterport/maskrcnn with TF 1.14 using similar stages. All the parameters are the same, yet here the loss jumps every time a new stage begins. Can you please explain why this could happen? Is it because of the optimizer?