Training and test mAP@50 and mAP@50:95 are showing very strange results in FedCV object detection
I have been training in the cross-silo horizontal distributed training mode with the following configuration settings:
```yaml
common_args:
  training_type: "cross_silo"
  random_seed: 0
  scenario: "horizontal"
  using_mlops: false
  config_version: release
  name: "exp" # yolo
  project: "runs/train" # yolo
  exist_ok: true # yolo

environment_args:
  bootstrap: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/bootstrap.sh

data_args:
  dataset: "bdd"
  data_cache_dir: ~/fedcv_data
  partition_method: "homo"
  partition_alpha: 0.5
  data_conf: "/home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/data/bdd.yaml" # yolo
  img_size: [640, 640]

model_args:
  model: "yolov5"
  class_num: 13
  yolo_cfg: "/home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/model/yolov5/models/yolov5s.yaml" # "./model/yolov6/configs/yolov6s.py" # yolo
  yolo_hyp: "/home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/hyps/hyp.scratch.yaml" # yolo
  weights: "none" # "best.pt" # yolo
  single_cls: false # yolo
  conf_thres: 0.001 # yolo
  iou_thres: 0.6 # for yolo NMS
  yolo_verbose: true # yolo

train_args:
  federated_optimizer: "FedAvg"
  client_id_list:
  client_num_in_total: 2
  client_num_per_round: 2
  comm_round: 10
  epochs: 4
  batch_size: 64
  client_optimizer: sgd
  lr: 0.01
  weight_decay: 0.001
  checkpoint_interval: 1
  server_checkpoint_interval: 1

validation_args:
  frequency_of_the_test: 2

device_args:
  worker_num: 2
  using_gpu: true
  gpu_mapping_file: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/gpu_mapping.yaml
  gpu_mapping_key: mapping_config5_2
  gpu_ids: [0, 1, 2, 3, 4, 5, 6, 7]

comm_args:
  backend: "MQTT_S3"
  mqtt_config_path: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/mqtt_config.yaml
  s3_config_path: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/config/s3_config.yaml

tracking_args:
  log_file_dir: /home/gpuadmin/Project/FedML/python/app/fedcv/object_detection/log
  enable_wandb: true
  wandb_key: ee0b5f53d949c84cee7decbe7a6
  wandb_project: fedml
  wandb_name: fedml_torch_object_detection
```
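One note on the config: `partition_method: "homo"` is an IID split, so both clients should see statistically similar data, which explains why their metrics track each other closely, though not why the values are near-perfect. Below is only a generic illustration of what such a split amounts to, not the actual FedML partitioning code; the sample count is a placeholder.

```python
import numpy as np

def homo_partition(num_samples: int, num_clients: int, seed: int = 0):
    """IID ("homo") split: shuffle all sample indices and divide them evenly."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(num_samples), num_clients)

# e.g. roughly 70k BDD100K training images split across the 2 clients in this config
client_indices = homo_partition(num_samples=70_000, num_clients=2)
print([len(idx) for idx in client_indices])  # two roughly equal shards
```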
During training I get very high mAP@50 and mAP@50:95 for almost every epoch, from the very beginning to the end. Normally mAP should be small in the early epochs and grow slowly in later ones, but in my case it just fluctuates in the range 0.985 ~ 0.9885 for both clients. I have checked the metric calculation functions borrowed from the original YOLOv5 PyTorch repository, and they work fine. If anybody can share their preliminary results for distributed object detection on any dataset (COCO or PASCAL VOC), I would like to verify my results against theirs. For solo (single-machine) YOLOv5s training, mAP is much smaller and grows epoch by epoch.
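As an independent reference, mAP can also be cross-checked with torchmetrics instead of the YOLOv5 metric code. A minimal sketch, assuming a YOLOv5-style validation loader that yields `(imgs, targets, paths, shapes)` batches with normalized `(image_idx, class, x, y, w, h)` targets; `model`, `val_loader`, and `device` are placeholders for the objects already created in the training script:

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# YOLOv5 helpers; the import path depends on how the YOLOv5 code is vendored
from utils.general import non_max_suppression, xywh2xyxy

# COCO-style IoU thresholds (0.50:0.95); "map_50" in the result is mAP@50
metric = MeanAveragePrecision(iou_thresholds=None)

model.eval()
with torch.no_grad():
    for imgs, targets, paths, shapes in val_loader:
        imgs = imgs.to(device).float() / 255.0  # uint8 -> [0, 1]
        preds = model(imgs)[0]
        preds = non_max_suppression(preds, conf_thres=0.001, iou_thres=0.6)

        # scale from normalized xywh to pixel xyxy in network-input space
        _, _, h, w = imgs.shape
        gain = torch.tensor([w, h, w, h], dtype=torch.float32)

        for i, det in enumerate(preds):
            gt = targets[targets[:, 0] == i]  # rows: (img_idx, cls, x, y, w, h)
            metric.update(
                [dict(boxes=det[:, :4].cpu(),
                      scores=det[:, 4].cpu(),
                      labels=det[:, 5].int().cpu())],
                [dict(boxes=xywh2xyxy(gt[:, 2:6]) * gain,
                      labels=gt[:, 1].int())],
            )

res = metric.compute()
print("mAP@50:", res["map_50"].item(), "mAP@50:95:", res["map"].item())
```

If this independent computation also reports ~0.98 on held-out data, the metric code really is fine and the data fed to it is the next thing to inspect.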
Any clue from the Authors would be very much appreciated.
P.S. For the mAP calculation, I used the default val(train_data, device, args) function inside the YOLOv5Trainer class in yolov5_trainer.py.
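Since that call is fed train_data, the reported numbers may be training-set mAP rather than held-out mAP. A minimal sketch of the comparison, assuming a held-out loader (here called test_data) is available from the data-loading step and that val() accepts it the same way:

```python
# Run the same val() hook on both splits. If the held-out numbers are much
# lower than the train numbers, the issue is which split is being evaluated,
# not the metric code. `trainer`, `train_data`, `test_data`, `device`, and
# `args` are assumed to be the objects already created in the FedCV script.
train_metrics = trainer.val(train_data, device, args)
test_metrics = trainer.val(test_data, device, args)

print("train split metrics:   ", train_metrics)
print("held-out split metrics:", test_metrics)
```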