Integration tests may produce different accuracy outputs
As outlined in https://github.com/mlcommons/mobile_app_closed/pull/21#issuecomment-2681642863 by @mohitmundhragithub, the CI test might yield different results for performance and accuracy modes.
Performance mode:
Line 479: [image_classification_v2: performance mode] result: NativeRunResult(accuracy:0.8999999761581421, accuracy2:null)
Line 592: [object_detection: performance mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)
Line 710: [image_segmentation_v2: performance mode] result: NativeRunResult(accuracy:0.3669957220554352, accuracy2:null)
Line 833: [natural_language_processing: performance mode] result: NativeRunResult(accuracy:1.0, accuracy2:null)
Line 957: [super_resolution: performance mode] result: NativeRunResult(accuracy:0.33657199144363403, accuracy2:null)
Line 1091: [image_classification_offline_v2: performance mode] result: NativeRunResult(accuracy:0.4000000059604645, accuracy2:null)
Accuracy mode:
Line 1204: [image_classification_v2: accuracy mode] result: NativeRunResult(accuracy:0.0, accuracy2:null)
Line 1309: [object_detection: accuracy mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)
Line 1422: [image_segmentation_v2: accuracy mode] result: NativeRunResult(accuracy:0.0, accuracy2:null)
Line 1537: [natural_language_processing: accuracy mode] result: NativeRunResult(accuracy:1.0, accuracy2:null)
Line 1654: [super_resolution: accuracy mode] result: NativeRunResult(accuracy:0.05482751503586769, accuracy2:null)
Line 1783: [image_classification_offline_v2: accuracy mode] result: NativeRunResult(accuracy:0.0, accuracy2:null)
It seems that accuracy mode has some issues when run on the device. In performance mode the accuracy results seem reasonable, but in accuracy mode they are inconsistent: for some benchmarks the accuracy is 0, while for others it looks fine.
We should determine whether this behavior is normal or if there's an issue with it.
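To make this check repeatable, here is a minimal Python sketch that parses `NativeRunResult` lines like the ones quoted above and lists benchmarks whose accuracy differs between the two modes. The function names and the tolerance are hypothetical and not part of the app or its CI. It also assumes that both modes are expected to report the same accuracy on the CI dataset, which may not hold for every benchmark.

```python
import re
from collections import defaultdict

# Matches log fragments such as:
#   [object_detection: accuracy mode] result: NativeRunResult(accuracy:0.34, accuracy2:null)
# Only the first accuracy value is captured; accuracy2 is ignored.
RESULT_RE = re.compile(
    r"\[(?P<benchmark>\w+): (?P<mode>performance|accuracy) mode\] "
    r"result: NativeRunResult\(accuracy:(?P<accuracy>[0-9.eE+-]+),"
)


def parse_results(log_text):
    """Return {benchmark: {mode: accuracy}} for every result found in the log."""
    results = defaultdict(dict)
    for match in RESULT_RE.finditer(log_text):
        results[match["benchmark"]][match["mode"]] = float(match["accuracy"])
    return dict(results)


def find_mismatches(results, tolerance=1e-6):
    """Return {benchmark: (performance, accuracy)} where the two modes disagree.

    Assumption: both modes should agree within `tolerance` (hypothetical value).
    Benchmarks missing either mode are skipped.
    """
    mismatches = {}
    for benchmark, modes in results.items():
        if "performance" not in modes or "accuracy" not in modes:
            continue
        if abs(modes["performance"] - modes["accuracy"]) > tolerance:
            mismatches[benchmark] = (modes["performance"], modes["accuracy"])
    return mismatches
```

Calling `find_mismatches(parse_results(log_text))` on a captured CI log returns only the benchmarks whose two values differ, which makes it easier to compare runs across devices.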
This issue is seen only for the CI tests. For normal submission mode, it seems okay.
The accuracy on the S25 Ultra appears to be OK, so the potential issue might be specific to the device.
'03-25 06:34:49.133 I/flutter (29112): [image_classification_v2: performance mode] result: NativeRunResult(accuracy:0.8399999737739563, accuracy2:null)',
'03-25 06:35:00.271 I/flutter (29112): [object_detection: performance mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)',
'03-25 06:35:11.587 I/flutter (29112): [image_segmentation_v2: performance mode] result: NativeRunResult(accuracy:0.3669957220554352, accuracy2:null)',
'03-25 06:35:23.393 I/flutter (29112): [natural_language_processing: performance mode] result: NativeRunResult(accuracy:1.0, accuracy2:null)',
'03-25 06:35:34.547 I/flutter (29112): [super_resolution: performance mode] result: NativeRunResult(accuracy:0.33657199144363403, accuracy2:null)',
'03-25 06:35:47.043 I/flutter (29112): [image_classification_offline_v2: performance mode] result: NativeRunResult(accuracy:0.4000000059604645, accuracy2:null)',
'03-25 06:35:48.137 I/flutter (29112): [image_classification_v2: accuracy mode] result: NativeRunResult(accuracy:0.8399999737739563, accuracy2:null)',
'03-25 06:35:48.239 I/flutter (29112): [object_detection: accuracy mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)',
'03-25 06:35:49.000 I/flutter (29112): [image_segmentation_v2: accuracy mode] result: NativeRunResult(accuracy:0.3669957220554352, accuracy2:null)',
'03-25 06:35:49.804 I/flutter (29112): [natural_language_processing: accuracy mode] result: NativeRunResult(accuracy:1.0, accuracy2:null)',
'03-25 06:35:50.289 I/flutter (29112): [super_resolution: accuracy mode] result: NativeRunResult(accuracy:0.33657199144363403, accuracy2:null)',
'03-25 06:35:57.915 I/flutter (29112): [image_classification_offline_v2: accuracy mode] result: NativeRunResult(accuracy:0.8999999761581421, accuracy2:null)'
Closing for now since there is no such issue with the current device.