Integration tests may produce different accuracy outputs

Open anhappdev opened this issue 11 months ago • 2 comments

As outlined in https://github.com/mlcommons/mobile_app_closed/pull/21#issuecomment-2681642863 by @mohitmundhragithub, the CI test might yield different results for performance and accuracy modes.

Performance mode:

Line 479: [image_classification_v2: performance mode] result: NativeRunResult(accuracy:0.8999999761581421, accuracy2:null)
Line 592: [object_detection: performance mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)
Line 710: [image_segmentation_v2: performance mode] result: NativeRunResult(accuracy:0.3669957220554352, accuracy2:null)
Line 833: [natural_language_processing: performance mode] result: NativeRunResult(accuracy:1.0, accuracy2:null)
Line 957: [super_resolution: performance mode] result: NativeRunResult(accuracy:0.33657199144363403, accuracy2:null)
Line 1091: [image_classification_offline_v2: performance mode] result: NativeRunResult(accuracy:0.4000000059604645, accuracy2:null)

Accuracy mode:

Line 1204: [image_classification_v2: accuracy mode] result: NativeRunResult(accuracy:0.0, accuracy2:null)
Line 1309: [object_detection: accuracy mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)
Line 1422: [image_segmentation_v2: accuracy mode] result: NativeRunResult(accuracy:0.0, accuracy2:null)
Line 1537: [natural_language_processing: accuracy mode] result: NativeRunResult(accuracy:1.0, accuracy2:null)
Line 1654: [super_resolution: accuracy mode] result: NativeRunResult(accuracy:0.05482751503586769, accuracy2:null)
Line 1783: [image_classification_offline_v2: accuracy mode] result: NativeRunResult(accuracy:0.0, accuracy2:null)

It seems that accuracy mode, as run on the device, has some issues. In performance mode the accuracy results seem reasonable, but in accuracy mode they are inconsistent: for some benchmarks the accuracy is 0 (or close to it), while for others it looks fine.

We should determine whether this behavior is normal or if there's an issue with it.
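One way to make this check repeatable is to parse the result lines from the CI log and flag every benchmark whose accuracy differs between the two modes. The following is a minimal sketch, not part of the existing CI. The function names `parse_results` and `find_mismatches` and the `tolerance` value are hypothetical, and the regular expression assumes the log format shown above. A flagged benchmark is only a candidate for investigation, since a difference between modes is not necessarily a bug.

```python
import re
from collections import defaultdict

# Matches result lines in the format quoted in this issue, for example:
# [object_detection: accuracy mode] result: NativeRunResult(accuracy:0.34, accuracy2:null)
RESULT_RE = re.compile(
    r"\[(?P<benchmark>\w+): (?P<mode>performance|accuracy) mode\] "
    r"result: NativeRunResult\(accuracy:(?P<accuracy>[0-9.eE+-]+),"
)


def parse_results(log_text):
    """Return {benchmark: {mode: accuracy}} for every result line found."""
    results = defaultdict(dict)
    for match in RESULT_RE.finditer(log_text):
        results[match["benchmark"]][match["mode"]] = float(match["accuracy"])
    return dict(results)


def find_mismatches(results, tolerance=1e-6):
    """List benchmarks whose accuracy differs between the two modes.

    The tolerance is an assumed value chosen for illustration only.
    """
    mismatches = []
    for benchmark, by_mode in results.items():
        if "performance" in by_mode and "accuracy" in by_mode:
            if abs(by_mode["performance"] - by_mode["accuracy"]) > tolerance:
                mismatches.append(benchmark)
    return mismatches


# Four lines copied from the log excerpts above (without the "Line N:" prefix).
SAMPLE_LOG = """
[image_classification_v2: performance mode] result: NativeRunResult(accuracy:0.8999999761581421, accuracy2:null)
[object_detection: performance mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)
[image_classification_v2: accuracy mode] result: NativeRunResult(accuracy:0.0, accuracy2:null)
[object_detection: accuracy mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)
"""

if __name__ == "__main__":
    for name in find_mismatches(parse_results(SAMPLE_LOG)):
        print(name)
```

Because the pattern is applied with `finditer`, it also works on raw `adb logcat` output where each result line carries a timestamp prefix.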

anhappdev avatar Mar 02 '25 06:03 anhappdev

This issue is seen only for the CI tests. For normal submission mode, it seems okay.

mohitmundhragithub avatar Mar 03 '25 05:03 mohitmundhragithub

The accuracy on the S25 Ultra appears to be ok. Therefore, the potential issue might be specific to the device.

s25ultra.log

 '03-25 06:34:49.133 I/flutter (29112): [image_classification_v2: performance mode] result: NativeRunResult(accuracy:0.8399999737739563, accuracy2:null)',
 '03-25 06:35:00.271 I/flutter (29112): [object_detection: performance mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)',
 '03-25 06:35:11.587 I/flutter (29112): [image_segmentation_v2: performance mode] result: NativeRunResult(accuracy:0.3669957220554352, accuracy2:null)',
 '03-25 06:35:23.393 I/flutter (29112): [natural_language_processing: performance mode] result: NativeRunResult(accuracy:1.0, accuracy2:null)',
 '03-25 06:35:34.547 I/flutter (29112): [super_resolution: performance mode] result: NativeRunResult(accuracy:0.33657199144363403, accuracy2:null)',
 '03-25 06:35:47.043 I/flutter (29112): [image_classification_offline_v2: performance mode] result: NativeRunResult(accuracy:0.4000000059604645, accuracy2:null)',
 
 '03-25 06:35:48.137 I/flutter (29112): [image_classification_v2: accuracy mode] result: NativeRunResult(accuracy:0.8399999737739563, accuracy2:null)',
 '03-25 06:35:48.239 I/flutter (29112): [object_detection: accuracy mode] result: NativeRunResult(accuracy:0.3445338010787964, accuracy2:null)',
 '03-25 06:35:49.000 I/flutter (29112): [image_segmentation_v2: accuracy mode] result: NativeRunResult(accuracy:0.3669957220554352, accuracy2:null)',
 '03-25 06:35:49.804 I/flutter (29112): [natural_language_processing: accuracy mode] result: NativeRunResult(accuracy:1.0, accuracy2:null)',
 '03-25 06:35:50.289 I/flutter (29112): [super_resolution: accuracy mode] result: NativeRunResult(accuracy:0.33657199144363403, accuracy2:null)',
 '03-25 06:35:57.915 I/flutter (29112): [image_classification_offline_v2: accuracy mode] result: NativeRunResult(accuracy:0.8999999761581421, accuracy2:null)'

anhappdev avatar Mar 30 '25 13:03 anhappdev

Closing for now since there is no such issue with the current device.

freedomtan avatar Apr 08 '25 05:04 freedomtan