Implement getitem backward

Open seungmanhan opened this issue 1 year ago • 4 comments

- Added getitem operation backward kernel with solver
- Added driver test and gtest for getitem backward
- Compared to ROCm pytorch, there is a performance improvement (see the tables below)
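For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what a getitem backward pass along dim 0 is generally understood to compute: each row of `dy` is accumulated into the row of `dx` selected by the matching index entry. This is an illustration only, not the HIP kernel from this PR. The function name `getitem_backward` and its parameters are hypothetical, and only the 2-D, single-index-tensor case is covered.

```python
def getitem_backward(dy, index, dx_rows):
    """Reference scatter-add along dim 0 (illustrative, not the MIOpen kernel).

    dy      -- list of gradient rows, one per index entry
    index   -- list of row positions in dx that each dy row came from
    dx_rows -- number of rows in the input gradient dx
    """
    cols = len(dy[0])
    # dx starts at zero; rows never selected by index keep a zero gradient.
    dx = [[0.0] * cols for _ in range(dx_rows)]
    for row, grad in zip(index, dy):
        for c in range(cols):
            # Accumulate, so repeated indices sum their contributions.
            dx[row][c] += grad[c]
    return dx


dy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
index = [0, 2, 0]
print(getitem_backward(dy, index, dx_rows=3))
# [[6.0, 8.0], [0.0, 0.0], [3.0, 4.0]]
```

Accumulation matters because the index can repeat a row. Benchmark cases such as dx_size [149 128] with index_size [1490] have more index entries than rows in dx, so some rows must be selected more than once.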

getitem float16
op_name dtype dx_size index_size dy_size dim model dir ROCm pytorch (op time) MIOpen HIP Improvement
getitem float16 [128 128] [128] [128 128] 0 llama2 bwd 321654 53474 6.02
getitem float16 [3234 4] [16] [16 4] 0 ssdlite bwd 249380 60070 4.15
getitem float16 [3234 4] [12] [12 4] 0 ssdlite bwd 211604 57741 3.66
getitem float16 [3234 4] [11] [11 4] 0 ssdlite bwd 254389 54470 4.67
getitem float16 [3234 4] [13] [13 4] 0 ssdlite bwd 240549 58079 4.14
getitem float16 [3234 4] [14] [14 4] 0 ssdlite bwd 220932 59892 3.69
getitem float16 [3234 4] [15] [15 4] 0 ssdlite bwd 243876 58523 4.17
getitem float16 [3234 4] [10] [10 4] 0 ssdlite bwd 234308 60301 3.89
getitem float16 [3234 4] [18] [18 4] 0 ssdlite bwd 214548 59323 3.62
getitem float16 [3234 4] [20] [20 4] 0 ssdlite bwd 221380 61030 3.63
getitem float16 [3234 4] [23] [23 4] 0 ssdlite bwd 243877 59946 4.07
getitem float16 [3234 4] [8] [8 4] 0 ssdlite bwd 210948 59039 3.57
getitem float16 [3234 4] [22] [22 4] 0 ssdlite bwd 223524 59021 3.79
getitem float16 [3234 4] [25] [25 4] 0 ssdlite bwd 246389 59394 4.15
getitem float16 [3234 4] [7] [7 4] 0 ssdlite bwd 206372 57030 3.62
getitem float16 [3234 4] [28] [28 4] 0 ssdlite bwd 226452 61297 3.69
getitem float16 [3234 4] [30] [30 4] 0 ssdlite bwd 206788 60265 3.43
getitem float16 [3234 4] [31] [31 4] 0 ssdlite bwd 223364 62025 3.60
getitem float16 [3234 4] [32] [32 4] 0 ssdlite bwd 215620 60354 3.57
getitem float16 [3234 4] [33] [33 4] 0 ssdlite bwd 203548 55057 3.70
getitem float16 [3234 4] [34] [34 4] 0 ssdlite bwd 223060 61474 3.63
getitem float16 [3234] [16] [16] 0 ssdlite bwd 224244 60408 3.71
getitem float16 [149 128] [1490] [1490 128] 0 llama2_7b bwd 287158 65972 4.35
getitem float16 [150 128] [10] [10 128] 0 llama2_7b bwd 235060 63465 3.70
getitem float16 [174 128] [10] [10 128] 0 llama2_7b bwd 266485 62950 4.23
getitem float16 [205 128] [10] [10 128] 0 llama2_7b bwd 247861 63604 3.90
getitem float16 [232 128] [10] [10 128] 0 llama2_7b bwd 257685 61379 4.20
getitem float16 [3234] [15] [15] 0 ssdlite bwd 242532 58479 4.15
getitem float16 [3234 4] [2] [2 4] 0 ssdlite bwd 219716 60026 3.66
getitem float16 [3234 4] [37] [37 4] 0 ssdlite bwd 267941 62142 4.31
getitem float16 [3234] [10] [10] 0 ssdlite bwd 263285 62337 4.22
getitem float16 [3234 4] [38] [38 4] 0 ssdlite bwd 219188 63012 3.48
getitem float16 [3234 4] [41] [41 4] 0 ssdlite bwd 181827 60665 3.00
getitem float16 [3234 4] [6] [6 4] 0 ssdlite bwd 232612 64968 3.58
getitem float16 [3234 4] [44] [44 4] 0 ssdlite bwd 299382 55455 5.40
getitem float16 [3234] [22] [22] 0 ssdlite bwd 265733 55438 4.79
getitem float16 [3234] [25] [25] 0 ssdlite bwd 193556 60469 3.20
getitem float16 [3234 4] [43] [43 4] 0 ssdlite bwd 222116 59971 3.70
getitem float16 [3234] [21] [21] 0 ssdlite bwd 204452 59811 3.42
getitem float16 [3234] [27] [27] 0 ssdlite bwd 187636 60967 3.08
getitem float16 [3234] [24] [24] 0 ssdlite bwd 213284 56219 3.79
getitem float16 [3234 4] [42] [42 4] 0 ssdlite bwd 209716 61002 3.44
getitem float16 [3234 4] [3] [3 4] 0 ssdlite bwd 184708 58247 3.17
getitem float16 [3234 4] [45] [45 4] 0 ssdlite bwd 220964 61909 3.57
getitem float16 [3234] [40] [40] 0 ssdlite bwd 248037 54475 4.55
getitem float16 [1 4] [8] [8 4] 0 ssdlite bwd 214212 59774 3.58
getitem float16 [1] [13] [13] 0 ssdlite bwd 638684 61516 10.38
getitem float16 [3234] [45] [45] 0 ssdlite bwd 214692 66920 3.21
getitem float16 [1 4] [7] [7 4] 0 ssdlite bwd 219844 60449 3.64
getitem float16 [1] [7] [7] 0 ssdlite bwd 199988 60822 3.29
getitem float16 [3234] [4] [4] 0 ssdlite bwd 220484 58760 3.75
getitem float16 [3234 4] [59] [59 4] 0 ssdlite bwd 185812 61853 3.00
getitem float16 [3234] [5] [5] 0 ssdlite bwd 223844 59720 3.75
getitem float16 [2 4] [15] [15 4] 0 ssdlite bwd 229764 58048 3.96
getitem float16 [3234 4] [62] [62 4] 0 ssdlite bwd 228916 61035 3.75
getitem float16 [3234] [57] [57] 0 ssdlite bwd 175347 61887 2.83
getitem float16 [3234 4] [77] [77 4] 0 ssdlite bwd 185524 63025 2.94
getitem float16 [3234 4] [79] [79 4] 0 ssdlite bwd 185380 62545 2.96
getitem float16 [3234 4] [75] [75 4] 0 ssdlite bwd 166003 64340 2.58
getitem float16 [3234] [59] [59] 0 ssdlite bwd 160531 60767 2.64
getitem float16 [8741 4] [2000] [2000 4] 0 maskrcnn bwd 228196 60553 3.77
getitem float16 [8741] [2000] [2000] 0 maskrcnn bwd 198628 61833 3.21
getitem float16 [3234 4] [89] [89 4] 0 ssdlite bwd 138706 64713 2.14
getitem float16 [3234 4] [84] [84 4] 0 ssdlite bwd 135922 64038 2.12
getitem float16 [8741 4] [8741] [8741 4] 0 maskrcnn bwd 211428 58420 3.62
getitem float16 [8741] [8741] [8741] 0 maskrcnn bwd 202004 53620 3.77
getitem float16 [3234] [58] [58] 0 ssdlite bwd 130242 61140 2.13
getitem float16 [3 4] [20] [20 4] 0 ssdlite bwd 137362 62900 2.18
getitem float16 [3] [20] [20] 0 ssdlite bwd 135986 60251 2.26
getitem float16 [3234] [64] [64] 0 ssdlite bwd 138371 63077 2.19
getitem float16 [3234 4] [83] [83 4] 0 ssdlite bwd 132819 64002 2.08
getitem float16 [3 4] [17] [17 4] 0 ssdlite bwd 133779 62331 2.15
getitem float16 [3] [17] [17] 0 ssdlite bwd 131394 62953 2.09
getitem float16 [3234 4] [86] [86 4] 0 ssdlite bwd 144003 63131 2.28
getitem float16 [3234 4] [88] [88 4] 0 ssdlite bwd 135123 63682 2.12
getitem float16 [3234] [65] [65] 0 ssdlite bwd 129378 62065 2.08
getitem float16 [3234] [73] [73] 0 ssdlite bwd 138162 64962 2.13
getitem float16 [3234] [61] [61] 0 ssdlite bwd 133827 59540 2.25
getitem float16 [3 4] [23] [23 4] 0 ssdlite bwd 131506 62598 2.10
getitem float16 [3] [23] [23] 0 ssdlite bwd 136962 61229 2.24
getitem float16 [3 4] [21] [21 4] 0 ssdlite bwd 139315 58740 2.37
getitem float16 [3] [21] [21] 0 ssdlite bwd 133459 61975 2.15
getitem float16 [2 4] [8] [8 4] 0 ssdlite bwd 136290 58775 2.32
getitem float16 [2] [8] [8] 0 ssdlite bwd 135587 61975 2.19
getitem float16 [3 4] [28] [28 4] 0 ssdlite bwd 135827 58473 2.32
getitem float16 [3] [28] [28] 0 ssdlite bwd 137091 67487 2.03
getitem float16 [2 4] [27] [27 4] 0 ssdlite bwd 144339 60873 2.37
getitem float16 [2] [27] [27] 0 ssdlite bwd 141299 64411 2.19
getitem float16 [3 4] [22] [22 4] 0 ssdlite bwd 134851 60535 2.23
getitem float16 [3] [22] [22] 0 ssdlite bwd 139475 60517 2.30
getitem float16 [3234 4] [78] [78 4] 0 ssdlite bwd 135538 62011 2.19
getitem float16 [2 4] [28] [28 4] 0 ssdlite bwd 136659 61744 2.21
getitem float16 [2] [28] [28] 0 ssdlite bwd 144307 60198 2.40
getitem float16 [3234] [67] [67] 0 ssdlite bwd 136146 62367 2.18
getitem float16 [3234] [74] [74] 0 ssdlite bwd 132786 61584 2.16
getitem float16 [3234] [70] [70] 0 ssdlite bwd 147762 59860 2.47
getitem float16 [3 4] [26] [26 4] 0 ssdlite bwd 159059 61015 2.61
getitem float16 [3] [26] [26] 0 ssdlite bwd 152835 67380 2.27
getitem float16 [3 4] [29] [29 4] 0 ssdlite bwd 157571 65264 2.41
getitem float16 [3] [29] [29] 0 ssdlite bwd 161635 63255 2.56
getitem float16 [3 4] [18] [18 4] 0 ssdlite bwd 151955 60553 2.51
getitem float16 [3] [18] [18] 0 ssdlite bwd 155939 61246 2.55
getitem float16 [3234] [68] [68] 0 ssdlite bwd 161587 62722 2.58
getitem float16 [3234] [66] [66] 0 ssdlite bwd 160131 62224 2.57
getitem float16 [3 4] [19] [19 4] 0 ssdlite bwd 154099 59024 2.61
getitem float16 [3] [19] [19] 0 ssdlite bwd 148547 56250 2.64
getitem float16 [3234 4] [90] [90 4] 0 ssdlite bwd 158995 63060 2.52
getitem float16 [3] [25] [25] 0 ssdlite bwd 150099 60108 2.50
getitem float16 [3234 4] [87] [87 4] 0 ssdlite bwd 159315 63077 2.53
getitem float16 [3 4] [10] [10 4] 0 ssdlite bwd 158275 59895 2.64
getitem float16 [3] [10] [10] 0 ssdlite bwd 148851 58188 2.56
getitem float16 [3 4] [27] [27 4] 0 ssdlite bwd 157251 60784 2.59
getitem float16 [3] [27] [27] 0 ssdlite bwd 158787 64838 2.45
getitem float16 [3 4] [30] [30 4] 0 ssdlite bwd 154851 61975 2.50
getitem float16 [3] [30] [30] 0 ssdlite bwd 159859 54526 2.93
getitem float16 [3 4] [16] [16 4] 0 ssdlite bwd 159331 61122 2.61
getitem float16 [3] [16] [16] 0 ssdlite bwd 151491 64322 2.36
getitem float16 [3 4] [33] [33 4] 0 ssdlite bwd 158995 63539 2.50
getitem float16 [3] [33] [33] 0 ssdlite bwd 163427 58775 2.78
getitem float16 [3234 4] [81] [81 4] 0 ssdlite bwd 155011 63522 2.44
getitem float16 [3 4] [34] [34 4] 0 ssdlite bwd 154659 63415 2.44
getitem float16 [3] [34] [34] 0 ssdlite bwd 161091 62028 2.60
getitem float16 [4300 4] [4261] [4261 4] 0 fasterrcnn bwd 208404 60410 3.45
getitem float16 [4300] [4261] [4261] 0 fasterrcnn bwd 208388 63860 3.26
getitem float16 [4261 4] [1000] [1000 4] 0 fasterrcnn bwd 160339 61815 2.59
getitem float16 [4261] [1000] [1000] 0 fasterrcnn bwd 165416 61317 2.70
getitem float16 [3 4] [31] [31 4] 0 ssdlite bwd 168131 61975 2.71
getitem float16 [3] [31] [31] 0 ssdlite bwd 148131 63255 2.34
getitem float16 [2 4] [29] [29 4] 0 ssdlite bwd 157119 62206 2.53
getitem float16 [2] [29] [29] 0 ssdlite bwd 161795 60873 2.66
getitem float16 [3234] [72] [72] 0 ssdlite bwd 152995 61139 2.50
getitem float16 [3234] [69] [69] 0 ssdlite bwd 152755 62971 2.43
getitem float16 [3234 4] [94] [94 4] 0 ssdlite bwd 157939 65477 2.41
getitem float16 [3 4] [11] [11 4] 0 ssdlite bwd 151907 54561 2.78
getitem float16 [3] [11] [11] 0 ssdlite bwd 159091 60357 2.64
getitem float16 [3 4] [13] [13 4] 0 ssdlite bwd 149555 59486 2.51
getitem float16 [3] [13] [13] 0 ssdlite bwd 153779 57797 2.66
getitem float16 [3 4] [12] [12 4] 0 ssdlite bwd 157283 59504 2.64
getitem float16 [3] [12] [12] 0 ssdlite bwd 152963 60073 2.55
getitem float16 [3 4] [14] [14 4] 0 ssdlite bwd 151123 67291 2.25
getitem float16 [3] [14] [14] 0 ssdlite bwd 156915 63611 2.47
getitem float16 [3234] [71] [71] 0 ssdlite bwd 155427 61851 2.51
getitem float16 [3 4] [24] [24 4] 0 ssdlite bwd 155475 59770 2.60
getitem float16 [3] [24] [24] 0 ssdlite bwd 160387 62028 2.59
getitem float16 [3234 4] [101] [101 4] 0 ssdlite bwd 154627 63415 2.44
getitem float16 [1 4] [4] [4 4] 0 ssdlite bwd 148931 61566 2.42
getitem float16 [1] [4] [4] 0 ssdlite bwd 156211 61477 2.54
getitem float16 [1 4] [6] [6 4] 0 ssdlite bwd 154675 54366 2.85
getitem float16 [1] [6] [6] 0 ssdlite bwd 150579 59433 2.53
getitem float16 [3234 4] [85] [85 4] 0 ssdlite bwd 158851 62490 2.54
getitem float16 [3234 4] [92] [92 4] 0 ssdlite bwd 156035 64873 2.41
getitem float16 [4 4] [4] [4 4] 0 ssdlite bwd 154755 58668 2.64
getitem float16 [4] [4] [4] 0 ssdlite bwd 150051 62330 2.41
getitem float16 [3234] [76] [76] 0 ssdlite bwd 159507 56588 2.82
getitem float16 [3 4] [15] [15 4] 0 ssdlite bwd 153107 59646 2.57
getitem float16 [3] [15] [15] 0 ssdlite bwd 144019 60001 2.40
getitem float16 [3234] [80] [80] 0 ssdlite bwd 146691 61122 2.40
getitem float16 [3 4] [32] [32 4] 0 ssdlite bwd 145331 61353 2.37
getitem float16 [3] [32] [32] 0 ssdlite bwd 142178 57050 2.49
getitem float16 [3234 4] [91] [91 4] 0 ssdlite bwd 137411 62881 2.19
getitem float16 [4300 4] [4262] [4262 4] 0 fasterrcnn bwd 202772 58241 3.48
getitem float16 [4300] [4262] [4262] 0 fasterrcnn bwd 188083 57939 3.25
getitem float16 [4262 4] [1000] [1000 4] 0 fasterrcnn bwd 152211 60819 2.50
getitem float16 [4262] [1000] [1000] 0 fasterrcnn bwd 143955 61548 2.34
getitem float16 [1] [5] [5] 0 maskrcnn_ssdlite bwd 133970 61139 2.19
getitem float16 [4 4] [27] [27 4] 0 ssdlite bwd 139363 62117 2.24
getitem float16 [4] [27] [27] 0 ssdlite bwd 135491 62028 2.18
getitem float16 [1 4] [5] [5 4] 0 ssdlite bwd 130291 58935 2.21
getitem float16 [4 4] [28] [28 4] 0 ssdlite bwd 138797 59895 2.32
getitem float16 [4] [28] [28] 0 ssdlite bwd 137699 60730 2.27
getitem float16 [4300 4] [4194] [4194 4] 0 fasterrcnn bwd 194276 59468 3.27
getitem float16 [4300] [4194] [4194] 0 fasterrcnn bwd 190578 61886 3.08
getitem float16 [4194 4] [1000] [1000 4] 0 fasterrcnn bwd 171235 60410 2.83
getitem float16 [2 4] [30] [30 4] 0 ssdlite bwd 158163 60837 2.60
getitem float16 [2] [30] [30] 0 ssdlite bwd 151331 62153 2.43
getitem float16 [4 4] [33] [33 4] 0 ssdlite bwd 157619 61459 2.56
getitem float16 [4] [33] [33] 0 ssdlite bwd 156979 61157 2.57
getitem float16 [3234] [77] [77] 0 ssdlite bwd 152355 61993 2.46
getitem float16 [3234 4] [93] [93 4] 0 ssdlite bwd 149283 62935 2.37
getitem float16 [3234 4] [105] [105 4] 0 ssdlite bwd 158163 63344 2.50
getitem float16 [3234 4] [113] [113 4] 0 ssdlite bwd 150099 64464 2.33
getitem float16 [4194] [1000] [1000] 0 fasterrcnn bwd 160915 70206 2.29
getitem float16 [3 4] [37] [37 4] 0 ssdlite bwd 145811 59148 2.47
getitem float16 [3] [37] [37] 0 ssdlite bwd 146574 57595 2.54
getitem float16 [4 4] [37] [37 4] 0 ssdlite bwd 146515 64866 2.26
getitem float16 [4] [37] [37] 0 ssdlite bwd 148307 62217 2.38
getitem float16 [2 4] [7] [7 4] 0 ssdlite bwd 134354 64244 2.09
getitem float16 [2] [7] [7] 0 ssdlite bwd 150131 59159 2.54
getitem float16 [4 4] [18] [18 4] 0 ssdlite bwd 140434 55355 2.54
getitem float16 [4] [18] [18] 0 ssdlite bwd 134099 59800 2.24
getitem float16 [3 4] [9] [9 4] 0 ssdlite bwd 135986 59462 2.29
getitem float16 [3] [9] [9] 0 ssdlite bwd 154323 67159 2.30
getitem float16 [3234] [79] [79] 0 ssdlite bwd 140083 63515 2.21
getitem float16 [4 4] [30] [30 4] 0 ssdlite bwd 133794 59604 2.24
getitem float16 [4] [30] [30] 0 ssdlite bwd 149619 61115 2.45
getitem float16 [3234] [75] [75] 0 ssdlite bwd 139619 60600 2.30
getitem float16 [3 4] [35] [35 4] 0 ssdlite bwd 136691 62360 2.19
getitem float16 [4 4] [39] [39 4] 0 ssdlite bwd 216420 68975 3.14
getitem float16 [8] [2008] [2008] 0 maskrcnn bwd 263685 128797 2.05
getitem float16 [5] [14] [14] 0 maskrcnn_ssdlite bwd 196612 61884 3.18
getitem float16 [8 4] [8] [8 4] 0 ssdlite bwd 168851 60035 2.81
getitem float16 [4197 4] [1000] [1000 4] 0 fasterrcnn bwd 166435 64302 2.59
getitem float16 [4197] [1000] [1000] 0 fasterrcnn bwd 171219 59858 2.86
getitem float16 [16] [2016] [2016] 0 maskrcnn bwd 238692 95396 2.50
getitem float16 [7 4] [33] [33 4] 0 ssdlite bwd 155811 59449 2.62
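
The Improvement column appears to be the ROCm pytorch op time divided by the MIOpen HIP time, rounded to two decimals. The tables do not state the time unit, so the sketch below works only with the ratio. The helper name `improvement` is hypothetical, and it assumes the last three whitespace-separated fields of a row are the two timings and the reported ratio.

```python
def improvement(row):
    """Recompute the Improvement value from one flattened table row."""
    fields = row.split()
    # Assumed layout: ... <ROCm pytorch time> <MIOpen HIP time> <Improvement>
    pytorch_time = int(fields[-3])
    miopen_time = int(fields[-2])
    return round(pytorch_time / miopen_time, 2)


row = "getitem float16 [128 128] [128] [128 128] 0 llama2 bwd 321654 53474 6.02"
print(improvement(row))  # 6.02
```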
getitem float32
op_name dtype dx_size index_size dy_size dim model dir ROCm pytorch (op time) MIOpen HIP Improvement
getitem float32 [128 128] [128] [128 128] 0 llama2 bwd 259861 61190 4.25
getitem float32 [3234 4] [16] [16 4] 0 ssdlite bwd 239348 61048 3.92
getitem float32 [3234 4] [12] [12 4] 0 ssdlite bwd 249861 59945 4.17
getitem float32 [3234 4] [11] [11 4] 0 ssdlite bwd 233364 59288 3.94
getitem float32 [3234 4] [13] [13 4] 0 ssdlite bwd 230340 58506 3.94
getitem float32 [3234 4] [14] [14 4] 0 ssdlite bwd 253285 60248 4.20
getitem float32 [3234 4] [15] [15 4] 0 ssdlite bwd 227812 58808 3.87
getitem float32 [3234 4] [10] [10 4] 0 ssdlite bwd 219668 60248 3.65
getitem float32 [3234 4] [18] [18 4] 0 ssdlite bwd 236724 63963 3.70
getitem float32 [3234 4] [20] [20 4] 0 ssdlite bwd 234116 55679 4.20
getitem float32 [3234 4] [23] [23 4] 0 ssdlite bwd 197892 60443 3.27
getitem float32 [3234 4] [8] [8 4] 0 ssdlite bwd 228756 54541 4.19
getitem float32 [3234 4] [22] [22 4] 0 ssdlite bwd 233748 60870 3.84
getitem float32 [3234 4] [25] [25 4] 0 ssdlite bwd 200356 62221 3.22
getitem float32 [3234 4] [7] [7 4] 0 ssdlite bwd 213124 58541 3.64
getitem float32 [3234 4] [28] [28 4] 0 ssdlite bwd 214148 60159 3.56
getitem float32 [3234 4] [30] [30 4] 0 ssdlite bwd 224036 62506 3.58
getitem float32 [3234 4] [31] [31 4] 0 ssdlite bwd 211092 59110 3.57
getitem float32 [3234 4] [32] [32 4] 0 ssdlite bwd 213524 61634 3.46
getitem float32 [3234 4] [33] [33 4] 0 ssdlite bwd 223892 60195 3.72
getitem float32 [3234 4] [34] [34 4] 0 ssdlite bwd 210660 59999 3.51
getitem float32 [3234] [16] [16] 0 ssdlite bwd 214116 60426 3.54
getitem float32 [149 128] [1490] [1490 128] 0 llama2_7b bwd 254165 83092 3.06
getitem float32 [150 128] [10] [10 128] 0 llama2_7b bwd 267893 61794 4.34
getitem float32 [174 128] [10] [10 128] 0 llama2_7b bwd 272789 61705 4.42
getitem float32 [205 128] [10] [10 128] 0 llama2_7b bwd 298054 62004 4.81
getitem float32 [232 128] [10] [10 128] 0 llama2_7b bwd 256661 62285 4.12
getitem float32 [248 128] [10] [10 128] 0 llama2_7b bwd 237956 63600 3.74
getitem float32 [3234] [15] [15] 0 ssdlite bwd 247045 57679 4.28
getitem float32 [3234 4] [2] [2 4] 0 ssdlite bwd 263477 63973 4.12
getitem float32 [3234 4] [37] [37 4] 0 ssdlite bwd 238756 62942 3.79
getitem float32 [3234] [10] [10] 0 ssdlite bwd 258229 57803 4.47
getitem float32 [3234 4] [38] [38 4] 0 ssdlite bwd 218180 61021 3.58
getitem float32 [3234 4] [41] [41 4] 0 ssdlite bwd 277029 60487 4.58
getitem float32 [3234 4] [6] [6 4] 0 ssdlite bwd 269109 58425 4.61
getitem float32 [3234 4] [44] [44 4] 0 ssdlite bwd 156131 61536 2.54
getitem float32 [3234] [22] [22] 0 ssdlite bwd 230196 59029 3.90
getitem float32 [3234] [25] [25] 0 ssdlite bwd 210116 60540 3.47
getitem float32 [3234 4] [43] [43 4] 0 ssdlite bwd 208052 58158 3.58
getitem float32 [3234] [21] [21] 0 ssdlite bwd 206996 60007 3.45
getitem float32 [3234] [27] [27] 0 ssdlite bwd 213444 61749 3.46
getitem float32 [3234] [24] [24] 0 ssdlite bwd 199588 59385 3.36
getitem float32 [3234 4] [42] [42 4] 0 ssdlite bwd 199860 55419 3.61
getitem float32 [3234 4] [3] [3 4] 0 ssdlite bwd 212100 58958 3.60
getitem float32 [3234 4] [45] [45 4] 0 ssdlite bwd 196100 56628 3.46
getitem float32 [3234] [40] [40] 0 ssdlite bwd 240628 60752 3.96
getitem float32 [1 4] [8] [8 4] 0 ssdlite bwd 177076 58885 3.01
getitem float32 [1] [13] [13] 0 ssdlite bwd 192084 60449 3.18
getitem float32 [3234] [45] [45] 0 ssdlite bwd 222484 60840 3.66
getitem float32 [1 4] [7] [7 4] 0 ssdlite bwd 189156 62013 3.05
getitem float32 [1] [7] [7] 0 ssdlite bwd 215748 61160 3.53
getitem float32 [3234] [4] [4] 0 ssdlite bwd 196884 59951 3.28
getitem float32 [3234 4] [59] [59 4] 0 ssdlite bwd 229012 61515 3.72
getitem float32 [3234] [5] [5] 0 ssdlite bwd 213780 59239 3.61
getitem float32 [2 4] [15] [15 4] 0 ssdlite bwd 188436 64929 2.90
getitem float32 [3234 4] [62] [62 4] 0 ssdlite bwd 183491 64395 2.85
getitem float32 [3234] [57] [57] 0 ssdlite bwd 182788 60642 3.01
getitem float32 [3234 4] [77] [77 4] 0 ssdlite bwd 184499 65460 2.82
getitem float32 [3234 4] [79] [79 4] 0 ssdlite bwd 163174 63558 2.57
getitem float32 [3234 4] [75] [75 4] 0 ssdlite bwd 156243 65496 2.39
getitem float32 [3234] [59] [59] 0 ssdlite bwd 148819 63007 2.36
getitem float32 [8741 4] [2000] [2000 4] 0 maskrcnn bwd 199124 62598 3.18
getitem float32 [8741] [2000] [2000] 0 maskrcnn bwd 179811 65158 2.76
getitem float32 [3234 4] [89] [89 4] 0 ssdlite bwd 139987 66776 2.10
getitem float32 [3234 4] [84] [84 4] 0 ssdlite bwd 138851 63202 2.20
getitem float32 [8741 4] [8741] [8741 4] 0 maskrcnn bwd 220772 60304 3.66
getitem float32 [8741] [8741] [8741] 0 maskrcnn bwd 183843 60731 3.03
getitem float32 [3234] [58] [58] 0 ssdlite bwd 133907 56109 2.39
getitem float32 [3 4] [20] [20 4] 0 ssdlite bwd 136386 61425 2.22
getitem float32 [3] [20] [20] 0 ssdlite bwd 131311 63095 2.08
getitem float32 [3234] [64] [64] 0 ssdlite bwd 134962 61976 2.18
getitem float32 [3234 4] [83] [83 4] 0 ssdlite bwd 138754 64713 2.14
getitem float32 [3 4] [17] [17 4] 0 ssdlite bwd 134258 63380 2.12
getitem float32 [3] [17] [17] 0 ssdlite bwd 135427 64411 2.10
getitem float32 [3234 4] [86] [86 4] 0 ssdlite bwd 132386 66793 1.98
getitem float32 [3234 4] [88] [88 4] 0 ssdlite bwd 133282 64891 2.05
getitem float32 [3234] [65] [65] 0 ssdlite bwd 137762 63113 2.18
getitem float32 [3234] [73] [73] 0 ssdlite bwd 135731 60037 2.26
getitem float32 [3234] [61] [61] 0 ssdlite bwd 128850 62882 2.05
getitem float32 [3 4] [23] [23 4] 0 ssdlite bwd 139475 64944 2.15
getitem float32 [3] [23] [23] 0 ssdlite bwd 139315 64856 2.15
getitem float32 [3 4] [21] [21 4] 0 ssdlite bwd 135043 64784 2.08
getitem float32 [3] [21] [21] 0 ssdlite bwd 135714 65869 2.06
getitem float32 [2 4] [8] [8 4] 0 ssdlite bwd 134066 61922 2.17
getitem float32 [2] [8] [8] 0 ssdlite bwd 130682 61602 2.12
getitem float32 [3 4] [28] [28 4] 0 ssdlite bwd 144947 63469 2.28
getitem float32 [3] [28] [28] 0 ssdlite bwd 139859 64429 2.17
getitem float32 [2 4] [27] [27 4] 0 ssdlite bwd 137219 70011 1.96
getitem float32 [2] [27] [27] 0 ssdlite bwd 143363 69389 2.07
getitem float32 [3 4] [22] [22 4] 0 ssdlite bwd 138963 60553 2.29
getitem float32 [3] [22] [22] 0 ssdlite bwd 132994 61691 2.16
getitem float32 [3234 4] [78] [78 4] 0 ssdlite bwd 138707 64749 2.14
getitem float32 [2 4] [28] [28 4] 0 ssdlite bwd 143283 68927 2.08
getitem float32 [2] [28] [28] 0 ssdlite bwd 140947 66758 2.11
getitem float32 [3234] [67] [67] 0 ssdlite bwd 136942 60998 2.25
getitem float32 [3234] [74] [74] 0 ssdlite bwd 140643 62331 2.26
getitem float32 [3234] [70] [70] 0 ssdlite bwd 153427 59931 2.56
getitem float32 [3 4] [26] [26 4] 0 ssdlite bwd 156683 63984 2.45
getitem float32 [3] [26] [26] 0 ssdlite bwd 161795 72767 2.22
getitem float32 [3 4] [29] [29 4] 0 ssdlite bwd 155251 61531 2.52
getitem float32 [3] [29] [29] 0 ssdlite bwd 149539 65193 2.29
getitem float32 [3 4] [18] [18 4] 0 ssdlite bwd 157059 62669 2.51
getitem float32 [3] [18] [18] 0 ssdlite bwd 153619 57139 2.69
getitem float32 [3234] [68] [68] 0 ssdlite bwd 152035 65318 2.33
getitem float32 [3234] [66] [66] 0 ssdlite bwd 150563 60997 2.47
getitem float32 [3 4] [19] [19 4] 0 ssdlite bwd 154019 63380 2.43
getitem float32 [3] [19] [19] 0 ssdlite bwd 160147 63949 2.50
getitem float32 [3234 4] [90] [90 4] 0 ssdlite bwd 152963 65335 2.34
getitem float32 [3] [25] [25] 0 ssdlite bwd 158419 58224 2.72
getitem float32 [3234 4] [87] [87 4] 0 ssdlite bwd 151971 63931 2.38
getitem float32 [3 4] [10] [10 4] 0 ssdlite bwd 147987 60162 2.46
getitem float32 [3] [10] [10] 0 ssdlite bwd 157043 57993 2.71
getitem float32 [3 4] [27] [27 4] 0 ssdlite bwd 159027 65228 2.44
getitem float32 [3] [27] [27] 0 ssdlite bwd 151155 61246 2.47
getitem float32 [3 4] [30] [30 4] 0 ssdlite bwd 158163 64891 2.44
getitem float32 [3] [30] [30] 0 ssdlite bwd 158115 64055 2.47
getitem float32 [3 4] [16] [16 4] 0 ssdlite bwd 149187 61993 2.41
getitem float32 [3] [16] [16] 0 ssdlite bwd 158019 64109 2.46
getitem float32 [3 4] [33] [33 4] 0 ssdlite bwd 162515 62242 2.61
getitem float32 [3] [33] [33] 0 ssdlite bwd 152515 64624 2.36
getitem float32 [3234 4] [81] [81 4] 0 ssdlite bwd 157987 65264 2.42
getitem float32 [3 4] [34] [34 4] 0 ssdlite bwd 159235 66651 2.39
getitem float32 [3] [34] [34] 0 ssdlite bwd 151539 65104 2.33
getitem float32 [4300 4] [4261] [4261 4] 0 fasterrcnn bwd 208548 65353 3.19
getitem float32 [4300] [4261] [4261] 0 fasterrcnn bwd 201396 65691 3.07
getitem float32 [4261 4] [1000] [1000 4] 0 fasterrcnn bwd 171427 60748 2.82
getitem float32 [4261] [1000] [1000] 0 fasterrcnn bwd 164723 62242 2.65
getitem float32 [3 4] [31] [31 4] 0 ssdlite bwd 152595 69086 2.21
getitem float32 [3] [31] [31] 0 ssdlite bwd 156531 65726 2.38
getitem float32 [2 4] [29] [29 4] 0 ssdlite bwd 159283 69442 2.29
getitem float32 [2] [29] [29] 0 ssdlite bwd 150067 66828 2.25
getitem float32 [3234] [72] [72] 0 ssdlite bwd 158467 59771 2.65
getitem float32 [3234] [69] [69] 0 ssdlite bwd 161059 61335 2.63
getitem float32 [3234 4] [94] [94 4] 0 ssdlite bwd 146515 64144 2.28
getitem float32 [3 4] [11] [11 4] 0 ssdlite bwd 155843 59166 2.63
getitem float32 [3] [11] [11] 0 ssdlite bwd 147763 60428 2.45
getitem float32 [3 4] [13] [13 4] 0 ssdlite bwd 154563 65334 2.37
getitem float32 [3] [13] [13] 0 ssdlite bwd 154659 59148 2.61
getitem float32 [3 4] [12] [12 4] 0 ssdlite bwd 149299 74989 1.99
getitem float32 [3] [12] [12] 0 ssdlite bwd 156019 59504 2.62
getitem float32 [3 4] [14] [14 4] 0 ssdlite bwd 155619 62259 2.50
getitem float32 [3] [14] [14] 0 ssdlite bwd 146131 61691 2.37
getitem float32 [3234] [71] [71] 0 ssdlite bwd 157123 59788 2.63
getitem float32 [3 4] [24] [24 4] 0 ssdlite bwd 162227 65104 2.49
getitem float32 [3] [24] [24] 0 ssdlite bwd 151939 65264 2.33
getitem float32 [3234 4] [101] [101 4] 0 ssdlite bwd 153763 65637 2.34
getitem float32 [1 4] [4] [4 4] 0 ssdlite bwd 156499 59788 2.62
getitem float32 [1] [4] [4] 0 ssdlite bwd 150691 65228 2.31
getitem float32 [1 4] [6] [6 4] 0 ssdlite bwd 151811 59717 2.54
getitem float32 [1] [6] [6] 0 ssdlite bwd 154915 61050 2.54
getitem float32 [3234 4] [85] [85 4] 0 ssdlite bwd 152451 62704 2.43
getitem float32 [3234 4] [92] [92 4] 0 ssdlite bwd 152867 61762 2.48
getitem float32 [4 4] [4] [4 4] 0 ssdlite bwd 151299 61139 2.47
getitem float32 [4] [4] [4] 0 ssdlite bwd 152755 59593 2.56
getitem float32 [3234] [76] [76] 0 ssdlite bwd 152019 60268 2.52
getitem float32 [3 4] [15] [15 4] 0 ssdlite bwd 141939 63894 2.22
getitem float32 [3] [15] [15] 0 ssdlite bwd 146403 61655 2.37
getitem float32 [3234] [80] [80] 0 ssdlite bwd 148051 60446 2.45
getitem float32 [3 4] [32] [32 4] 0 ssdlite bwd 139203 66419 2.10
getitem float32 [3] [32] [32] 0 ssdlite bwd 136946 67166 2.04
getitem float32 [3234 4] [91] [91 4] 0 ssdlite bwd 142242 66508 2.14
getitem float32 [4300 4] [4262] [4262 4] 0 fasterrcnn bwd 189828 63521 2.99
getitem float32 [4300] [4262] [4262] 0 fasterrcnn bwd 191028 62793 3.04
getitem float32 [4262 4] [1000] [1000 4] 0 fasterrcnn bwd 144659 62153 2.33
getitem float32 [4262] [1000] [1000] 0 fasterrcnn bwd 149171 62313 2.39
getitem float32 [1] [5] [5] 0 maskrcnn_ssdlite bwd 135635 61388 2.21
getitem float32 [4 4] [27] [27 4] 0 ssdlite bwd 135778 65353 2.08
getitem float32 [4] [27] [27] 0 ssdlite bwd 136578 64268 2.13
getitem float32 [1 4] [5] [5 4] 0 ssdlite bwd 136754 59699 2.29
getitem float32 [4 4] [28] [28 4] 0 ssdlite bwd 133634 64890 2.06
getitem float32 [4] [28] [28] 0 ssdlite bwd 136627 65299 2.09
getitem float32 [4300 4] [4194] [4194 4] 0 fasterrcnn bwd 192116 63984 3.00
getitem float32 [4300] [4194] [4194] 0 fasterrcnn bwd 190356 63344 3.01
getitem float32 [4194 4] [1000] [1000 4] 0 fasterrcnn bwd 159907 63450 2.52
getitem float32 [2 4] [30] [30 4] 0 ssdlite bwd 152867 73762 2.07
getitem float32 [2] [30] [30] 0 ssdlite bwd 159491 68517 2.33
getitem float32 [4 4] [33] [33 4] 0 ssdlite bwd 159107 65477 2.43
getitem float32 [4] [33] [33] 0 ssdlite bwd 153363 64997 2.36
getitem float32 [3234] [77] [77] 0 ssdlite bwd 155235 60232 2.58
getitem float32 [3234 4] [93] [93 4] 0 ssdlite bwd 156259 60179 2.60
getitem float32 [3234 4] [105] [105 4] 0 ssdlite bwd 146979 65922 2.23
getitem float32 [3234 4] [113] [113 4] 0 ssdlite bwd 155555 64197 2.42
getitem float32 [4194] [1000] [1000] 0 fasterrcnn bwd 165587 61833 2.68
getitem float32 [3 4] [37] [37 4] 0 ssdlite bwd 145331 71859 2.02
getitem float32 [3] [37] [37] 0 ssdlite bwd 151763 61808 2.46
getitem float32 [4 4] [37] [37 4] 0 ssdlite bwd 148531 63763 2.33
getitem float32 [4] [37] [37] 0 ssdlite bwd 141762 63835 2.22
getitem float32 [2 4] [7] [7 4] 0 ssdlite bwd 139202 61222 2.27
getitem float32 [2] [7] [7] 0 ssdlite bwd 138098 60439 2.28
getitem float32 [4 4] [18] [18 4] 0 ssdlite bwd 133170 62715 2.12
getitem float32 [4] [18] [18] 0 ssdlite bwd 135507 68119 1.99
getitem float32 [3 4] [9] [9 4] 0 ssdlite bwd 142835 62040 2.30
getitem float32 [3] [9] [9] 0 ssdlite bwd 139107 55018 2.53
getitem float32 [3234] [79] [79] 0 ssdlite bwd 134882 59729 2.26
getitem float32 [4 4] [30] [30 4] 0 ssdlite bwd 139651 62466 2.24
getitem float32 [4] [30] [30] 0 ssdlite bwd 135970 64244 2.12
getitem float32 [3234] [75] [75] 0 ssdlite bwd 135858 62840 2.16
getitem float32 [3 4] [35] [35 4] 0 ssdlite bwd 142610 69612 2.05
getitem float32 [4 4] [39] [39 4] 0 ssdlite bwd 190996 57118 3.34
getitem float32 [8] [2008] [2008] 0 maskrcnn bwd 260261 131362 1.98
getitem float32 [5] [14] [14] 0 maskrcnn_ssdlite bwd 169587 60871 2.79
getitem float32 [8 4] [8] [8 4] 0 ssdlite bwd 174323 63111 2.76
getitem float32 [4197 4] [1000] [1000 4] 0 fasterrcnn bwd 178787 62987 2.84
getitem float32 [4197] [1000] [1000] 0 fasterrcnn bwd 169379 63964 2.65
getitem float32 [16] [2016] [2016] 0 maskrcnn bwd 234628 140481 1.67
getitem float32 [7 4] [33] [33 4] 0 ssdlite bwd 160179 64782 2.47
getitem bfloat16
op_name dtype dx_size index_size dy_size dim model dir ROCm pytorch (op time) MIOpen HIP Improvement
getitem bfloat16 [128 128] [128] [128 128] 0 llama2 bwd 253429 65687 3.86
getitem bfloat16 [3234 4] [16] [16 4] 0 ssdlite bwd 242037 61545 3.93
getitem bfloat16 [3234 4] [12] [12 4] 0 ssdlite bwd 240565 61848 3.89
getitem bfloat16 [3234 4] [11] [11 4] 0 ssdlite bwd 231941 58150 3.99
getitem bfloat16 [3234 4] [13] [13 4] 0 ssdlite bwd 253237 61297 4.13
getitem bfloat16 [3234 4] [14] [14 4] 0 ssdlite bwd 226148 59874 3.78
getitem bfloat16 [3234 4] [15] [15 4] 0 ssdlite bwd 226548 58327 3.88
getitem bfloat16 [3234 4] [10] [10 4] 0 ssdlite bwd 248805 58737 4.24
getitem bfloat16 [3234 4] [18] [18 4] 0 ssdlite bwd 242149 61439 3.94
getitem bfloat16 [3234 4] [20] [20 4] 0 ssdlite bwd 213460 59003 3.62
getitem bfloat16 [3234 4] [23] [23 4] 0 ssdlite bwd 229716 55039 4.17
getitem bfloat16 [3234 4] [8] [8 4] 0 ssdlite bwd 225156 61190 3.68
getitem bfloat16 [3234 4] [22] [22 4] 0 ssdlite bwd 209716 62719 3.34
getitem bfloat16 [3234 4] [25] [25 4] 0 ssdlite bwd 219940 59003 3.73
getitem bfloat16 [3234 4] [7] [7 4] 0 ssdlite bwd 221348 59572 3.72
getitem bfloat16 [3234 4] [28] [28 4] 0 ssdlite bwd 217460 61937 3.51
getitem bfloat16 [3234 4] [30] [30 4] 0 ssdlite bwd 197748 55857 3.54
getitem bfloat16 [3234 4] [31] [31 4] 0 ssdlite bwd 222724 61101 3.65
getitem bfloat16 [3234 4] [32] [32 4] 0 ssdlite bwd 221268 61457 3.60
getitem bfloat16 [3234 4] [33] [33 4] 0 ssdlite bwd 209668 59768 3.51
getitem bfloat16 [3234 4] [34] [34 4] 0 ssdlite bwd 214500 64959 3.30
getitem bfloat16 [3234] [16] [16] 0 ssdlite bwd 220367 60425 3.65
getitem bfloat16 [149 128] [1490] [1490 128] 0 llama2_7b bwd 293670 77563 3.79
getitem bfloat16 [150 128] [10] [10 128] 0 llama2_7b bwd 223060 62665 3.56
getitem bfloat16 [174 128] [10] [10 128] 0 llama2_7b bwd 261013 62523 4.17
getitem bfloat16 [205 128] [10] [10 128] 0 llama2_7b bwd 257141 57576 4.47
getitem bfloat16 [232 128] [10] [10 128] 0 llama2_7b bwd 254901 63086 4.04
getitem bfloat16 [248 128] [10] [10 128] 0 llama2_7b bwd 265013 61751 4.29
getitem bfloat16 [3234] [15] [15] 0 ssdlite bwd 263381 59564 4.42
getitem bfloat16 [3234 4] [2] [2 4] 0 ssdlite bwd 219956 60399 3.64
getitem bfloat16 [3234 4] [37] [37 4] 0 ssdlite bwd 242933 61128 3.97
getitem bfloat16 [3234] [10] [10] 0 ssdlite bwd 272741 57821 4.72
getitem bfloat16 [3234 4] [38] [38 4] 0 ssdlite bwd 192020 59545 3.22
getitem bfloat16 [3234 4] [41] [41 4] 0 ssdlite bwd 226452 61607 3.68
getitem bfloat16 [3234 4] [6] [6 4] 0 ssdlite bwd 178259 58283 3.06
getitem bfloat16 [3234 4] [44] [44 4] 0 ssdlite bwd 247812 56842 4.36
getitem bfloat16 [3234] [22] [22] 0 ssdlite bwd 254565 60949 4.18
getitem bfloat16 [3234] [25] [25] 0 ssdlite bwd 188339 58140 3.24
getitem bfloat16 [3234 4] [43] [43 4] 0 ssdlite bwd 223300 62603 3.57
getitem bfloat16 [3234] [21] [21] 0 ssdlite bwd 222628 61553 3.62
getitem bfloat16 [3234] [27] [27] 0 ssdlite bwd 188820 59296 3.18
getitem bfloat16 [3234] [24] [24] 0 ssdlite bwd 209604 60096 3.49
getitem bfloat16 [3234 4] [42] [42 4] 0 ssdlite bwd 212964 62051 3.43
getitem bfloat16 [3234 4] [3] [3 4] 0 ssdlite bwd 181732 58051 3.13
getitem bfloat16 [3234 4] [45] [45 4] 0 ssdlite bwd 206852 62442 3.31
getitem bfloat16 [3234] [40] [40] 0 ssdlite bwd 183395 58120 3.16
getitem bfloat16 [1 4] [8] [8 4] 0 ssdlite bwd 211396 65321 3.24
getitem bfloat16 [1] [13] [13] 0 ssdlite bwd 218180 61498 3.55
getitem bfloat16 [3234] [45] [45] 0 ssdlite bwd 192628 62778 3.07
getitem bfloat16 [1 4] [7] [7 4] 0 ssdlite bwd 223796 58297 3.84
getitem bfloat16 [1] [7] [7] 0 ssdlite bwd 220084 58315 3.77
getitem bfloat16 [3234] [4] [4] 0 ssdlite bwd 221716 60378 3.67
getitem bfloat16 [3234 4] [59] [59 4] 0 ssdlite bwd 194932 60751 3.21
getitem bfloat16 [3234] [5] [5] 0 ssdlite bwd 200388 59293 3.38
getitem bfloat16 [2 4] [15] [15 4] 0 ssdlite bwd 216948 61231 3.54
getitem bfloat16 [3234 4] [62] [62 4] 0 ssdlite bwd 216724 62369 3.47
getitem bfloat16 [3234] [57] [57] 0 ssdlite bwd 185412 61531 3.01
getitem bfloat16 [3234 4] [77] [77 4] 0 ssdlite bwd 178740 65425 2.73
getitem bfloat16 [3234 4] [79] [79 4] 0 ssdlite bwd 152915 66118 2.31
getitem bfloat16 [3234 4] [75] [75 4] 0 ssdlite bwd 151603 64020 2.37
getitem bfloat16 [3234] [59] [59] 0 ssdlite bwd 157059 59842 2.62
getitem bfloat16 [8741 4] [2000] [2000 4] 0 maskrcnn bwd 216740 58811 3.69
getitem bfloat16 [8741] [2000] [2000] 0 maskrcnn bwd 186244 64305 2.90
getitem bfloat16 [3234 4] [89] [89 4] 0 ssdlite bwd 142355 65140 2.19
getitem bfloat16 [3234 4] [84] [84 4] 0 ssdlite bwd 130803 64144 2.04
getitem bfloat16 [8741 4] [8741] [8741 4] 0 maskrcnn bwd 234148 58473 4.00
getitem bfloat16 [8741] [8741] [8741] 0 maskrcnn bwd 191907 59646 3.22
getitem bfloat16 [3234] [58] [58] 0 ssdlite bwd 133955 61389 2.18
getitem bfloat16 [3 4] [20] [20 4] 0 ssdlite bwd 134866 56962 2.37
getitem bfloat16 [3] [20] [20] 0 ssdlite bwd 136547 69478 1.97
getitem bfloat16 [3234] [64] [64] 0 ssdlite bwd 134850 58082 2.32
getitem bfloat16 [3234 4] [83] [83 4] 0 ssdlite bwd 142067 58580 2.43
getitem bfloat16 [3 4] [17] [17 4] 0 ssdlite bwd 136002 65371 2.08
getitem bfloat16 [3] [17] [17] 0 ssdlite bwd 133619 61015 2.19
getitem bfloat16 [3234 4] [86] [86 4] 0 ssdlite bwd 135385 64376 2.10
getitem bfloat16 [3234 4] [88] [88 4] 0 ssdlite bwd 136114 65656 2.07
getitem bfloat16 [3234] [65] [65] 0 ssdlite bwd 134338 60838 2.21
getitem bfloat16 [3234] [73] [73] 0 ssdlite bwd 136867 55646 2.46
getitem bfloat16 [3234] [61] [61] 0 ssdlite bwd 137171 60642 2.26
getitem bfloat16 [3 4] [23] [23 4] 0 ssdlite bwd 133922 65051 2.06
getitem bfloat16 [3] [23] [23] 0 ssdlite bwd 131427 68642 1.91
getitem bfloat16 [3 4] [21] [21 4] 0 ssdlite bwd 137427 59860 2.30
getitem bfloat16 [3] [21] [21] 0 ssdlite bwd 135763 63362 2.14
getitem bfloat16 [2 4] [8] [8 4] 0 ssdlite bwd 128722 63077 2.04
getitem bfloat16 [2] [8] [8] 0 ssdlite bwd 137171 61442 2.23
getitem bfloat16 [3 4] [28] [28 4] 0 ssdlite bwd 137922 60855 2.27
getitem bfloat16 [3] [28] [28] 0 ssdlite bwd 134530 63736 2.11
getitem bfloat16 [2 4] [27] [27 4] 0 ssdlite bwd 142290 69122 2.06
getitem bfloat16 [2] [27] [27] 0 ssdlite bwd 140739 66473 2.12
getitem bfloat16 [3 4] [22] [22 4] 0 ssdlite bwd 132035 62989 2.10
getitem bfloat16 [3] [22] [22] 0 ssdlite bwd 138658 61851 2.24
getitem bfloat16 [3234 4] [78] [78 4] 0 ssdlite bwd 134595 65264 2.06
getitem bfloat16 [2 4] [28] [28 4] 0 ssdlite bwd 137602 71238 1.93
getitem bfloat16 [2] [28] [28] 0 ssdlite bwd 144419 66811 2.16
getitem bfloat16 [3234] [67] [67] 0 ssdlite bwd 136675 56180 2.43
getitem bfloat16 [3234] [74] [74] 0 ssdlite bwd 133058 61068 2.18
getitem bfloat16 [3234] [70] [70] 0 ssdlite bwd 156723 60891 2.57
getitem bfloat16 [3 4] [26] [26 4] 0 ssdlite bwd 159459 68518 2.33
getitem bfloat16 [3] [26] [26] 0 ssdlite bwd 150451 65175 2.31
getitem bfloat16 [3 4] [29] [29 4] 0 ssdlite bwd 155683 65567 2.37
getitem bfloat16 [3] [29] [29] 0 ssdlite bwd 156915 64464 2.43
getitem bfloat16 [3 4] [18] [18 4] 0 ssdlite bwd 150467 64304 2.34
getitem bfloat16 [3] [18] [18] 0 ssdlite bwd 150723 61620 2.45
getitem bfloat16 [3234] [68] [68] 0 ssdlite bwd 154467 62722 2.46
getitem bfloat16 [3234] [66] [66] 0 ssdlite bwd 155315 59059 2.63
getitem bfloat16 [3 4] [19] [19 4] 0 ssdlite bwd 156019 62295 2.50
getitem bfloat16 [3] [19] [19] 0 ssdlite bwd 159667 63237 2.52
getitem bfloat16 [3234 4] [90] [90 4] 0 ssdlite bwd 160435 64589 2.48
getitem bfloat16 [3] [25] [25] 0 ssdlite bwd 149315 58011 2.57
getitem bfloat16 [3234 4] [87] [87 4] 0 ssdlite bwd 157955 62882 2.51
getitem bfloat16 [3 4] [10] [10 4] 0 ssdlite bwd 154723 59468 2.60
getitem bfloat16 [3] [10] [10] 0 ssdlite bwd 151491 59593 2.54
getitem bfloat16 [3 4] [27] [27 4] 0 ssdlite bwd 148339 65778 2.26
getitem bfloat16 [3] [27] [27] 0 ssdlite bwd 159731 63664 2.51
getitem bfloat16 [3 4] [30] [30 4] 0 ssdlite bwd 152787 65140 2.35
getitem bfloat16 [3] [30] [30] 0 ssdlite bwd 150835 64837 2.33
getitem bfloat16 [3 4] [16] [16 4] 0 ssdlite bwd 158339 62882 2.52
getitem bfloat16 [3] [16] [16] 0 ssdlite bwd 157603 63451 2.48
getitem bfloat16 [3 4] [33] [33 4] 0 ssdlite bwd 151475 66651 2.27
getitem bfloat16 [3] [33] [33] 0 ssdlite bwd 161203 64962 2.48
getitem bfloat16 [3234 4] [81] [81 4] 0 ssdlite bwd 152003 64820 2.35
getitem bfloat16 [3 4] [34] [34 4] 0 ssdlite bwd 148995 68002 2.19
getitem bfloat16 [3] [34] [34] 0 ssdlite bwd 162515 62988 2.58
getitem bfloat16 [4300 4] [4261] [4261 4] 0 fasterrcnn bwd 202516 61833 3.28
getitem bfloat16 [4300] [4261] [4261] 0 fasterrcnn bwd 209684 63148 3.32
getitem bfloat16 [4261 4] [1000] [1000 4] 0 fasterrcnn bwd 162931 62597 2.60
getitem bfloat16 [4261] [1000] [1000] 0 fasterrcnn bwd 182643 61957 2.95
getitem bfloat16 [3 4] [31] [31 4] 0 ssdlite bwd 159603 65833 2.42
getitem bfloat16 [3] [31] [31] 0 ssdlite bwd 154259 64197 2.40
getitem bfloat16 [2 4] [29] [29 4] 0 ssdlite bwd 152947 67006 2.28
getitem bfloat16 [2] [29] [29] 0 ssdlite bwd 159267 68553 2.32
getitem bfloat16 [3234] [72] [72] 0 ssdlite bwd 157891 57921 2.73
getitem bfloat16 [3234] [69] [69] 0 ssdlite bwd 149843 62028 2.42
getitem bfloat16 [3234 4] [94] [94 4] 0 ssdlite bwd 157171 63664 2.47
getitem bfloat16 [3 4] [11] [11 4] 0 ssdlite bwd 151475 63486 2.39
getitem bfloat16 [3] [11] [11] 0 ssdlite bwd 158163 56179 2.82
getitem bfloat16 [3 4] [13] [13 4] 0 ssdlite bwd 152067 63557 2.39
getitem bfloat16 [3] [13] [13] 0 ssdlite bwd 148723 60499 2.46
getitem bfloat16 [3 4] [12] [12 4] 0 ssdlite bwd 159027 58775 2.71
getitem bfloat16 [3] [12] [12] 0 ssdlite bwd 152643 60002 2.54
getitem bfloat16 [3 4] [14] [14 4] 0 ssdlite bwd 149699 63219 2.37
getitem bfloat16 [3] [14] [14] 0 ssdlite bwd 155843 61033 2.55
getitem bfloat16 [3234] [71] [71] 0 ssdlite bwd 159891 61744 2.59
getitem bfloat16 [3 4] [24] [24 4] 0 ssdlite bwd 152115 66793 2.28
getitem bfloat16 [3] [24] [24] 0 ssdlite bwd 158819 64766 2.45
getitem bfloat16 [3234 4] [101] [101 4] 0 ssdlite bwd 163683 63220 2.59
getitem bfloat16 [1 4] [4] [4 4] 0 ssdlite bwd 147381 73549 2.00
getitem bfloat16 [1] [4] [4] 0 ssdlite bwd 155971 61602 2.53
getitem bfloat16 [1 4] [6] [6 4] 0 ssdlite bwd 154931 60766 2.55
getitem bfloat16 [1] [6] [6] 0 ssdlite bwd 150259 56659 2.65
getitem bfloat16 [3234 4] [85] [85 4] 0 ssdlite bwd 157619 65602 2.40
getitem bfloat16 [3234 4] [92] [92 4] 0 ssdlite bwd 156451 63166 2.48
getitem bfloat16 [4 4] [4] [4 4] 0 ssdlite bwd 158499 62632 2.53
getitem bfloat16 [4] [4] [4] 0 ssdlite bwd 155651 60588 2.57
getitem bfloat16 [3234] [76] [76] 0 ssdlite bwd 140515 62064 2.26
getitem bfloat16 [3 4] [15] [15 4] 0 ssdlite bwd 146547 62970 2.33
getitem bfloat16 [3] [15] [15] 0 ssdlite bwd 144627 62028 2.33
getitem bfloat16 [3234] [80] [80] 0 ssdlite bwd 139490 61406 2.27
getitem bfloat16 [3 4] [32] [32 4] 0 ssdlite bwd 143187 66135 2.17
getitem bfloat16 [3] [32] [32] 0 ssdlite bwd 139154 75397 1.85
getitem bfloat16 [3234 4] [91] [91 4] 0 ssdlite bwd 130467 64624 2.02
getitem bfloat16 [4300 4] [4262] [4262 4] 0 fasterrcnn bwd 191188 59930 3.19
getitem bfloat16 [4300] [4262] [4262] 0 fasterrcnn bwd 189482 58099 3.26
getitem bfloat16 [4262 4] [1000] [1000 4] 0 fasterrcnn bwd 149603 64677 2.31
getitem bfloat16 [4262] [1000] [1000] 0 fasterrcnn bwd 144979 61282 2.37
getitem bfloat16 [1] [5] [5] 0 maskrcnn_ssdlite bwd 130802 61833 2.12
getitem bfloat16 [4 4] [27] [27 4] 0 ssdlite bwd 136306 61086 2.23
getitem bfloat16 [4] [27] [27] 0 ssdlite bwd 136626 61175 2.23
getitem bfloat16 [1 4] [5] [5 4] 0 ssdlite bwd 130178 60037 2.17
getitem bfloat16 [4 4] [28] [28 4] 0 ssdlite bwd 137586 64624 2.13
getitem bfloat16 [4] [28] [28] 0 ssdlite bwd 138771 64446 2.15
getitem bfloat16 [4300 4] [4194] [4194 4] 0 fasterrcnn bwd 188211 60730 3.10
getitem bfloat16 [4300] [4194] [4194] 0 fasterrcnn bwd 203940 60642 3.36
getitem bfloat16 [4194 4] [1000] [1000 4] 0 fasterrcnn bwd 164147 60712 2.70
getitem bfloat16 [2 4] [30] [30 4] 0 ssdlite bwd 160111 67682 2.37
getitem bfloat16 [2] [30] [30] 0 ssdlite bwd 156755 77442 2.02
getitem bfloat16 [4 4] [33] [33 4] 0 ssdlite bwd 150579 63593 2.37
getitem bfloat16 [4] [33] [33] 0 ssdlite bwd 157987 57797 2.73
getitem bfloat16 [3234] [77] [77] 0 ssdlite bwd 154387 59539 2.59
getitem bfloat16 [3234 4] [93] [93 4] 0 ssdlite bwd 145795 65068 2.24
getitem bfloat16 [3234 4] [105] [105 4] 0 ssdlite bwd 157283 65033 2.42
getitem bfloat16 [3234 4] [113] [113 4] 0 ssdlite bwd 154243 65442 2.36
getitem bfloat16 [4194] [1000] [1000] 0 fasterrcnn bwd 156547 62828 2.49
getitem bfloat16 [3 4] [37] [37 4] 0 ssdlite bwd 143443 69866 2.05
getitem bfloat16 [3] [37] [37] 0 ssdlite bwd 152179 66252 2.30
getitem bfloat16 [4 4] [37] [37 4] 0 ssdlite bwd 139618 63674 2.19
getitem bfloat16 [4] [37] [37] 0 ssdlite bwd 149107 69843 2.13
getitem bfloat16 [2 4] [7] [7 4] 0 ssdlite bwd 139939 61844 2.26
getitem bfloat16 [2] [7] [7] 0 ssdlite bwd 129730 60244 2.15
getitem bfloat16 [4 4] [18] [18 4] 0 ssdlite bwd 137235 62271 2.20
getitem bfloat16 [4] [18] [18] 0 ssdlite bwd 138933 62377 2.23
getitem bfloat16 [3 4] [9] [9 4] 0 ssdlite bwd 137107 60457 2.27
getitem bfloat16 [3] [9] [9] 0 ssdlite bwd 136803 59426 2.30
getitem bfloat16 [3234] [79] [79] 0 ssdlite bwd 142323 61986 2.30
getitem bfloat16 [4 4] [30] [30 4] 0 ssdlite bwd 131635 65079 2.02
getitem bfloat16 [4] [30] [30] 0 ssdlite bwd 135666 64155 2.11
getitem bfloat16 [3234] [75] [75] 0 ssdlite bwd 137907 63124 2.18
getitem bfloat16 [3 4] [35] [35 4] 0 ssdlite bwd 135746 68813 1.97
getitem bfloat16 [4 4] [39] [39 4] 0 ssdlite bwd 182899 69011 2.65
getitem bfloat16 [8] [2008] [2008] 0 maskrcnn bwd 264645 178237 1.48
getitem bfloat16 [5] [14] [14] 0 maskrcnn_ssdlite bwd 158099 61244 2.58
getitem bfloat16 [8 4] [8] [8 4] 0 ssdlite bwd 187412 60266 3.11
getitem bfloat16 [4197 4] [1000] [1000 4] 0 fasterrcnn bwd 161043 56605 2.85
getitem bfloat16 [4197] [1000] [1000] 0 fasterrcnn bwd 176067 62755 2.81
getitem bfloat16 [16] [2016] [2016] 0 maskrcnn bwd 238645 125743 1.90
getitem bfloat16 [7 4] [33] [33 4] 0 ssdlite bwd 164963 63858 2.58
  • Average over all cases
Op Type average
getitem_backward float16 2.89
getitem_backward float32 2.75
getitem_backward bfloat16 2.76

seungmanhan avatar Apr 08 '24 19:04 seungmanhan

The driver is being modified while resolving merge conflicts.

seungmanhan avatar Apr 08 '24 19:04 seungmanhan

@junliume Even after a clean build and make check, I cannot determine the cause of the failure in Jenkins - HIP Package. Please check the reason for the failure.

seungmanhan avatar Apr 16 '24 04:04 seungmanhan

@amberhassaan Have your review comments been addressed?

JehandadKhan avatar May 08 '24 01:05 JehandadKhan

No, not quite. I haven't had a chance to leave a detailed review, but I'd like @seungmanhan to

  1. replace macros with functions that are __device__ __host__.
  2. give better comments. Are we converting a linear index to a 4D or 5D index? Are the NCHW components dims or strides?
  3. clarify in the documentation what getitem backward does. Maybe it's just my ignorance, but I feel some documentation, or a link to some, is needed here.
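
To illustrate points 1 and 2, here is a minimal sketch of an index helper written as a constexpr function rather than a macro. It is not the code from this PR: the names `Index5D`, `unravel_index_5d` and `SKETCH_HOST_DEVICE` are hypothetical, and the sketch assumes a contiguous 5D tensor where the second argument holds dimension sizes (not strides) and the last dimension varies fastest.

```cpp
#include <cstddef>

// When compiled by a HIP or CUDA compiler the helper is usable on both host
// and device; with a plain C++ compiler the qualifier expands to nothing.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Hypothetical 5D coordinate / extent type (N, C, D, H, W).
struct Index5D
{
    std::size_t n, c, d, h, w;
};

// Unravel a linear offset into 5D coordinates.
// `dims` contains the SIZE of each dimension, not its stride.
SKETCH_HOST_DEVICE constexpr Index5D unravel_index_5d(std::size_t linear, const Index5D& dims)
{
    Index5D idx{0, 0, 0, 0, 0};
    idx.w = linear % dims.w;
    linear /= dims.w;
    idx.h = linear % dims.h;
    linear /= dims.h;
    idx.d = linear % dims.d;
    linear /= dims.d;
    idx.c = linear % dims.c;
    linear /= dims.c;
    idx.n = linear; // whatever remains is the outermost coordinate
    return idx;
}
```

Unlike a macro, the function has typed parameters, evaluates its arguments once, and can be checked at compile time, which makes the dims-versus-strides question explicit in the signature.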

amberhassaan avatar May 08 '24 02:05 amberhassaan

The macro was changed to a constexpr function. I made the function names clearer and added comments. This is the backward pass of getitem, and getitem is a kind of tensor indexing, slicing, and masking (https://pytorch.org/tensordict/stable/tutorials/tensordict_slicing.html?highlight=tensor+slice).
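
For readers unfamiliar with the operation, the following CPU sketch shows what getitem backward is commonly understood to compute for the case benchmarked above (one index tensor applied along dim 0): each row of dy is accumulated into the row of dx selected by the corresponding index, so repeated indices sum their gradients. This is an illustration of the semantics under that assumption, not the MIOpen kernel; the function name `getitem_backward_dim0` and the handling of negative or out-of-range indices are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference sketch: forward is y[i, :] = x[index[i], :],
// so backward is dx[index[i], :] += dy[i, :].
// dy has index.size() rows and dx has dx_rows rows, each of row_size elements.
std::vector<float> getitem_backward_dim0(const std::vector<float>& dy,
                                         const std::vector<std::int64_t>& index,
                                         std::size_t dx_rows,
                                         std::size_t row_size)
{
    std::vector<float> dx(dx_rows * row_size, 0.0f);
    const auto rows = static_cast<std::int64_t>(dx_rows);
    for(std::size_t i = 0; i < index.size(); ++i)
    {
        // Assumption: negative indices wrap around as they do in Python.
        const std::int64_t row = index[i] < 0 ? index[i] + rows : index[i];
        if(row < 0 || row >= rows)
            continue; // out-of-range indices are skipped in this sketch
        for(std::size_t j = 0; j < row_size; ++j)
            dx[static_cast<std::size_t>(row) * row_size + j] += dy[i * row_size + j];
    }
    return dx;
}
```

With dy = {1, 2, 3, 4, 5, 6} viewed as 3 rows of 2, index = {2, 0, 2}, and a 3 x 2 dx, rows 0 and 2 of dy both land in row 2 of dx and are summed, while row 1 of dx stays zero.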

seungmanhan avatar May 10 '24 12:05 seungmanhan

@CAHEK7 @amberhassaan @JehandadKhan re-request review

junliume avatar May 12 '24 05:05 junliume

@atamazov Any final opinions before we merge this?

JehandadKhan avatar May 22 '24 01:05 JehandadKhan

@seungmanhan Please see https://github.com/ROCm/MIOpen/pull/2883#pullrequestreview-2072076879. This review may also apply to other primitives added by your team.

atamazov avatar May 22 '24 19:05 atamazov

@seungmanhan we have a HIP tidy issue:

[2024-05-23T13:46:38.046Z] /home/jenkins/workspace/MLLIBS_MIOpen_impl_getitem_bwd/src/include/miopen/miopen_internal.h:119:1: warning: function 'miopenConvolutionCKBackwardWeightsGetWorkSpaceSize' has a definition with different parameter names [readability-inconsistent-declaration-parameter-name]
[2024-05-23T13:46:38.046Z]   119 | miopenConvolutionCKBackwardWeightsGetWorkSpaceSize(const miopenAlphaBetaCase_t ab_case,
[2024-05-23T13:46:38.046Z]       | ^
[2024-05-23T13:46:38.046Z] /home/jenkins/workspace/MLLIBS_MIOpen_impl_getitem_bwd/src/convolution_api.cpp:200:1: note: the definition seen here
[2024-05-23T13:46:38.046Z]   200 | miopenConvolutionCKBackwardWeightsGetWorkSpaceSize(const miopenAlphaBetaCase_t alpha_beta_case,
[2024-05-23T13:46:38.046Z]       | ^
[2024-05-23T13:46:38.046Z] /home/jenkins/workspace/MLLIBS_MIOpen_impl_getitem_bwd/src/include/miopen/miopen_internal.h:119:1: note: differing parameters are named here: ('ab_case'), in definition: ('alpha_beta_case')
[2024-05-23T13:46:38.046Z]   119 | miopenConvolutionCKBackwardWeightsGetWorkSpaceSize(const miopenAlphaBetaCase_t ab_case,
[2024-05-23T13:46:38.046Z]       | ^                                                                              ~~~~~~~
[2024-05-23T13:46:38.046Z]       |                                                                                alpha_beta_case
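
The warning above is raised when a declaration and its definition use different parameter names. A minimal sketch of the consistent form follows; `example_workspace_size` and its return values are hypothetical placeholders, not the MIOpen function or its real signature.

```cpp
#include <cstddef>

// Declaration, as it would appear in a header. The parameter is named
// `alpha_beta_case` here so that it matches the definition below.
std::size_t example_workspace_size(int alpha_beta_case);

// Definition, as it would appear in a source file. Because the name is the
// same as in the declaration, the
// readability-inconsistent-declaration-parameter-name check has nothing to report.
std::size_t example_workspace_size(int alpha_beta_case)
{
    // Placeholder logic only: no workspace for case 0, 1024 bytes otherwise.
    return alpha_beta_case == 0 ? 0 : 1024;
}
```

Renaming either side works; the suggestion printed by the tool is to rename the declaration's parameter to match the definition.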

junliume avatar May 24 '24 00:05 junliume

@junliume Can you please confirm why the github action is failing?

seungmanhan avatar May 24 '24 06:05 seungmanhan

@junliume When I add miopen::IsUnset(MIOPEN_ENV(MIOPEN_TEST_ALL)) in gtest, the GitHub action fails. Can you please check the log for 363dbe2?

seungmanhan avatar May 30 '24 07:05 seungmanhan

@seungmanhan

You can check locally

  • Formatting: run this formatting script, check whether any files were modified, and commit the changes.
#!/bin/bash
find . -iname '*.h' \
    -o -iname '*.hpp' \
    -o -iname '*.cpp' \
    -o -iname '*.h.in' \
    -o -iname '*.hpp.in' \
    -o -iname '*.cpp.in' \
    -o -iname '*.cl' \
| grep -v -E '(build/)|(install/)|(fin/)' \
| xargs -n 1 -P $(nproc) -I{} -t clang-format-12 -style=file {} -i 2>/dev/null
  • Static checks:
 make -j $((`nproc`-4)) -k analyze

atamazov avatar May 30 '24 09:05 atamazov

I confirmed that this alone does not resolve the problems that occur in GitHub Actions.

seungmanhan avatar May 31 '24 02:05 seungmanhan

@JehandadKhan @randyspauldingamd @CAHEK7 Last ping before the final round of CI, after which we plan to merge.

junliume avatar Jun 05 '24 23:06 junliume

@seungmanhan We still have a few tests persistently failing at the Fp32 Hip All gfx90a stage. Could you help check them?

junliume avatar Jun 07 '24 22:06 junliume

@junliume, please ignore the Windows build failure; #3043 will fix that.

apwojcik avatar Jun 11 '24 10:06 apwojcik

@junliume Can you find out why PRs keep failing at the build stage? (https://github.com/ROCm/MIOpen/pull/2883/commits/6255e87b863630c107bb830b1c9f9d5f69e58ef4)

seungmanhan avatar Jun 11 '24 23:06 seungmanhan

@seungmanhan Could you help resolve the conflict? We are having some CI issues, so let me know if you need help. Thanks!

junliume avatar Jun 28 '24 00:06 junliume

@junliume Can you confirm what is being tested in 'Jenkins - Fp32 Hip All gfx90a' or send me the script?

seungmanhan avatar Jul 08 '24 05:07 seungmanhan

@seungmanhan @apwojcik The Windows build is failing while the Linux side is passing:

lld-link: error: undefined symbol: enum miopenStatus_t __cdecl miopen::GetitemBackward(struct miopen::Handle &, void *, unsigned __int64, struct miopen::TensorDescriptor const &, void const *, unsigned int, struct miopen::TensorDescriptor const *const *, void const **, struct miopen::TensorDescriptor const &, void *, struct miopen::TensorDescriptor const &, void *, unsigned int, int const *, unsigned int, int const *, unsigned int)

Can we either (1) provide this feature on Linux only and guard it accordingly, or (2) @apwojcik, is there any way to support this feature on Windows?

junliume avatar Jul 24 '24 00:07 junliume

Please export the missing (undefined) symbols with MIOPEN_INTERNALS_EXPORT.
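
As a rough sketch of why this matters on Windows: symbols in a DLL are hidden unless explicitly exported, whereas a Linux shared library typically exposes them unless visibility is restricted, so a missing annotation surfaces only in the Windows link step. The macro `EXAMPLE_INTERNALS_EXPORT` and the function `GetitemBackwardStub` below are hypothetical stand-ins; the real MIOpen macro is defined in its own headers and may expand differently (for example to an import attribute on the consumer side).

```cpp
// Hypothetical export macro following the common cross-platform pattern.
#if defined(_WIN32)
#define EXAMPLE_INTERNALS_EXPORT __declspec(dllexport)
#else
#define EXAMPLE_INTERNALS_EXPORT __attribute__((visibility("default")))
#endif

namespace example {

// The annotation goes on the declaration that other binaries (such as the
// driver or gtest executables) include, so the linker can resolve the symbol.
EXAMPLE_INTERNALS_EXPORT int GetitemBackwardStub();

// Placeholder definition: returns 0 to stand in for a success status.
int GetitemBackwardStub() { return 0; }

} // namespace example
```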

apwojcik avatar Jul 24 '24 10:07 apwojcik