box_area and box_iou functions for cxcywh format
🚀 The feature
Native box_area and box_iou functions for cxcywh format.
Motivation, pitch
Since the cxcywh format is common, we can use faster and simpler functions to calculate box area and box IoU directly in this format.
Currently, we first need to convert to the xyxy format before using box_area and box_iou in torchvision.
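For illustration, a rough sketch of the two paths for box area with the current API (just a sketch, not the proposed implementation):

```python
import torch
from torchvision.ops import box_area, box_convert

# Boxes in (cx, cy, w, h): centers in [0, 100), side lengths in [1, 10).
centers = torch.rand(100, 2) * 100
sizes = torch.rand(100, 2) * 9 + 1
boxes_cxcywh = torch.cat([centers, sizes], dim=1)

# Current two-step path: convert to xyxy first, then compute the area.
boxes_xyxy = box_convert(boxes_cxcywh, in_fmt="cxcywh", out_fmt="xyxy")
area_two_step = box_area(boxes_xyxy)

# Proposed direct path: in cxcywh the area is simply w * h,
# with no intermediate tensor and no format round trip.
area_direct = boxes_cxcywh[:, 2] * boxes_cxcywh[:, 3]
```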
Alternatives
No response
Additional context
I can open a pull request from alperenunlu/vision@cf93d9e
Hey @alperenunlu, thanks for your neat PR with the attached test case!
This definitely provides a more straightforward way to compute area and IoU for cxcywhr format. At the same time, this also adds some code, which is slightly redundant with the functions for xyxy format.
I'd like to understand the benefits of using these new functions over a two-step process: first convert the bounding box format and then compute IoU and areas. Could you please help me understand the trade-offs between these approaches? What are the advantages of having separate functions for cxcywhr format, and how do they outweigh the added complexity? I am trying to understand if the box conversion is actually the bottleneck in data pipelines involving these operations, and what could be the gain in having dedicated optimized functions to address it.
Thanks in advance for your input! Best regards, Antoine
Hi @AntoineSimoulin,
Thanks a lot for the thoughtful feedback!
Just to clarify up front — this PR targets the standard cxcywh format (center-x, center-y, width, height), not cxcywhr with rotation. There’s no orientation handling here (though box_area_center would technically still work with cxcywhr by ignoring the angle component).
The motivation behind these functions comes from workflows where bounding boxes are already in cxcywh format, including (but not limited to) YOLO-style models and mAP computation when the boxes are already in that format. The format is common in many pipelines, both in models and in preprocessing steps.
Here are a few key advantages of providing native support:
- Performance: Avoiding the conversion to `xyxy` saves time in tight loops, particularly in training, where IoU and area computations are applied to many boxes across all cells and predictions, and during mAP evaluation.
- Precision and type handling: When the input data is in integer format (as is common in datasets or quantized models), converting to `xyxy` typically requires casting to float, which adds computational overhead. If the data needs to remain in integer form, casting back from float can lose precision due to rounding. Native operations in `cxcywh` avoid these extra conversions and keep tighter control over dtypes and numerical consistency. (This is why the test cases needed `INT_BOXES_CXCYWH`: converting to `xyxy` casts to float, and recasting to integer drops the fractional part; see the sketch after this list.)
- Cleaner, safer code: In pipelines that operate directly in `cxcywh`, having native functions avoids back-and-forth conversions and reduces the risk of subtle bugs or inconsistencies.
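To make the dtype point concrete, here is a small illustrative example (not the actual test code from the PR) of what happens to integer `cxcywh` boxes on the two-step path:

```python
import torch
from torchvision.ops import box_convert

# An integer box in (cx, cy, w, h) with an odd side length has
# half-pixel corners in xyxy.
boxes_cxcywh = torch.tensor([[10, 10, 5, 5]], dtype=torch.int64)

# Two-step path: the conversion promotes to floating point,
# giving corners (7.5, 7.5, 12.5, 12.5).
boxes_xyxy = box_convert(boxes_cxcywh, in_fmt="cxcywh", out_fmt="xyxy")

# A pipeline that needs integers back truncates the half pixels; recovering
# the center from the truncated corners then shifts it from 10 to 9.
corners_int = boxes_xyxy.to(torch.int64)                     # [[7, 7, 12, 12]]
cx_recovered = (corners_int[:, 0] + corners_int[:, 2]) // 2  # tensor([9])

# Direct path: the area never leaves integer arithmetic.
area_direct = boxes_cxcywh[:, 2] * boxes_cxcywh[:, 3]        # tensor([25])
```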
I understand the concern around added code — I’ve tried to keep the implementation minimal, well-contained and tested, and I’m definitely open to suggestions.
Thanks again — happy to discuss further!
Best,
Alperen
Hey @alperenunlu, yeah, sorry for the confusion, I meant the cxcywh format (center-x, center-y, width, height). I think all of this makes sense. Would it be possible for you to produce a small benchmark to illustrate the gains in terms of performance, precision, and type handling? It would be extremely useful to justify the decision to add the code. Let me know what is possible for you. Thanks a lot for your time and efforts!
Hey @AntoineSimoulin,
I've extensively profiled the code and included both the implementation and output below. Here's a summary of the results:
- `box_area_center` is approximately 10x faster
- `box_iou_center` is about 1.25x faster
Under more realistic conditions (fewer boxes), the improvements are even more meaningful:
- `box_area_center` is 6x faster
- `box_iou_center` is 2x faster
These benchmarks were run on a T4 GPU, and the results are consistent with my tests on an M1 MacBook (both CPU and MPS backends).
I also ran a separate benchmark using perf_counter_ns, which showed:
- 6x speedup for the area function
- 1.7x speedup for the IoU function
To ensure consistency, I ran 10 iterations across box counts ranging from 1 to 1001 (in steps of 5). The GPU performance gains remain consistent throughout.
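For reference, a sweep of that shape can be sketched roughly like this (illustrative, not the exact script; `area_two_step`/`area_one_step` stand in for the benchmarked functions):

```python
import torch
from time import perf_counter_ns
from torchvision.ops import box_area, box_convert

def area_two_step(boxes):
    # Existing path: convert to xyxy, then compute the area.
    return box_area(box_convert(boxes, in_fmt="cxcywh", out_fmt="xyxy"))

def area_one_step(boxes):
    # Direct path: area in cxcywh is w * h.
    return boxes[:, 2] * boxes[:, 3]

# Box counts from 1 to 1001 in steps of 5, 10 iterations each.
for n in range(1, 1002, 5):
    boxes = torch.rand(n, 4) * 100
    for name, fn in [("2-step", area_two_step), ("1-step", area_one_step)]:
        start = perf_counter_ns()
        for _ in range(10):
            fn(boxes)
        print(f"{name} n={n}: {(perf_counter_ns() - start) / 10:.0f} ns/call")
```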
One thing to note: while the GPU speedups are stable, on CPU, the performance gain for the IoU function diminishes once comparisons exceed 100x100 boxes. At that point, the IoU computation becomes the bottleneck, and the speedup drops to around 1x.
Feel free to test it further!
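The profiling harness itself is not reproduced here, but a simplified sketch of how such a `torch.profiler` comparison can be set up is below (the `record_function` labels mirror the tables; box counts are illustrative, and the tables below come from separate runs per variant):

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function
from torchvision.ops import box_area, box_convert

def profile_area(n_boxes: int = 1000, iters: int = 200, device: str = "cuda"):
    # Compare the two-step and direct area computations on random cxcywh boxes.
    boxes = torch.rand(n_boxes, 4, device=device) * 100
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        profile_memory=True,
    ) as prof:
        for _ in range(iters):
            with record_function("Area 2 Step"):
                box_area(box_convert(boxes, in_fmt="cxcywh", out_fmt="xyxy"))
            with record_function("Area 1 Step"):
                boxes[:, 2] * boxes[:, 3]
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))

profile_area()
```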
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Area 2 Step 0.00% 0.000us 0.00% 0.000us 0.000us 2.276s 1213.53% 2.276s 11.380ms 0 b 0 b 0 b 0 b 200
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 94.302ms 50.28% 94.302ms 3.949us 0 b 0 b 0 b 0 b 23880
Area 2 Step 16.71% 716.465ms 54.50% 2.337s 11.687ms 0.000us 0.00% 93.617ms 468.083us 0 b -8 b 0 b -62.96 Mb 200
aten::mul 14.79% 634.346ms 24.56% 1.053s 50.630us 73.897ms 39.40% 73.929ms 3.554us 1.52 Mb 1.52 Mb 42.97 Mb 42.97 Mb 20800
aten::sub 11.42% 489.867ms 18.63% 799.095ms 47.565us 63.103ms 33.65% 63.129ms 3.758us 1.52 Mb 1.52 Mb 34.38 Mb 34.38 Mb 16800
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 61.857ms 32.98% 61.857ms 3.885us 0 b 0 b 0 b 0 b 15920
aten::add 4.94% 212.045ms 7.19% 308.375ms 35.043us 31.516ms 16.80% 31.516ms 3.581us 1.52 Mb 1.52 Mb 17.19 Mb 17.19 Mb 8800
aten::stack 2.36% 101.184ms 11.35% 486.610ms 110.593us 0.000us 0.00% 18.682ms 4.246us 3.04 Mb 0 b 31.38 Mb 0 b 4400
aten::cat 4.36% 186.857ms 5.79% 248.417ms 56.458us 18.665ms 9.95% 18.682ms 4.246us 3.04 Mb 3.04 Mb 31.38 Mb 31.38 Mb 4400
void at::native::(anonymous namespace)::CatArrayBatc... 0.00% 0.000us 0.00% 0.000us 0.000us 18.665ms 9.95% 18.665ms 4.666us 0 b 0 b 0 b 0 b 4000
void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 11.828ms 6.31% 11.828ms 2.957us 0 b 0 b 0 b 0 b 4000
aten::to 0.18% 7.706ms 6.94% 297.843ms 212.745us 0.000us 0.00% 349.500us 0.250us 4.69 Kb 0 b 1.57 Mb 0 b 1400
aten::_to_copy 0.43% 18.509ms 6.77% 290.137ms 207.241us 0.000us 0.00% 349.500us 0.250us 4.69 Kb 0 b 1.57 Mb 0 b 1400
aten::copy_ 0.28% 12.124ms 0.42% 18.129ms 12.950us 349.500us 0.19% 349.500us 0.250us 0 b 0 b 0 b 0 b 1400
Memcpy HtoD (Pageable -> Device) 0.00% 0.000us 0.00% 0.000us 0.000us 349.500us 0.19% 349.500us 1.748us 0 b 0 b 0 b 0 b 200
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 4.289s
Self CUDA time total: 187.546ms
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Area 1 Step 0.00% 0.000us 0.00% 0.000us 0.000us 126.239ms 783.14% 126.239ms 631.194us 0 b 0 b 0 b 0 b 200
aten::mul 17.54% 54.825ms 31.54% 98.557ms 20.533us 15.768ms 97.82% 15.774ms 3.286us 1.52 Mb 1.52 Mb 8.59 Mb 8.59 Mb 4800
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 15.714ms 97.49% 15.714ms 3.948us 0 b 0 b 0 b 0 b 3980
Area 1 Step 24.81% 77.535ms 50.17% 156.796ms 783.981us 0.000us 0.00% 7.884ms 39.419us 0 b 0 b 0 b -4.30 Mb 200
aten::to 1.40% 4.388ms 7.82% 24.445ms 17.461us 0.000us 0.00% 351.893us 0.251us 4.69 Kb 0 b 1.57 Mb 0 b 1400
aten::_to_copy 2.25% 7.046ms 6.42% 20.058ms 14.327us 0.000us 0.00% 351.893us 0.251us 4.69 Kb 0 b 1.57 Mb 0 b 1400
aten::copy_ 1.22% 3.816ms 2.71% 8.473ms 6.052us 351.893us 2.18% 351.893us 0.251us 0 b 0 b 0 b 0 b 1400
Memcpy HtoD (Pageable -> Device) 0.00% 0.000us 0.00% 0.000us 0.000us 351.893us 2.18% 351.893us 1.759us 0 b 0 b 0 b 0 b 200
void at::native::unrolled_elementwise_kernel<at::nat... 0.00% 0.000us 0.00% 0.000us 0.000us 53.470us 0.33% 53.470us 2.673us 0 b 0 b 0 b 0 b 20
cudaLaunchKernel 12.22% 38.188ms 12.24% 38.261ms 9.565us 0.000us 0.00% 6.464us 0.002us 0 b 0 b 0 b 0 b 4000
Unrecognized 0.02% 73.437us 0.02% 73.437us 36.718us 6.464us 0.04% 6.464us 3.232us 0 b 0 b 0 b 0 b 2
aten::rand 1.09% 3.413ms 3.22% 10.062ms 12.578us 0.000us 0.00% 0.000us 0.000us 1.52 Mb 0 b 0 b 0 b 800
aten::empty 0.59% 1.856ms 0.59% 1.856ms 2.320us 0.000us 0.00% 0.000us 0.000us 1.52 Mb 1.52 Mb 0 b 0 b 800
aten::uniform_ 1.53% 4.792ms 1.53% 4.792ms 5.991us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 800
[memory] 0.00% 0.000us 0.00% 0.000us 0.000us 0.000us 0.00% 0.000us 0.000us -9.89 Mb -9.89 Mb -5.85 Mb -5.85 Mb 7817
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 312.503ms
Self CUDA time total: 16.119ms
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
IoU 2 Step 0.00% 0.000us 0.00% 0.000us 0.000us 2.479s 240.37% 2.479s 12.396ms 0 b 0 b 0 b 0 b 200
IoU 2 Step 19.78% 862.283ms 57.56% 2.509s 12.543ms 0.000us 0.00% 515.414ms 2.577ms 0 b 8 b 0 b -29.95 Gb 200
aten::sub 10.00% 435.799ms 16.57% 722.345ms 17.364us 305.982ms 29.67% 305.989ms 7.355us 3.04 Mb 3.04 Mb 15.03 Gb 15.03 Gb 41600
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 238.418ms 23.12% 238.418ms 4.608us 0 b 0 b 0 b 0 b 51740
aten::mul 11.24% 489.732ms 18.53% 807.787ms 17.715us 225.472ms 21.86% 225.497ms 4.945us 3.04 Mb 3.04 Mb 5.07 Gb 5.07 Gb 45600
void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 180.455ms 17.50% 180.455ms 22.278us 0 b 0 b 0 b 0 b 8100
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 127.079ms 12.32% 127.079ms 3.991us 0 b 0 b 0 b 0 b 31840
aten::add 4.70% 204.810ms 7.63% 332.734ms 15.404us 113.323ms 10.99% 113.331ms 5.247us 3.04 Mb 3.04 Mb 5.02 Gb 5.02 Gb 21600
aten::min 0.17% 7.548ms 2.14% 93.468ms 23.367us 0.000us 0.00% 97.495ms 24.374us 0 b 0 b 9.89 Gb 0 b 4000
aten::minimum 1.22% 53.213ms 1.97% 85.920ms 21.480us 97.487ms 9.45% 97.495ms 24.374us 0 b 0 b 9.89 Gb 9.89 Gb 4000
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 97.424ms 9.45% 97.424ms 24.478us 0 b 0 b 0 b 0 b 3980
aten::max 0.20% 8.687ms 3.13% 136.315ms 34.079us 0.000us 0.00% 94.151ms 23.538us 0 b 0 b 9.89 Gb 0 b 4000
aten::maximum 1.45% 63.235ms 2.93% 127.628ms 31.907us 94.138ms 9.13% 94.151ms 23.538us 0 b 0 b 9.89 Gb 9.89 Gb 4000
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 94.075ms 9.12% 94.075ms 23.637us 0 b 0 b 0 b 0 b 3980
aten::clamp 1.47% 64.229ms 2.76% 120.170ms 30.042us 85.156ms 8.26% 85.165ms 21.291us 0 b 0 b 9.93 Gb 9.93 Gb 4000
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 4.358s
Self CUDA time total: 1.031s
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
IoU 1 Step 0.00% 0.000us 0.00% 0.000us 0.000us 2.197s 269.32% 2.197s 10.985ms 0 b 0 b 0 b 0 b 200
IoU 1 Step 21.92% 843.976ms 58.19% 2.241s 11.203ms 0.000us 0.00% 407.653ms 2.038ms 0 b 8 b 0 b -29.90 Gb 200
aten::sub 6.39% 245.836ms 10.37% 399.351ms 22.690us 212.517ms 26.05% 212.539ms 12.076us 3.04 Mb 3.04 Mb 14.99 Gb 14.99 Gb 17600
void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 180.064ms 22.07% 180.064ms 22.230us 0 b 0 b 0 b 0 b 8100
aten::div 8.97% 345.251ms 15.25% 587.278ms 28.235us 136.253ms 16.70% 136.264ms 6.551us 1.52 Mb 1.52 Mb 5.07 Gb 5.07 Gb 20800
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 115.930ms 14.21% 115.930ms 5.826us 0 b 0 b 0 b 0 b 19900
aten::mul 6.11% 235.362ms 10.46% 402.684ms 29.609us 108.396ms 13.29% 108.438ms 7.973us 3.04 Mb 3.04 Mb 5.00 Gb 5.00 Gb 13600
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 108.236ms 13.27% 108.236ms 9.065us 0 b 0 b 0 b 0 b 11940
aten::min 0.31% 11.982ms 2.90% 111.731ms 27.933us 0.000us 0.00% 94.949ms 23.737us 0 b 0 b 9.89 Gb 0 b 4000
aten::minimum 1.62% 62.294ms 2.59% 99.749ms 24.937us 94.938ms 11.64% 94.949ms 23.737us 0 b 0 b 9.89 Gb 9.89 Gb 4000
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 94.878ms 11.63% 94.878ms 23.839us 0 b 0 b 0 b 0 b 3980
aten::max 0.28% 10.898ms 2.99% 114.965ms 28.741us 0.000us 0.00% 94.356ms 23.589us 0 b 0 b 9.89 Gb 0 b 4000
aten::maximum 1.81% 69.625ms 2.70% 104.067ms 26.017us 94.356ms 11.57% 94.356ms 23.589us 0 b 0 b 9.89 Gb 9.89 Gb 4000
void at::native::elementwise_kernel<128, 2, at::nati... 0.00% 0.000us 0.00% 0.000us 0.000us 94.295ms 11.56% 94.295ms 23.692us 0 b 0 b 0 b 0 b 3980
aten::clamp 2.05% 79.075ms 3.12% 120.056ms 30.014us 85.122ms 10.43% 85.122ms 21.281us 0 b 0 b 9.93 Gb 9.93 Gb 4000
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 3.850s
Self CUDA time total: 815.776ms
@alperenunlu Amazing! The benchmark looks good. Can you submit a PR with the changes from cf93d9e?
I was thinking that instead of publicly exposing `box_iou_center`, we could rename it `box_iou_cxcywh`. We could add a parameter `in_fmt: str = "xyxy"` to `box_iou` and, depending on the bounding box input format, either use the original `box_iou` implementation or dispatch to `box_iou_cxcywh` (similar to what's done for `box_convert`). Let me know what you think. Thanks for your time and contribution!
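Roughly, I am imagining something along these lines (just a sketch of the dispatch idea, not a final API; `box_iou_cxcywh` here is a stand-in for the dedicated implementation):

```python
import torch
from torchvision.ops import box_convert, box_iou as _box_iou_xyxy

def box_iou_cxcywh(boxes1: torch.Tensor, boxes2: torch.Tensor) -> torch.Tensor:
    # Stand-in for the dedicated cxcywh implementation from the PR;
    # here it just converts, whereas the real function computes IoU directly.
    return _box_iou_xyxy(
        box_convert(boxes1, in_fmt="cxcywh", out_fmt="xyxy"),
        box_convert(boxes2, in_fmt="cxcywh", out_fmt="xyxy"),
    )

def box_iou(boxes1: torch.Tensor, boxes2: torch.Tensor, in_fmt: str = "xyxy") -> torch.Tensor:
    # Dispatch on the declared input format, similar in spirit to box_convert.
    if in_fmt == "xyxy":
        return _box_iou_xyxy(boxes1, boxes2)
    if in_fmt == "cxcywh":
        return box_iou_cxcywh(boxes1, boxes2)
    raise ValueError(f"Unsupported in_fmt: {in_fmt!r}")
```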
@AntoineSimoulin Thanks! Let’s add it this way for now. Since there are other IoU functions (generalized, distance, and complete), I can work on them afterward. Once we have them, we can gradually shift to a dispatched style in a future version update.
What do you think?
This is the PR: #8992. I can update the branch, and then we can merge it.
@NicolasHug Could you also take a look?
I implemented the dispatch style and updated the tests accordingly. All tests pass. PR: https://github.com/pytorch/vision/pull/8992