Improve yolov8 post-processing efficiency.

Open whyb opened this issue 1 year ago • 2 comments

Using OpenMP to improve yolov8 post-processing efficiency.

#if NCNN_SIMPLEOMP
#include "simpleomp.h"
#else
#include <omp.h>
#endif

...

static void parse_yolov8_detections(
    float* inputs, float confidence_threshold,
    int num_channels, int num_anchors, int num_labels,
    int infer_img_width, int infer_img_height,
    std::vector<Object>& objects)
{
    std::vector<Object> detections;
    cv::Mat output = cv::Mat((int)num_channels, (int)num_anchors, CV_32F, inputs).t();

    const size_t stride = num_anchors;
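    #pragma omp parallel shared(detections)
    {
        // Query the team size inside the region: the runtime may grant fewer
        // threads than omp_get_max_threads(), which would leave chunks unprocessed.
        const size_t num_threads = omp_get_num_threads();
        const size_t chunk_size = stride / num_threads;
        const size_t thread_id = omp_get_thread_num();
        const size_t start_idx = thread_id * chunk_size;
        // The last thread also takes the remainder when stride is not divisible.
        const size_t end_idx = (thread_id == num_threads - 1) ? stride : (start_idx + chunk_size);
        for (int i = (int)start_idx; i < (int)end_idx; i++)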
        {
            const float* row_ptr = output.row(i).ptr<float>();
            const float* bboxes_ptr = row_ptr;
            const float* scores_ptr = row_ptr + 4;
            const float* max_s_ptr = std::max_element(scores_ptr, scores_ptr + num_labels);
            float score = *max_s_ptr;
            if (score > confidence_threshold)
            {
                float x = *bboxes_ptr++;
                float y = *bboxes_ptr++;
                float w = *bboxes_ptr++;
                float h = *bboxes_ptr;

                float x0 = clampf((x - 0.5f * w), 0.f, (float)infer_img_width);
                float y0 = clampf((y - 0.5f * h), 0.f, (float)infer_img_height);
                float x1 = clampf((x + 0.5f * w), 0.f, (float)infer_img_width);
                float y1 = clampf((y + 0.5f * h), 0.f, (float)infer_img_height);

                cv::Rect_<float> bbox;
                bbox.x = x0;
                bbox.y = y0;
                bbox.width = x1 - x0;
                bbox.height = y1 - y0;
                Object object;
                object.label = max_s_ptr - scores_ptr;
                object.prob = score;
                object.rect = bbox;
                #pragma omp critical
                {
                    detections.push_back(object);
                }
            }
        }
    }
    objects = detections;
}
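
One possible variation on the function above, offered as a rough sketch rather than as part of the submitted change: each thread collects hits in its own vector and merges them once at the end, so the critical section is entered once per thread instead of once per detection. The Detection struct and parse_detections_local are hypothetical names used only for illustration, the input is assumed to be channel-major (so no cv::Mat transpose is needed), and whether this compiles under simpleomp has not been checked here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the Object type used in the ncnn examples.
struct Detection
{
    float x0, y0, x1, y1;
    int label;
    float prob;
};

// Assumed layout: data[c * num_anchors + i], where channels 0..3 hold
// cx, cy, w, h and channels 4..(4 + num_labels - 1) hold class scores.
static std::vector<Detection> parse_detections_local(
    const float* data, int num_anchors, int num_labels,
    float confidence_threshold, float img_w, float img_h)
{
    assert(data != nullptr && num_anchors >= 0 && num_labels > 0);
    std::vector<Detection> detections;

    #pragma omp parallel
    {
        std::vector<Detection> local; // private to each thread

        #pragma omp for nowait
        for (int i = 0; i < num_anchors; i++)
        {
            // Best class for anchor i, striding across the score channels.
            int best = 0;
            float score = data[4 * num_anchors + i];
            for (int k = 1; k < num_labels; k++)
            {
                const float s = data[(4 + k) * num_anchors + i];
                if (s > score) { score = s; best = k; }
            }
            if (score <= confidence_threshold)
                continue;

            const float x = data[0 * num_anchors + i];
            const float y = data[1 * num_anchors + i];
            const float w = data[2 * num_anchors + i];
            const float h = data[3 * num_anchors + i];

            Detection d;
            d.x0 = std::min(std::max(x - 0.5f * w, 0.f), img_w);
            d.y0 = std::min(std::max(y - 0.5f * h, 0.f), img_h);
            d.x1 = std::min(std::max(x + 0.5f * w, 0.f), img_w);
            d.y1 = std::min(std::max(y + 0.5f * h, 0.f), img_h);
            d.label = best;
            d.prob = score;
            local.push_back(d);
        }

        // One merge per thread; the order of the merged result is unspecified.
        #pragma omp critical
        detections.insert(detections.end(), local.begin(), local.end());
    }
    return detections;
}
```

Because the merge order depends on thread scheduling, callers that need a stable order would sort the result (for example by prob) before running NMS. Without OpenMP enabled the pragmas are ignored and the function runs serially.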

whyb avatar Aug 29 '24 06:08 whyb

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: nihui
:x: whyb

tencent-adm avatar Aug 29 '24 06:08 tencent-adm

Hello @nihui,

I found that several ncnn examples use OpenMP parallel sections (#pragma omp parallel sections) in the following files:

examples/yolov7.cpp
examples/scrfd_crowdhuman.cpp
examples/fasterrcnn.cpp
examples/yolov5.cpp
examples/yolox.cpp
examples/scrfd.cpp
examples/rfcn.cpp
examples/nanodet.cpp
examples/retinaface.cpp

However, I checked the implementation in src/simpleomp.cpp and did not find the functions that sections require:

GOMP_parallel_sections_start()
GOMP_parallel_sections()
GOMP_sections_start()
GOMP_sections_next()
GOMP_sections_end()
GOMP_sections_end_cancel()
GOMP_sections_end_nowait()

Does this mean that simpleomp will not support the OpenMP sections feature for the foreseeable future?

By the same standard, since the examples already use a feature that simpleomp does not support, using the OpenMP shared clause should be allowed as well.🙈
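
For readers unfamiliar with the construct under discussion, here is a minimal sketch of a parallel sections block, applied to a recursive descending quicksort. The function name qsort_descent and the use of a plain float vector are assumptions made for illustration; this is not copied from the files listed above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sort v[left..right] in descending order. After partitioning, the two
// recursive calls touch disjoint ranges, so each can run as its own section.
static void qsort_descent(std::vector<float>& v, int left, int right)
{
    assert(left >= 0 && left <= right && right < (int)v.size());

    int i = left;
    int j = right;
    const float pivot = v[(left + right) / 2];

    while (i <= j)
    {
        while (v[i] > pivot) i++;
        while (v[j] < pivot) j--;
        if (i <= j)
        {
            std::swap(v[i], v[j]);
            i++;
            j--;
        }
    }

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            if (left < j) qsort_descent(v, left, j);
        }
        #pragma omp section
        {
            if (i < right) qsort_descent(v, i, right);
        }
    }
}
```

With GCC, a block like this is lowered to calls into the runtime's sections entry points, which is why the functions listed above matter. When OpenMP is disabled the pragmas are ignored and the sort runs serially.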

whyb avatar Sep 03 '24 06:09 whyb

hi, yolov8 examples are updated with full support for detection, segmentation, classification, pose estimation and obb https://github.com/Tencent/ncnn/tree/master/examples

android demo https://github.com/nihui/ncnn-android-yolov8

detailed instruction (zh) https://zhuanlan.zhihu.com/p/16030630352

nihui avatar Jan 08 '25 08:01 nihui