Extract face coordinates from upsampled image using Frontal Face Detector

Open joao-sgoncalves opened this issue 3 years ago • 3 comments

Hello,

I am using the FrontalFaceDetector class to perform face detection with Dlib (thanks for the great library!), following the example you provide (https://github.com/takuya-takeuchi/DlibDotNet/tree/master/examples/FaceDetection). One step in that example calls a pyramid-up function which, as far as I can tell, upsamples the image by a factor of 2. I am doing the same thing to improve detection of smaller faces. The problem is that I receive a bitmap as input and have to output the coordinates of the faces found in that image, relative to the original image. If I perform pyramid up on the image, the returned face coordinates will be incorrect, since they apply to the upsampled image.

public IList<System.Drawing.Rectangle> Detect(Bitmap image)
{
    Array2D<RgbPixel> imageArray = image.ToArray2D<RgbPixel>();

    Dlib.PyramidUp(imageArray);

    DlibDotNet.Rectangle[] faces = _faceDetector.Operator(imageArray);

    List<System.Drawing.Rectangle> faceRects = new();

    foreach (DlibDotNet.Rectangle face in faces)
    {
        faceRects.Add(new System.Drawing.Rectangle(face.Left / 2, face.Top / 2, (int)face.Width / 2, (int)face.Height / 2));
    }

    return faceRects;
}

I solved the problem by dividing each coordinate by the upsampling factor (in this case 2, since the image was upsampled once). This gives me correct coordinates for the original image. Is this a correct approach? I have seen that the Python library accepts a parameter for the number of times to upsample the image, and was wondering whether this library offers something equivalent, or whether there is a better approach.
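
A note on generalizing the manual approach: each pyramid-up pass with a factor of 2 doubles the coordinates, so after several passes the divisor is 2 raised to the number of passes, not the number of passes itself. Below is a minimal sketch of that arithmetic on plain integer tuples. `RectScaling` and `ScaleDown` are hypothetical names, not part of DlibDotNet, and the library's own mapping may apply a small sub-pixel offset that this sketch ignores.

```csharp
using System;

// Hypothetical helper for illustration; not part of DlibDotNet.
public static class RectScaling
{
    // Maps a box found on an image that was upsampled `levels` times
    // (2x per pass) back to the original image by dividing by 2^levels.
    // Integer division truncates toward zero, so results are approximate.
    public static (int Left, int Top, int Width, int Height) ScaleDown(
        (int Left, int Top, int Width, int Height) box, int levels)
    {
        if (levels < 0 || levels > 30)
            throw new ArgumentOutOfRangeException(nameof(levels));

        int factor = 1 << levels; // 2^levels

        return (box.Left / factor, box.Top / factor,
                box.Width / factor, box.Height / factor);
    }
}
```

For instance, a box at (100, 60) of size 80 x 80 found after two passes maps back to (25, 15) with size 20 x 20.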

Thanks for your help.

joao-sgoncalves, Aug 09 '22 09:08

@joao-sgoncalves Good, I understand what you are saying. We should map the coordinates from the upsampled image back to the original.

I want to offer the same interface as Python, so let me check. Please give me some time to investigate further.

Thanks.

takuya-takeuchi, Aug 10 '22 00:08

Ok, thanks for your help @takuya-takeuchi :)

joao-sgoncalves, Aug 10 '22 13:08

@joao-sgoncalves I had forgotten how DlibDotNet handles this :( You do not need to adjust the returned points manually.

You can do the following:

static void Main(string[] args)
{
    // Load the image twice: `original` stays at its native size for drawing,
    // while `image` is upsampled for detection.
    using var original = Dlib.LoadBmp<RgbPixel>("Lenna.bmp");
    using var image = Dlib.LoadBmp<RgbPixel>("Lenna.bmp");
    using var faceDetector = Dlib.GetFrontalFaceDetector();
    using var pyr = new PyramidDown(2);
    var upsamplingAmount = 1;

    // Upsample the detection image `upsamplingAmount` times.
    var levels = upsamplingAmount;
    while (levels > 0)
    {
        levels--;
        Dlib.PyramidUp(image);
    }

    var rects = faceDetector.Operator(image);

    // Red: raw detections, still in upsampled coordinates (for comparison).
    foreach (var rectangle in rects)
        Dlib.DrawRectangle(original, rectangle, new RgbPixel(255, 0, 0));

    // Green: detections mapped back to the original image with RectDown.
    foreach (var t in rects)
    {
        var rect = pyr.RectDown(t, (uint)upsamplingAmount);
        Dlib.DrawRectangle(original, rect, new RgbPixel(0, 255, 0));
    }

    Dlib.SavePng(original, "result.png");
}

Thanks.

takuya-takeuchi, Aug 13 '22 11:08

Hi @takuya-takeuchi,

I replaced the manual calculation with PyramidDown, and it works perfectly.

Thanks for your help :)

joao-sgoncalves, Aug 19 '22 07:08
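
For readers landing on this thread, here is a rough sketch of how the original `Detect` method might look after that change, combining the two snippets above. The `FaceLocator` class and the `upsamplingAmount` parameter are illustrative names, and the `DlibDotNet.Extensions` namespace for `ToArray2D` is an assumption. The remaining calls are the ones already shown in this thread. Treat it as a starting point, not as the library's recommended implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using DlibDotNet;
using DlibDotNet.Extensions; // assumed location of the Bitmap.ToArray2D extension

// Illustrative wrapper; the class name is hypothetical.
public sealed class FaceLocator : IDisposable
{
    private readonly FrontalFaceDetector _faceDetector = Dlib.GetFrontalFaceDetector();

    public IList<System.Drawing.Rectangle> Detect(Bitmap image, uint upsamplingAmount = 1)
    {
        using Array2D<RgbPixel> imageArray = image.ToArray2D<RgbPixel>();
        using var pyr = new PyramidDown(2);

        // Upsample the working copy; the caller's bitmap is left untouched.
        for (uint i = 0; i < upsamplingAmount; i++)
            Dlib.PyramidUp(imageArray);

        var faceRects = new List<System.Drawing.Rectangle>();

        foreach (DlibDotNet.Rectangle face in _faceDetector.Operator(imageArray))
        {
            // Map the detection back to the original image's coordinates.
            DlibDotNet.Rectangle mapped = pyr.RectDown(face, upsamplingAmount);

            faceRects.Add(new System.Drawing.Rectangle(
                mapped.Left, mapped.Top, (int)mapped.Width, (int)mapped.Height));
        }

        return faceRects;
    }

    public void Dispose() => _faceDetector.Dispose();
}
```

Using one `upsamplingAmount` value for both the upsampling loop and `RectDown` keeps the two steps in sync, which is the main thing the manual division made easy to get wrong.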