Segmentation running slow all of a sudden

Open dapa5900 opened this issue 1 year ago • 13 comments

Recently, segmentation has been running very slowly. Before Christmas the same image used to take under 1 sec on my 4090; now it takes at least 2-3 sec and sometimes even over 15 sec (it seems to fall back to CPU mode even though the VRAM is not full).

dapa5900 avatar Jan 08 '25 07:01 dapa5900

I didn't notice the problem you mentioned. If it took that long, the model has probably been cleared from the cache; please check whether you are using other very memory-intensive models.

petercham avatar Jan 08 '25 13:01 petercham

It still persists here, but only on the Rmbg and Rmbg Advanced nodes. With the GetMask node and the Portrait model it is still the fastest segmentation of all the ones I tested, at 0.5 sec max.

dapa5900 avatar Jan 09 '25 07:01 dapa5900

image

These three processes are equivalent. The difference from the previous version is that it now uses fast-foreground-estimation to composite the image. You can use other nodes to mix the image and the mask instead.
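
For reference, a minimal sketch of what "mixing the image and the mask with another node" amounts to. The helper name is hypothetical and not part of this repo; it assumes the usual ComfyUI layout of image (b, h, w, c) and mask (b, h, w) tensors, and it skips any foreground colour refinement:

import torch

def mix_image_and_mask(image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # image: (b, h, w, c) float in [0, 1]; mask: (b, h, w) float in [0, 1]
    rgb = image[..., :3]                     # drop any existing alpha channel
    alpha = mask.unsqueeze(-1)               # (b, h, w) -> (b, h, w, 1)
    return torch.cat([rgb, alpha], dim=-1)   # RGBA image, no foreground colour refinement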

lldacing avatar Jan 11 '25 11:01 lldacing

Thanks. Maybe that's the reason it's running so slowly then...? Can you recommend a solution that yields results similar to yours? I use "ImageRemoveAlpha" from the "LayerStyle" custom nodes, but it gives this halo around the edge (right side of the image) compared to what I get from using your "RmbgByBiRefNet" node directly (left side). Thank you.

halo

dapa5900 avatar Jan 14 '25 07:01 dapa5900

@dapa5900 I wrote a new node to reproduce the effect of the original version. You can add the node to the file birefnetNode.py and test it. I am not sure it will work for you; if it does, I will merge it.

class GetForegroundImageSimple:

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "mask": ("MASK", ),
            }
        }

    RETURN_TYPES = ("IMAGE",)
    RETURN_NAMES = ("image",)
    FUNCTION = "get_image"
    CATEGORY = "rembg/BiRefNet"

    def get_image(self, image, mask):
        # image.shape => (b, h, w, c)
        # mask.shape => (b, h, w)

        # Uncomment the line below to check whether normalizing the mask makes any difference
        # mask = normalize_mask(mask)

        # Attach the mask to the image as an alpha channel (existing helper in this repo)
        image = add_mask_as_alpha(image, mask)

        return image,


NODE_CLASS_MAPPINGS = {
    "AutoDownloadBiRefNetModel": AutoDownloadBiRefNetModel,
    "LoadRembgByBiRefNetModel": LoadRembgByBiRefNetModel,
    "RembgByBiRefNet": RembgByBiRefNet,
    "RembgByBiRefNetAdvanced": RembgByBiRefNetAdvanced,
    "GetMaskByBiRefNet": GetMaskByBiRefNet,
    "BlurFusionForegroundEstimation": BlurFusionForegroundEstimation,
    "GetForegroundImageSimple": GetForegroundImageSimple,
}

NODE_DISPLAY_NAME_MAPPINGS = {
    "AutoDownloadBiRefNetModel": "AutoDownloadBiRefNetModel",
    "LoadRembgByBiRefNetModel": "LoadRembgByBiRefNetModel",
    "RembgByBiRefNet": "RembgByBiRefNet",
    "RembgByBiRefNetAdvanced": "RembgByBiRefNetAdvanced",
    "GetMaskByBiRefNet": "GetMaskByBiRefNet",
    "BlurFusionForegroundEstimation": "BlurFusionForegroundEstimation",
    "GetForegroundImageSimple": "GetForegroundImageSimple",
}

image

lldacing avatar Jan 14 '25 09:01 lldacing

Thank you. It works, but on a black background it gives the same halo result as the above-mentioned approach (although it's faster on my side, which is good :-)). Please find attached a workflow that compares these three approaches for an example input image:

ModelBase_Girl_Wide_00027_ compare.json :

dapa5900 avatar Jan 14 '25 09:01 dapa5900

It looks like there is no difference from LayerStyle; I guess this is exactly what fast-foreground-estimation solves.
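
For context, a heavily simplified, single-pass sketch of the blur-fusion ("fast foreground estimation") idea and why it suppresses the halo: near the mask edge the observed pixel colour is a mix of foreground and background, so the background bleeds into the result when the mask is simply applied as alpha; blur-fusion replaces those edge pixels with a locally averaged foreground colour. The repo's refine_foreground appears to apply two blur passes (blur_size and blur_size_two) and also tracks a background estimate; this sketch drops both for brevity and is not the actual implementation:

import cv2
import numpy as np

def estimate_foreground(image: np.ndarray, alpha: np.ndarray, r: int = 91) -> np.ndarray:
    # image: (h, w, 3) float32 in [0, 1]; alpha: (h, w) float32 in [0, 1]
    a = alpha[..., None]                                # (h, w, 1)
    blurred_alpha = cv2.blur(alpha, (r, r))[..., None]  # local alpha coverage
    blurred_fa = cv2.blur(image * a, (r, r))            # local alpha-weighted colour
    fg = blurred_fa / (blurred_alpha + 1e-5)            # local foreground colour estimate
    # Keep the observed colour where alpha is high, and the smoothed foreground
    # colour where alpha is low (the edge band that otherwise shows a halo).
    fg = fg + a * (image - a * fg)
    return np.clip(fg, 0.0, 1.0)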

lldacing avatar Jan 14 '25 10:01 lldacing

Alright, thanks for your efforts!

dapa5900 avatar Jan 14 '25 13:01 dapa5900

This node is slow because it runs on the CPU. If you fix it and move the tensors to the device, it will run faster.

...
class BlurFusionForegroundEstimation:
    ...
    def get_foreground(self, images, masks, blur_size=91, blur_size_two=7, fill_color=False, color=None):
        ...
        # (b, c, h, w)
        _image_masked = refine_foreground(image_bchw.to(deviceType), out_masks.to(deviceType), r1=blur_size, r2=blur_size_two)
        ...

Image
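
A hedged, self-contained sketch of that suggestion, assuming refine_foreground is the repo's existing helper shown above (the import path below is a guess) and that it accepts tensors on either device. It drops the fill_color handling of the real get_foreground for brevity:

import torch
from .util import refine_foreground  # assumption: where this repo defines the helper

deviceType = "cuda" if torch.cuda.is_available() else "cpu"

def get_foreground_on_device(images, masks, blur_size=91, blur_size_two=7):
    image_bchw = images.permute(0, 3, 1, 2)  # (b, h, w, c) -> (b, c, h, w)
    out_masks = masks.unsqueeze(1)           # (b, h, w)   -> (b, 1, h, w)
    # Run the blur-fusion refinement on the GPU instead of the CPU.
    refined = refine_foreground(image_bchw.to(deviceType), out_masks.to(deviceType),
                                r1=blur_size, r2=blur_size_two)
    # Return to (b, h, w, c) on the CPU so downstream nodes behave as before.
    return refined.permute(0, 2, 3, 1).cpu(),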

yrsolo avatar Feb 26 '25 16:02 yrsolo

To save VRAM, the image computation is moved to the CPU. You can use the 🔧 Image To Device node from cubiq/ComfyUI_essentials to move it back to the GPU.
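
For anyone wiring this up: roughly what the 🔧 Image To Device step amounts to (a sketch, not the actual ComfyUI_essentials code), assuming it runs inside a ComfyUI environment where comfy.model_management is importable:

import torch
import comfy.model_management as mm

def image_to_device(image: torch.Tensor) -> torch.Tensor:
    # Move the (b, h, w, c) image tensor to ComfyUI's main torch device (e.g. cuda:0)
    # right before BlurFusionForegroundEstimation, so the refinement runs on the GPU.
    return image.to(mm.get_torch_device())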

lldacing avatar Feb 27 '25 09:02 lldacing

Image

Maybe I should delete this (the part that moves the computation to the CPU), but the impact needs to be evaluated first.

lldacing avatar Feb 27 '25 09:02 lldacing

@lldacing

> To save VRAM, the image computation is moved to the CPU. You can use the 🔧 Image To Device node from cubiq/ComfyUI_essentials to move it back to the GPU.

Image

Using this node, the following error occurs (see the attached image).

Woukim avatar Apr 22 '25 12:04 Woukim

Can I suggest a more explicit device selection?

I noticed that the current implementation sometimes defaults to CPU when AUTO is selected; I suspect it somehow conflicts with ComfyUI-MultiGPU.

I kept the AUTO and CPU entries for backwards compatibility and added an entry for the default ComfyUI torch device; this also allows selecting a GPU in a multi-GPU setup (CUDA only).

Unfortunately I have only tested it on a CUDA setup; I have no access to other hardware.

https://github.com/lldacing/ComfyUI_BiRefNet_ll/compare/main...fAIseh00d:ComfyUI_BiRefNet_ll:main
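
A rough sketch of what such an explicit device dropdown could look like; the function names are illustrative and the actual change is in the branch linked above:

import torch
import comfy.model_management as mm

def device_choices():
    # AUTO / CPU kept for backwards compatibility; the other entries name real devices.
    choices = ["AUTO", "CPU", str(mm.get_torch_device())]
    choices += [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    return list(dict.fromkeys(choices))      # de-duplicate while keeping order

def resolve_device(choice: str) -> torch.device:
    if choice == "AUTO":
        return mm.get_torch_device()
    if choice == "CPU":
        return torch.device("cpu")
    return torch.device(choice)              # explicit entry such as "cuda:1"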

fAIseh00d avatar Apr 23 '25 18:04 fAIseh00d