Segmentation running slow all of a sudden
Recently, segmentation has been running very slowly. Before Christmas the same image used to take under 1 s on my 4090; now it takes at least 2-3 s and sometimes even over 15 s (it seems that it falls back to CPU mode even though the VRAM is not full).
I haven't noticed the problem you mentioned. If it took that long, the model should have been cleared from the cache; please check whether you are using other very memory-intensive models.
It still persists here, but only on the Rmbg and Rmbg Advanced nodes. On the GetMask node with the Portrait model it is still the fastest segmentation of all the ones I tested, at 0.5 s max.
These three processing paths are equivalent. The difference from the previous version is that it now uses fast-foreground-estimation to composite the image. You can use other nodes to mix the image and the mask instead.
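For reference, mixing the image and the mask with other nodes amounts to straight alpha compositing. A minimal standalone sketch (hypothetical code, not from this repo) shows why that can produce a halo that fast-foreground-estimation avoids:

```python
import torch

def composite_on_black(image, mask):
    """Naive compositing: multiply the RGB channels by the mask.

    image: (b, h, w, 3) float tensor in [0, 1]
    mask:  (b, h, w)    float tensor in [0, 1]

    Unlike fast-foreground-estimation, this keeps the original
    (background-contaminated) colors under soft mask edges, which
    is where the edge halo comes from.
    """
    return image * mask.unsqueeze(-1)  # broadcast mask over channels

# A fully white image under a 50% mask keeps half-intensity edge pixels.
img = torch.ones(1, 4, 4, 3)
m = torch.full((1, 4, 4), 0.5)
out = composite_on_black(img, m)
```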
Thanks. Maybe that's the reason it's running so slowly then? Can you recommend a solution that yields results similar to yours? I use "ImageRemoveAlpha" from the "LayerStyle" custom nodes, but it gives a halo around the edge (right side of the image) compared to what I get from using your "RmbgByBiRefNet" node directly (left side). Thank you.
@dapa5900 I coded a new node to reproduce the original version's effect; you can add the node to the file birefnetNode.py and test it. I am not sure it works for you. If it does, I will merge it.
```python
class GetForegroundImageSimple:

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "mask": ("MASK",),
            }
        }

    RETURN_TYPES = ("IMAGE",)
    RETURN_NAMES = ("image",)
    FUNCTION = "get_image"
    CATEGORY = "rembg/BiRefNet"

    def get_image(self, image, mask):
        # image.shape => (b, h, w, c)
        # mask.shape => (b, h, w)
        # You can uncomment the next line to see whether normalization has any impact
        # mask = normalize_mask(mask)
        image = add_mask_as_alpha(image, mask)
        return image,


NODE_CLASS_MAPPINGS = {
    "AutoDownloadBiRefNetModel": AutoDownloadBiRefNetModel,
    "LoadRembgByBiRefNetModel": LoadRembgByBiRefNetModel,
    "RembgByBiRefNet": RembgByBiRefNet,
    "RembgByBiRefNetAdvanced": RembgByBiRefNetAdvanced,
    "GetMaskByBiRefNet": GetMaskByBiRefNet,
    "BlurFusionForegroundEstimation": BlurFusionForegroundEstimation,
    "GetForegroundImageSimple": GetForegroundImageSimple,
}

NODE_DISPLAY_NAME_MAPPINGS = {
    "AutoDownloadBiRefNetModel": "AutoDownloadBiRefNetModel",
    "LoadRembgByBiRefNetModel": "LoadRembgByBiRefNetModel",
    "RembgByBiRefNet": "RembgByBiRefNet",
    "RembgByBiRefNetAdvanced": "RembgByBiRefNetAdvanced",
    "GetMaskByBiRefNet": "GetMaskByBiRefNet",
    "BlurFusionForegroundEstimation": "BlurFusionForegroundEstimation",
    "GetForegroundImageSimple": "GetForegroundImageSimple",
}
```
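The helper `add_mask_as_alpha` is defined elsewhere in the repo and not shown here. As an assumption, a minimal stand-in consistent with the commented shapes (appending the mask as a fourth, alpha channel) might look like:

```python
import torch

def add_mask_as_alpha(image, mask):
    """Hypothetical stand-in for the repo's helper of the same name.

    image: (b, h, w, 3) float tensor
    mask:  (b, h, w)    float tensor
    Returns an RGBA tensor of shape (b, h, w, 4) with the mask as alpha.
    """
    return torch.cat([image, mask.unsqueeze(-1)], dim=-1)
```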
Thank you. It works, but on a black background it gives the same halo as the above-mentioned approach (although it's faster on my side, which is good :-)). Please find attached a workflow that compares these three approaches for an example input image.
It looks like there is no difference from LayerStyle; I guess this is exactly what fast-foreground-estimation solves.
Alright, thanks for your efforts!
This node is slow because it runs on the CPU. If you fix it and add a tensor transfer to the device, it will work faster.
```python
....
class BlurFusionForegroundEstimation:
    .....
    def get_foreground(self, images, masks, blur_size=91, blur_size_two=7, fill_color=False, color=None):
        ......
        # (b, c, h, w)
        _image_masked = refine_foreground(image_bchw.to(deviceType), out_masks.to(deviceType), r1=blur_size, r2=blur_size_two)
        .....
```
To save VRAM, the image calculation is moved to the CPU. You can use the node named 🔧 Image To Device from cubiq/ComfyUI_essentials to move it to the GPU.
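The suggested fix boils down to moving the tensors onto the compute device before the heavy blur operations. A minimal sketch of that pattern (the device resolution here is an assumption; `deviceType` in the snippet above plays the same role):

```python
import torch

# Resolve the compute device once; falls back to CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(*tensors, device=device):
    """Move all given tensors to the chosen device, e.g. right before
    an expensive operation like blur-fusion foreground estimation."""
    return tuple(t.to(device) for t in tensors)

# Usage: transfer image and mask together, then run the heavy op on them.
image_bchw = torch.rand(1, 3, 8, 8)
out_masks = torch.rand(1, 1, 8, 8)
image_bchw, out_masks = to_device(image_bchw, out_masks)
```

The trade-off the maintainer describes is exactly this: keeping the tensors on the CPU saves VRAM, while transferring them to the GPU makes the blur passes much faster.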
Maybe I should remove this CPU fallback, but the impact needs to be evaluated.
@lldacing > In order to save VRAM, the calculation of the image will be moved on the CPU. You can use the node named 🔧 Image To Device of cubiq/ComfyUI_essentials to move it to the GPU.
Using this node, the following error occurs:
Can I suggest a more explicit device selection?
I noticed that the current implementation sometimes defaults to CPU when AUTO is selected; I suspect it somehow conflicts with ComfyUI-MultiGPU.
I kept the AUTO and CPU entries for backwards compatibility and added the default ComfyUI torch device entry, which also allows selecting a specific GPU in a multi-GPU setup (CUDA only).
Unfortunately I have only tested it on a CUDA setup; I have no access to other hardware.
https://github.com/lldacing/ComfyUI_BiRefNet_ll/compare/main...fAIseh00d:ComfyUI_BiRefNet_ll:main
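A device dropdown along those lines could be sketched roughly as follows. This is a hypothetical illustration, not the code from the linked branch; the entry names `AUTO` and `CPU` come from the discussion above, the rest is an assumption:

```python
import torch

def device_list():
    """Build dropdown entries: AUTO and CPU kept for backwards
    compatibility, plus one explicit entry per visible CUDA device."""
    devices = ["AUTO", "CPU"]
    devices += [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    return devices

def resolve_device(choice):
    """Map a dropdown entry to a torch.device."""
    if choice == "AUTO":
        # AUTO prefers the GPU when one is available.
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")
    if choice == "CPU":
        return torch.device("cpu")
    return torch.device(choice)  # explicit entry like "cuda:1"
```

Explicit entries sidestep the AUTO heuristic entirely, which is what avoids the unwanted CPU fallback in multi-GPU setups.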
