Fine-tuning with a small dataset
First of all, thank you for your great project. I'd like to ask for some recommendations about fine-tuning with a small dataset (around 400 images). My task is main-car segmentation: segment only one car even if the image contains multiple cars, where the main car is the biggest one and in the middle of the image.
- Which layers should I freeze?
- What learning rate should I start with?
- Since the car segmentation is pretty simple, should I turn off any loss components?
- Do you have any idea of roughly how many images are enough for training? Due to limited resources, I can't try all of these things myself, so your suggestions would be really meaningful to me.
- Model: Since you have only a little data, I suggest using a smaller model, e.g., choose `swin_v1_tiny` as the backbone in `config.py` (remember to put the backbone weights in the right place).
- Freezing layers: I'm not sure whether freezing some layers helps the training. But if you want, you can turn on the `freeze_bb` option in `config.py` to easily freeze the backbone layers.
- Loss: You can turn off the SSIM loss in `config.py`, since it benefits segmentation in fine regions, which is unnecessary in your case. In my experience, the IoU loss converges much faster but with lower accuracy, so if you want to see results sooner, you can keep only the IoU loss on.
- Validation: If you do not have extra data, you can split off 40 images for validation.
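For illustration, here is a rough sketch of the kind of `config.py` edits being suggested above. The exact attribute names and loss keys may differ between versions of the repo, so treat them as assumptions and check your local `config.py`:

```python
# Illustrative config.py edits for fine-tuning on a small dataset.
# Names follow the thread (swin_v1_tiny backbone, freeze_bb, SSIM loss);
# the exact keys and default weights here are assumptions, not the repo's.

class Config:
    def __init__(self):
        # Smaller backbone for ~400 training images.
        self.bb = 'swin_v1_t'      # instead of the default Swin-L
        # Optionally freeze the backbone layers.
        self.freeze_bb = True
        # Loss weights: keep BCE/IoU, disable SSIM since fine-region
        # quality is not critical for whole-car masks.
        self.lambdas_pix_last = {
            'bce': 30,
            'iou': 0.5,
            'ssim': 0,   # set to 0 to turn the SSIM term off
        }
```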
If you have further questions, feel free to leave messages :)
Thank you so much for your detailed answer.
About `swin_v1_tiny`: could we have a massive-dataset training with this one? I found the massively trained model is much better in the general case compared to one trained on only a single dataset.
About the loss function, I'll think about it again, because my input images can sometimes be like this (kind of complex), so I will need to try it myself. Do you have any recommendation on where to rent GPUs?
Thanks for your feedback! However, the massive training takes a lot of time, even with swin-tiny, and I haven't started it yet. I can only say I might spare my own time and some GPU time for it in the future.
If your cases are similar to the image above, I recommend using the default settings of losses in my project.
About renting GPUs, I personally recommend those on autodl, which is the cheapest platform I've used. But if you are not in China (you know there are firewalls blocking access to things like Google), I recommend finding some GPUs on vast.ai. BTW, if you want to use the default training setting (bs=2, bb=Swin-L), you need GPUs with more than 37 GB of memory. If you want to train with swin-tiny, you can use GPUs with 24 GB of memory. The batch size is better kept larger than 1 (I've tested full training with bs=1).
Thank you! Have you tried to export this model to ONNX? I plan to deploy this model to Triton Inference Server later after the training, so if you haven't, maybe I will try to do it and get back to you.
Sorry, I haven't done this kind of thing. But if you encounter some problems while doing the deployment, which you think I may know about, feel free to leave messages here. Good luck!
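For reference, a generic sketch of what such an export might look like (not something from this repo, and untested on BiRefNet specifically; the wrapper and the output handling are assumptions to adapt to the real model class and checkpoint-loading code):

```python
# Generic sketch of exporting a 1024x1024 segmentation model to ONNX with
# torch.onnx.export. BiRefNet's forward() may return a list of side outputs,
# so a thin wrapper that keeps only the final map is assumed here.
import torch

class ExportWrapper(torch.nn.Module):
    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.model(x)
        # If the model returns multiple scales, keep only the last (finest) one.
        return out[-1] if isinstance(out, (list, tuple)) else out

def export_to_onnx(model: torch.nn.Module, path: str = "birefnet.onnx") -> None:
    wrapper = ExportWrapper(model).eval()
    dummy = torch.randn(1, 3, 1024, 1024)  # training/inference resolution
    torch.onnx.export(
        wrapper, dummy, path,
        input_names=["image"], output_names=["mask"],
        opset_version=17,
        dynamic_axes={"image": {0: "batch"}, "mask": {0: "batch"}},
    )
```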
Thank you!
Hi @ZhengPeng7, I know this is not related to this discussion, but I can't load `BiRefNet_DIS_ep500-swin_v1_tiny` anymore. Do you know why? I changed the backbone in the config to `swin_v1_t`, but when loading the checkpoint, it shows mismatches between many layers.
There were some differences between the previous code and the descriptions in the paper in terms of model architecture. I made modifications so that they are now 100% consistent with each other. I'll try to train a swin-tiny version in the massive training setting and will reply to you once it's done.
Thanks man!
Feel free to reopen it if you have any more questions.
@ZhengPeng7 I have finished my training of the tiny version with around 1500 annotated images. Thanks for your suggestions! I want to ask about some ideas to improve the current model, which has some problems shown in the images attached below:
- The contour around the object has some weird artifacts. Do you have any idea how to fix this? I'm thinking it's due to the mask resizing at inference time.
- Some images have multiple cars, but I only want to segment the main car (the car in the middle of the image), so sometimes it segments the nearby cars too.
Questions:
- Do you have any ideas or references for handling these cases?
- Why do you apply only the flip augmentation to the dataset?
- Any suggestions on the hyperparameter settings for fully fine-tuning the tiny version (learning rate, loss settings)?
Thank you!
Hi, @LeDuySon,
- I'm not very sure about the reason, but the one you provided makes sense. Besides, I don't know the sizes of your images. If they are much larger than 1024x1024, the artifacts might come from the downsampling of the input; if not, they might come from the resizing after the prediction. In the second case, using some methods like super-resolution (SR) may work better here.
- You may try using some methods like instance segmentation and removing the regions of instances with low IoU (a rough sketch of this kind of filtering is given after this list).
- In the current default setting, four data augmentation methods are applied, as given in this line. But I didn't have enough effort and GPUs to conduct a careful ablation study on each of them.
- About fine-tuning the tiny version, I would give the same suggestions as here. But in your case, where segmentation inside the cars is easier, I guess increasing the IoU weight in the total loss may bring some improvement.
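For illustration (not code from the repo), a rough sketch of the "keep only the main instance" idea, assuming you already have a binary mask and want to keep the component that is large and close to the image center:

```python
# Sketch: keep only the "main car" region in a binary mask by selecting the
# connected component with the best area/centrality score.
import cv2
import numpy as np

def keep_main_component(mask: np.ndarray) -> np.ndarray:
    """mask: uint8 (H, W) with foreground > 0. Returns a mask containing only
    the component that is biggest and closest to the image center."""
    h, w = mask.shape
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(
        (mask > 0).astype(np.uint8), connectivity=8
    )
    if num <= 1:  # no foreground component found
        return mask
    cx, cy = w / 2.0, h / 2.0
    best_label, best_score = 0, -1.0
    for lbl in range(1, num):  # label 0 is the background
        area = stats[lbl, cv2.CC_STAT_AREA]
        dist = np.hypot(centroids[lbl][0] - cx, centroids[lbl][1] - cy)
        score = area / (1.0 + dist)  # favor big, central blobs
        if score > best_score:
            best_label, best_score = lbl, score
    return np.where(labels == best_label, mask, 0).astype(mask.dtype)
```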
Hi @ZhengPeng7, sorry for the late reply, and thank you for your suggestions!
- Yes, my images are bigger compared to the model's input size (some are even 4K), so I think that's the reason. Do you have any idea how to mitigate this problem?
- I have tried instance segmentation, but the mask is not really good. Maybe I can use both instance segmentation and your model, but the inference time will be much higher. I also just checked your box-guided segmentation; maybe I will have a look at it, because the main car is always in the center of the image.
- Do you have any suggestions for augmentation in my case? I think I will try to add more augmentations and experiment with them.
- Thank you, I will try that.
Hi, @LeDuySon, here are my suggestions:
- Simply increase the resolution and try fine-tuning on your 4K images.
- Check my box-guided segmentation Colab demo given in the README. Use your instance-seg model to crop the boxes of instances and do the inference on the box image, which may be resized to 1024x1024 or 4K (see the sketch after this list).
- Replacing the background could possibly improve the performance. I personally recommend trying the BG-20k dataset as the source of background images.
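As an illustration of the box-guided idea (not the Colab demo's code), a minimal sketch that crops the main car's box, runs the model on the crop, and pastes the mask back at full resolution; `predict_mask` is a hypothetical wrapper around the model's inference:

```python
# Sketch: crop the detected box, segment the crop at 1024x1024, paste back.
import cv2
import numpy as np

def segment_in_box(image: np.ndarray, box, predict_mask) -> np.ndarray:
    """image: HxWx3 uint8; box: (x1, y1, x2, y2); predict_mask(crop) returns
    a float mask in [0, 1] with the same spatial size as its input."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = [int(v) for v in box]
    crop = image[y1:y2, x1:x2]
    crop_1024 = cv2.resize(crop, (1024, 1024), interpolation=cv2.INTER_AREA)
    mask_1024 = predict_mask(crop_1024)
    mask_crop = cv2.resize(mask_1024, (x2 - x1, y2 - y1),
                           interpolation=cv2.INTER_LINEAR)
    full = np.zeros((h, w), dtype=np.float32)
    full[y1:y2, x1:x2] = mask_crop  # paste the crop's mask back in place
    return full
```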
You're welcome :)
Hi, great creator. I would like to ask: when predicting, is it necessary to resize to 1024x1024 to get better results? I noticed that the actual input images vary in size and aspect ratio, so the aspect ratio is not preserved when scaling. Will the segmentation accuracy be affected by this scaling?
Hi, because the size set in training was 1024x1024, it should be the best one for prediction. Also, your idea is very good, and I actually tried it before, haha. I tried adapting the shorter side of the image to 1024 while keeping its height/width ratio. But in the results, the performance was similar to that with 1024x1024, and the GPU memory occupied during inference was much higher.
The distortion also exists in training, so I think we can take it easy. I also have some other ideas on this problem in my mind, but they may need some time to be verified experimentally. You'll see them in my next work if I continue this task.
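For clarity, the aspect-ratio-preserving resize described above (shorter side scaled to 1024, ratio kept) could look like this; it is only an illustration of the idea that was tested, not the repo's code:

```python
# Sketch: scale the shorter side to 1024 while keeping the height/width ratio.
from PIL import Image

def resize_shorter_side(img: Image.Image, target: int = 1024) -> Image.Image:
    w, h = img.size
    scale = target / min(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
```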