How do I run inference with multiple models without maxing out my GPU VRAM?
I'm trying to tag a dataset using more than one WD14 model, so I wrote a simple script that iterates over all the files in a directory for every model in a list, like this:
from imgutils.tagging import get_wd14_tags

model_list = ['SwinV2', 'ConvNextV2', 'MOAT', 'ViT']
for m in model_list:
    for child in directory.glob('**/*'):  # directory is a pathlib.Path to the dataset
        ratings, features, chars = get_wd14_tags(child, model_name=m, general_threshold=thresh)
My problem is that after every pass through model_list, inference time increases a lot. On the first loop it takes ~0.15 seconds to extract the tags from each image, no matter how many images there are, but by the time I'm doing the 4th loop it takes 15 seconds. However, if I run the script 4 times with a single model in the list, every model takes the same 0.15 seconds.
Running it with Task Manager open, I noticed that every time it loads a new model, the dedicated VRAM used by Python increases by ~1.5 GB, so once I reach the third loop my poor 970 doesn't have any free memory left. I guess it then starts using system RAM, and that's why it slows down.
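(For reference, something like this could log the VRAM usage from the script itself instead of watching Task Manager; it assumes the NVIDIA driver's nvidia-smi tool is on the PATH:)

import subprocess

def vram_used_mib():
    # ask nvidia-smi for the GPU's currently used memory, in MiB
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.used', '--format=csv,noheader,nounits'])
    return int(out.decode().splitlines()[0])

print(vram_used_mib(), 'MiB of VRAM in use')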
Is there a way to free the VRAM before loading a new model? I tried looking in the ONNX Runtime documentation, but it's way above my level of understanding.
I'm running it on Win10 / onnxruntime-gpu / CUDA 11.8.
by the time I'm doing the 4th loop it takes 15 seconds
Does this mean 15 seconds per image, or something else?
Running it with Task Manager open, I noticed that every time it loads a new model, the dedicated VRAM used by Python increases by ~1.5 GB, so once I reach the third loop my poor 970 doesn't have any free memory left. I guess it then starts using system RAM, and that's why it slows down.
I remember the GeForce GTX 790 having at least 12 GB of VRAM, so this doesn't seem to be related to VRAM.
Yes, it's 15 seconds per image once I max out the VRAM. Unfortunately the GTX 970 only has 3.5 GB of VRAM; it's almost 10 years old at this point. Maybe you're thinking of some newer AMD card with a similar name.
Actually, we can release VRAM by clearing the cache. The source code is available here: https://github.com/deepghs/imgutils/blob/main/imgutils/tagging/wd14.py#L69
Here's how you can use it:
from imgutils.tagging.wd14 import _get_wd14_model
_get_wd14_model.cache_clear()
Once the cache is cleared, the previously loaded model will be released.
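For example, applying it to the loop from your question (a sketch, reusing your model_list, directory, and thresh), you could clear the cache every time you switch models:

from imgutils.tagging import get_wd14_tags
from imgutils.tagging.wd14 import _get_wd14_model

for m in model_list:
    for child in directory.glob('**/*'):
        ratings, features, chars = get_wd14_tags(child, model_name=m, general_threshold=thresh)
    # release the cached model for this pass before the next one is loaded
    _get_wd14_model.cache_clear()

This way only one model's weights are held in VRAM at a time.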
However, this method is currently just a workaround. A more suitable approach would be for us to provide a complete VRAM management layer in the future. This part has already been added to the todo list.
Works perfectly for what I need to do, thanks for the help and also for writing this library. I spent months trying every piece of commercial software with auto-tagging, but they were all too generic, while this does exactly what I wanted.