[FEAT] Free most VRAM possible before using local model
✨ Is your feature request related to a problem?
The pipeline keeps VRAM allocated even when switching to a local VLM, causing out-of-memory (OOM) errors. This appears to be a systemic issue, forcing users to look for lighter alternatives to Surya or to fall back on cloud APIs.
💡 Describe the Solution You'd Like
Free VRAM, or at least most of it, before calling models such as an LLM, OlmOCR, or other VL/OCR models. If needed, the unloaded models can be reloaded afterwards.
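A minimal sketch of the requested behavior, assuming the pipeline's models are PyTorch modules (the helper name and the usage comments are hypothetical, not existing pipeline API):

```python
import gc


def release_gpu_memory():
    """Flush Python garbage and the CUDA allocator cache.

    Call this AFTER dropping every reference to the old model
    (e.g. setting the variable to None); otherwise the tensors
    stay alive and the VRAM cannot be reclaimed.
    """
    gc.collect()  # collect unreachable tensors first
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the driver
    except ImportError:
        pass  # torch not installed; nothing to free


# Usage sketch (names hypothetical):
# surya_model = None        # drop the last reference to the layout model
# release_gpu_memory()      # VRAM goes back to the driver
# vlm = load_local_vlm(...) # now there is room for the VL/OCR model
```

Note that `torch.cuda.empty_cache()` only releases memory the caching allocator is holding for reuse; memory still referenced by live tensors is untouched, which is why dropping references first is essential.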
📋 Alternatives Considered
Tried removing Surya, but that caused other issues. Cloud APIs are not an option when the documents are confidential.