[FEAT] Free most VRAM possible before using local model
✨ Is your feature request related to a problem?
The pipeline keeps VRAM allocated even when switching to a local VLM, causing out-of-memory (OOM) errors. This appears to be a systemic issue, forcing users to look for lighter alternatives to Surya or to fall back on cloud APIs.
💡 Describe the Solution You'd Like
Free VRAM, or at least most of it, before calling models such as an LLM, OlmOCR, or other VL/OCR models. If needed, the unloaded models can be reloaded afterwards.
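A minimal sketch of the requested behavior, assuming the pipeline's models are PyTorch modules (the helper name and the usage comments are hypothetical, not existing pipeline API):

```python
import gc


def release_gpu_memory():
    """Flush Python garbage and the CUDA allocator cache.

    Call this AFTER dropping every reference to the old model
    (e.g. setting the variable to None); otherwise the tensors
    stay alive and the VRAM cannot be reclaimed.
    """
    gc.collect()  # collect unreachable tensors first
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the driver
    except ImportError:
        pass  # torch not installed; nothing to free


# Usage sketch (names hypothetical):
# surya_model = None        # drop the last reference to the layout model
# release_gpu_memory()      # VRAM goes back to the driver
# vlm = load_local_vlm(...) # now there is room for the VL/OCR model
```

Note that `torch.cuda.empty_cache()` only releases memory the caching allocator is holding for reuse; memory still referenced by live tensors is untouched, which is why dropping references first is essential.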
📋 Alternatives Considered
Tried removing Surya, but that caused other issues. Cloud APIs are not an option when the documents are confidential.