
Fix: Resolves memory leak caused by using CRAFT detector with detect() or readtext().

daniellovera opened this issue Jul 10 '24 · 15 comments

This fix enables garbage collection to work properly when https://github.com/JaidedAI/EasyOCR/blob/c999505ef6b43be1c4ee36aa04ad979175178352/easyocr/detection.py#L24 returns, by deleting the objects we moved to the GPU after the forward pass results have been moved back to the CPU.

See https://pytorch.org/blog/understanding-gpu-memory-2/#why-doesnt-automatic-garbage-collection-work for more detail.

Running torch.cuda.empty_cache() in test_net() before returning keeps nvidia-smi's reported memory usage accurate.
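In rough terms, the change amounts to something like this inside test_net() (a sketch, not the exact diff; it assumes the GPU-side tensors are named x, y, and feature as in detection.py):

```python
import torch

def test_net(net, x, device):
    # move the input to the GPU and run the forward pass
    x = x.to(device)
    with torch.no_grad():
        y, feature = net(x)

    # copy what we need back to the CPU before dropping the GPU references
    score_text = y[0, :, :, 0].cpu().numpy()
    score_link = y[0, :, :, 1].cpu().numpy()

    # drop the Python references so the CUDA tensors become collectable
    del x, y, feature

    # release cached blocks back to the driver so nvidia-smi reflects the freed memory
    if str(device).startswith('cuda'):
        torch.cuda.empty_cache()

    return score_text, score_link
```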

Interestingly, nvidia-smi showed GPU memory usage per process of 204MiB upon reader initialization, which would then increase to 234MiB or 288MiB after running easyocr.reader.detect(), but would not increase beyond that point and in some cases dropped back down to 234MiB. I think this has something to do with PyTorch's caching allocator.

One note is that I tested this on a single GPU machine where I changed https://github.com/JaidedAI/EasyOCR/blob/c999505ef6b43be1c4ee36aa04ad979175178352/easyocr/detection.py#L86 to net = net.to(device), removing DataParallel, as shown below. There's no reason this shouldn't work on multi-GPU machines, but note that it wasn't tested on one.
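Roughly, that change looks like this (a sketch of the relevant line, not the exact code from detection.py):

```python
# stock code wraps the model for multi-GPU use, roughly:
# net = torch.nn.DataParallel(net).to(device)
# single-GPU variant used for this test:
net = net.to(device)
```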

I also only tested this on the CRAFT detector, not DBNet.

Relevant package versions: easyocr 1.7.1, torch 2.2.1+cu121, torchvision 0.17.1+cu121.

Hope this helps!

daniellovera avatar Jul 10 '24 20:07 daniellovera

I should clarify: this resolves the GPU VRAM memory leak. It does not resolve the CPU RAM memory leak.

daniellovera avatar Jul 14 '24 17:07 daniellovera

Corrected to only call empty_cache() if the device in use is cuda.

daniellovera avatar Jul 16 '24 04:07 daniellovera

The del stuff can't possibly work. It just removes the Python variable from the scope (the function) but doesn't actually remove anything from the GPU/CPU

jonashaag avatar Aug 07 '24 12:08 jonashaag

The del stuff can't possibly work. It just removes the Python variable from the scope (the function) but doesn't actually remove anything from the GPU/CPU

@jonashaag did you attempt to replicate my results? It'll take you less than 15 minutes to give it a whirl and prove if it's possible or not.

It did work for me, and the pytorch.org blog post I linked explains exactly why it works. I'll quote it here:

Why doesn’t automatic garbage collection work? The automatic garbage collection works well when there is a lot of extra memory as is common on CPUs because it amortizes the expensive garbage collection by using Generational Garbage Collection. But to amortize the collection work, it defers some memory cleanup making the maximum memory usage higher, which is less suited to memory constrained environments. The Python runtime also has no insights into CUDA memory usage, so it cannot be triggered on high memory pressure either. It’s even more challenging as GPU training is almost always memory constrained because we will often raise the batch size to use any additional free memory.

The CPython’s garbage collection frees unreachable objects held in reference cycles via the mark-and-sweep. The garbage collection is automatically run when the number of objects exceeds certain thresholds. There are 3 generations of thresholds to help amortize the expensive costs of running garbage collection on every object. The later generations are less frequently run. This would explain why automatic collections will only clear several tensors on each peak, however there are still tensors that leak resulting in the CUDA OOM. Those tensors were held by reference cycles in later generations.
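To make the quoted point concrete, here is a small self-contained illustration (assumes a CUDA device is available; none of this is EasyOCR code): a tensor caught in a reference cycle stays on the GPU until the cyclic collector runs, and empty_cache() is what lets nvidia-smi reflect the release.

```python
import gc
import torch

def allocated_mib():
    return torch.cuda.memory_allocated() / 1024**2

t = torch.empty(256, 1024, 1024, device='cuda')   # ~1 GiB float32 tensor on the GPU
print(f"after alloc: {allocated_mib():.0f} MiB")

holder = {'t': t}
holder['self'] = holder        # reference cycle: the dict refers to itself
del t, holder                  # the names are gone, but the cycle still holds the tensor
print(f"after del:   {allocated_mib():.0f} MiB")   # still ~1 GiB

gc.collect()                   # cyclic collector breaks the cycle, tensor is freed
print(f"after gc:    {allocated_mib():.0f} MiB")   # back to ~0

torch.cuda.empty_cache()       # return the now-unused cached blocks to the driver,
                               # which is what nvidia-smi actually reports
```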

I'm not going to claim that I think it SHOULD work this way. But this isn't the first time weird garbage collection and scoping behaviour across CPU/GPU has caused problems.

Again, try it and let us all know if it's actually working for you or not.

daniellovera avatar Aug 07 '24 19:08 daniellovera

Sorry, maybe I misunderstood the reason why del is used here. Is it so that the call to empty_cache() can remove the tensors x, y, feature from GPU memory? That might work unless there are other references to the tensors that those variables reference.

jonashaag avatar Aug 07 '24 20:08 jonashaag

Sorry, maybe I misunderstood the reason why del is used here. Is it so that the call to empty_cache() can remove the tensors x, y, feature from GPU memory? That might work unless there are other references to the tensors that those variables reference.

I don't think I understand it well enough to explain it better. I also call torch.cuda.empty_cache() and torch.cuda.reset_peak_memory_stats() after the function returns. It's possible the empty_cache() call inside the function isn't actually doing anything, since garbage collection may not run until the function returns. I probably should have double-checked that, but I was less concerned with nvidia-smi being accurate than with not getting CUDA OOM errors.
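Roughly, the after-return measurement looks like this (a sketch; images is a placeholder for whatever inputs you feed it, not the code actually run here):

```python
import torch
import easyocr

reader = easyocr.Reader(['en'], gpu=True)

for img in images:                                   # images: any iterable of paths or arrays
    horizontal_list, free_list = reader.detect(img)
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f"peak during detect: {peak:.0f} MiB")
    torch.cuda.empty_cache()                         # so nvidia-smi reflects what is really in use
    torch.cuda.reset_peak_memory_stats()             # fresh peak measurement for the next image
```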

I'm far from an expert, but I do know that these changes resulted in halting the memory leaks I had, and I haven't had a CUDA OOM error since.

My best suggestion: since action produces information, give it a whirl and let us know if it works. If it doesn't work for you, it would be valuable to know how your machine differs from mine, so I can make further changes to avoid these errors again if I scale up or swap machines.

daniellovera avatar Aug 07 '24 23:08 daniellovera

@jonashaag Hey, I'd love to know if del worked if you tried it.

daniellovera avatar Aug 13 '24 19:08 daniellovera

Sorry, I've switched to another engine (macOS Live Text) because it's better and much faster.

I feel a bit bad to have left such a smart-ass comment initially and not contribute anything of substance here :-/

jonashaag avatar Aug 13 '24 20:08 jonashaag

It's all good. Are you using Live Text natively on the devices, or can it be hosted in a way that lets it replace EasyOCR for serving a website that isn't running on an Apple device?

daniellovera avatar Aug 14 '24 20:08 daniellovera

Yes, we run a Mac mini in production (via Scaleway).

If you are interested, I can share some code.

jonashaag avatar Aug 15 '24 04:08 jonashaag

Thanks! I was able to reproduce the leak, and your fix works. It took me a while to figure out this issue. Can we merge this PR ASAP and bump the version of EasyOCR? (For now I just applied a local fix.)

BMukhtar avatar Sep 13 '24 13:09 BMukhtar

Any news regarding the CRAFT-related leak? I am noticing leaks either way, whether using the GPU or not. Running on macOS 14.5 here, every run increases memory usage by 10 to 100 MB, which is crazy. How do people run EasyOCR in production?
If you increase mag_ratio to a value > 1, the leak gets much more obvious.

msciancalepore98 avatar Nov 19 '24 14:11 msciancalepore98

Any news regarding the CRAFT-related leak? I am noticing leaks either way, whether using the GPU or not. Running on macOS 14.5 here, every run increases memory usage by 10 to 100 MB, which is crazy. How do people run EasyOCR in production? If you increase mag_ratio to a value > 1, the leak gets much more obvious.

This resolves the GPU memory leak. I didn't test it against the CPU memory leak, so I can't say conclusively whether it fixes that one too. You could try it and let us know.

daniellovera avatar Nov 19 '24 22:11 daniellovera

I have just applied this patch and tested CPU-only detection. It does not resolve the memory leak in that context.

nickb937 avatar Jan 28 '25 10:01 nickb937

Hey @rkcosmos, are you able to take a look at this, please? I ran a small test with Docling processing a handful of PDFs using CUDA (A5000). Without this patch, DocumentConverter (from Docling) consumes 4 GB from the get-go and slowly grows its VRAM footprint. With this patch by @daniellovera, my code starts at ~1 GB of VRAM and stays mostly flat, with some spikes depending on the input PDF, but always coming back to ~1 GB of VRAM consumption.
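For reference, the check was roughly of this shape (a sketch; the file paths are placeholders and the Docling calls follow its basic usage example, which is an assumption here rather than something taken from this thread):

```python
import torch
from docling.document_converter import DocumentConverter   # Docling's basic API (an assumption here)

converter = DocumentConverter()
for pdf in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:            # placeholder inputs
    converter.convert(pdf)
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"VRAM reserved after convert: {reserved:.2f} GiB")
```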

vitorfalcaor avatar Jun 11 '25 02:06 vitorfalcaor