
CUDA Out Of Memory Error - None of the Solutions Working

Rutvik-Trivedi opened this issue 3 years ago • 3 comments

I am trying to scale EasyOCR for a use case where I might need to process multiple OCR requests simultaneously. But after about 7 to 8 requests, I run into a CUDA out-of-memory error. The specifications of my system are:

  • 16 GB CUDA-capable GPU
  • 8 vCPUs

After analysing system usage, I found that with each image, CUDA memory usage increases by a whopping 2 GB, which seems to be too much, since even the model itself only takes about 1.5 GB when loaded.

I also compared images by size to understand the memory increase. For a 1700 × 2200 image at 96 DPI, the increase is 2 GB, while for a 463 × 1013 image at 150 DPI (a sample invoice), the increase is about 265 MB, which is still a lot considering how little text it contains. So, is this a normal memory spike, or am I doing something wrong?
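Since the spike seems to scale with resolution, one mitigation I am considering is downscaling large images before inference. A rough sketch (assuming Pillow is available; MAX_SIDE is an arbitrary cap I picked, not an EasyOCR setting):

import numpy as np
from PIL import Image

MAX_SIDE = 2560  # arbitrary cap on the longest side; tune for your GPU

def load_downscaled(path, max_side=MAX_SIDE):
    # Shrink oversized images so detector memory stays bounded
    img = Image.open(path).convert('RGB')
    scale = max_side / max(img.size)
    if scale < 1.0:
        img = img.resize((int(img.width * scale), int(img.height * scale)), Image.LANCZOS)
    return np.array(img)  # readtext also accepts numpy arrays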

The code skeleton I am using is something like this:

import easyocr
import torch

reader = easyocr.Reader(['en'], gpu=True)

def make_inference(image):
    '''This function is called multiple times simultaneously according to the number of simultaneous requests'''
    result = reader.readtext(image)
    torch.cuda.empty_cache()  # Releases cached blocks; the only call here that can actually free memory
    torch.cuda.reset_peak_memory_stats()  # Only resets stats counters; does not free memory
    torch.cuda.reset_accumulated_memory_stats()  # Only resets stats counters; does not free memory
    return result

I even tried clearing the residual cache, but with simultaneous calls it still fails when all images are processed together. So my question is: if this is not a problem on the EasyOCR side, is there a workaround that would let me process more requests simultaneously? Please note that I need to use the GPU here, as inference time also matters for me.

I tried building a Flask API behind a Gunicorn server without success, and I also tried modifying the source code to clear the CUDA cache before and after the inference calls, but that did not work either. Loading the model inside the inference function and deleting it afterwards is not a viable solution for me, since time is also a constraint for my use case. Would appreciate any help. Thanks.
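One workaround I am considering is throttling GPU concurrency so only a fixed number of readtext calls run at once, with extra requests waiting instead of allocating more CUDA memory at the same time. A minimal sketch, reusing the reader from the skeleton above (MAX_CONCURRENT is a made-up tuning knob, not an EasyOCR setting):

import threading

# Allow at most N simultaneous readtext calls on the GPU; requests
# beyond that block here instead of growing CUDA memory usage.
MAX_CONCURRENT = 2  # raise or lower depending on available memory
gpu_slots = threading.Semaphore(MAX_CONCURRENT)

def make_inference(image):
    with gpu_slots:
        return reader.readtext(image)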

Rutvik-Trivedi avatar Jul 04 '22 06:07 Rutvik-Trivedi

Had the exact same issue... EasyOCR eats a lot of my GPU memory even with very low QPS and small inputs. @Rutvik-Trivedi wondering if you resolved the issue by any chance?

hgong-snap avatar Aug 23 '22 16:08 hgong-snap

Sorry, no resolution from my side other than clearing the CUDA cache. I had to go with a different OCR for my use case.

Rutvik-Trivedi avatar Aug 23 '22 16:08 Rutvik-Trivedi

What OCR did you go with? Is it open source?

mrg0lden avatar Aug 23 '22 17:08 mrg0lden