
Gateway timeout on 200k images

Open nagem07 opened this issue 3 years ago • 11 comments

Describe the bug: After adding 200,000 images through the Python SDK, the recognition service no longer works and returns a 504 Gateway Timeout error.

To Reproduce Steps to reproduce the behavior:

  1. Add 200k images
  2. Go to UI and try recognition service

Expected behavior Person is recognised

Desktop (please complete the following information):

  • Ubuntu 20.04
  • Chrome
  • CompreFace 1.0.0
  • RTX2060
  • 64GB RAM

Additional context: Logs (hosted externally because the files are too big):

  • https://www.mediafire.com/file/h2gap9jn3cja01i/DB.log/file
  • https://www.mediafire.com/file/sl76j63by5f0xcl/Admin.log/file
  • https://www.mediafire.com/file/d4jumgd0b609ssl/FE.log/file
  • https://www.mediafire.com/file/ibvtf15kdvaq10l/API.log/file
  • https://www.mediafire.com/file/41z5mttlld1us0n/Core.log/file

nagem07 avatar May 08 '22 06:05 nagem07

Hi, sorry for the slow response; I was on a long vacation. According to your API.log file, Java fails because it runs out of memory. Did you update the .env file? The default configuration contains this option:

  compreface_api_java_options=-Xmx4g

You should increase it, e.g.:

  compreface_api_java_options=-Xmx16g

pospielov avatar May 25 '22 14:05 pospielov

Hey,

Yes, I did update the .env file and it fixed the issue. However, even with it set to 16 GB, memory use shoots up to 28 GB on 180k images. Correct me if I am wrong, but for a million images would we be looking at 256-384 GB of RAM?
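As a rough sanity check on that estimate, here is a minimal sketch that scales the single data point reported above (28 GB at 180k images) linearly. The linear-growth assumption and the function name `estimate_ram_gb` are illustrative only; they are not measurements or part of CompreFace, and real usage may be higher once JVM heap headroom is included.

```python
# Linear extrapolation of RAM use from the one data point in this thread.
# ASSUMPTION: memory grows proportionally with the number of stored images.

def estimate_ram_gb(
    target_images: int,
    observed_images: int = 180_000,   # collection size reported in the thread
    observed_ram_gb: float = 28.0,    # RAM use reported at that size
) -> float:
    """Scale the observed RAM use linearly to a different collection size."""
    return observed_ram_gb * target_images / observed_images


if __name__ == "__main__":
    for n in (200_000, 500_000, 1_000_000):
        print(f"{n:>9,} images -> ~{estimate_ram_gb(n):.0f} GB")
```

Under this assumption a million images lands in the range of roughly 150 GB or more, which is the same order of magnitude as the figures quoted above.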

nagem07 avatar May 26 '22 01:05 nagem07

Yes, unfortunately, CompreFace is not optimized for such huge face collections. Supporting them would require different approaches, which could be too heavy for small collections. So we had to choose which collection size to support, and for now we chose small collections. I believe 100k is the maximum comfortable size for CompreFace. Of course, it will work with bigger collections, but as you mentioned, it would cost too many resources.

pospielov avatar May 26 '22 13:05 pospielov

I see you prioritized smaller collections. From your knowledge, what would be the steps to optimize CompreFace for bigger collections? I have also noticed very slow service initialization with 180k images: after a reboot, the system takes at least 30 minutes to load the images back into RAM. Could that be addressed as well?

nagem07 avatar Jun 01 '22 01:06 nagem07

To support bigger collections, we need to change the architecture. Currently we store face embeddings in Postgres and then load them into RAM on each compreface-api node to calculate face similarities. Ideally, we need a solution that stores the embeddings and calculates similarities in one place, and that place should also be scalable. I know of such a solution: the vector database Milvus, which does basically exactly what we need. But it targets enterprise cloud deployments, so its minimum requirement is 16 GB of RAM, and 8 CPUs and 32 GB of RAM are recommended. This is not what most of our users expect from us.
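To illustrate why this design gets expensive, here is a minimal sketch of the general pattern described above: every stored embedding is held in memory and compared against the query. It is not CompreFace's actual code; the function name `nearest_subject`, the `gallery` layout, and the toy 3-dimensional vectors are assumptions made for the example.

```python
import math
from typing import Dict, List, Optional, Tuple

Embedding = List[float]


def nearest_subject(
    query: Embedding,
    gallery: Dict[str, List[Embedding]],
) -> Optional[Tuple[str, float]]:
    """Return (subject, distance) for the closest stored embedding, or None.

    Brute force: the query is compared with every embedding in the gallery,
    so both time and memory grow linearly with the collection size.
    """
    best: Optional[Tuple[str, float]] = None
    for subject, embeddings in gallery.items():
        for emb in embeddings:
            distance = math.dist(query, emb)  # Euclidean distance
            if best is None or distance < best[1]:
                best = (subject, distance)
    return best


if __name__ == "__main__":
    # Toy data; real face embeddings have far more dimensions.
    gallery = {
        "alice": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
        "bob": [[1.0, 1.0, 1.0]],
    }
    print(nearest_subject([0.9, 1.0, 1.0], gallery))
```

Because the whole gallery has to be resident on each node, adding nodes multiplies the memory cost instead of spreading it, which is the motivation for moving storage and search into a single scalable component.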

I have also noticed very slow service initialization with 180k images: after a reboot, the system takes at least 30 minutes to load the images back into RAM. Could that be addressed as well?

I didn't expect this. For 50k images it takes about 1 minute to load. I'll create a task to check what could be causing it.

pospielov avatar Jun 01 '22 09:06 pospielov

Thank you for your answer, and sorry for the delay in getting back to you. Having had a look at Milvus, I understand it would sit in front of Postgres in the architecture, so embeddings would be saved in Milvus while subject information would stay in Postgres. Am I correct?

Having said that, could you advise on which scripts would require changes in order to integrate Milvus? I am looking at more than a million images, and the current architecture has a high cost in terms of resources.

nagem07 avatar Jun 25 '22 06:06 nagem07

I didn't research it deeply. I think yes, it should be similar to your description. You need to replace this class, and probably lots of logic related to it with Milvus logic: https://github.com/exadel-inc/CompreFace/blob/master/java/api/src/main/java/com/exadel/frs/core/trainservice/component/classifiers/EuclideanDistanceClassifier.java
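One way to picture such a replacement is a small seam between the recognition logic and the vector store, as sketched below. Every name here (`VectorIndex`, `InMemoryIndex`, `Recognizer`) is hypothetical and comes from neither CompreFace nor Milvus; the in-memory class only stands in for a real backend, and the `subjects` dictionary stands in for the subject data kept in Postgres.

```python
import math
from typing import Dict, List, Protocol, Tuple


class VectorIndex(Protocol):
    """Hypothetical interface a vector-store backend would implement."""

    def add(self, embedding_id: str, vector: List[float]) -> None: ...

    def search(self, vector: List[float], top_k: int) -> List[Tuple[str, float]]: ...


class InMemoryIndex:
    """Stand-in backend: brute-force Euclidean search over a dict."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, embedding_id: str, vector: List[float]) -> None:
        self._vectors[embedding_id] = vector

    def search(self, vector: List[float], top_k: int) -> List[Tuple[str, float]]:
        scored = [(eid, math.dist(vector, v)) for eid, v in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1])[:top_k]


class Recognizer:
    """Maps embedding ids from the index to subject names (metadata store)."""

    def __init__(self, index: VectorIndex, subjects: Dict[str, str]) -> None:
        self._index = index
        self._subjects = subjects  # stand-in for subject rows in Postgres

    def recognize(self, vector: List[float], top_k: int = 1) -> List[Tuple[str, float]]:
        hits = self._index.search(vector, top_k)
        return [(self._subjects[eid], distance) for eid, distance in hits]


if __name__ == "__main__":
    index = InMemoryIndex()
    index.add("e1", [0.0, 0.0])
    index.add("e2", [1.0, 1.0])
    recognizer = Recognizer(index, {"e1": "alice", "e2": "bob"})
    print(recognizer.recognize([0.9, 1.0]))
```

With a seam like this, a backend that delegates `add` and `search` to an external vector database could be swapped in without touching the code that resolves subjects.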

pospielov avatar Jun 30 '22 12:06 pospielov

What is the maximum practical number of faces?

martinenkoEduard avatar Jul 04 '22 19:07 martinenkoEduard

It depends on what you mean by practical. I would recommend using no more than 100k-200k. But some people may use CompreFace with more faces, because buying hardware is often cheaper than buying software or paying for custom development.

pospielov avatar Jul 05 '22 09:07 pospielov

I didn't research it deeply. I think yes, it should be similar to your description. You need to replace this class, and probably lots of logic related to it with Milvus logic: https://github.com/exadel-inc/CompreFace/blob/master/java/api/src/main/java/com/exadel/frs/core/trainservice/component/classifiers/EuclideanDistanceClassifier.java

Would that be the sole class that requires modification or are there others as well?

nagem07 avatar Jul 06 '22 03:07 nagem07

Others as well; this is just a good place to start exploring.

pospielov avatar Jul 06 '22 16:07 pospielov