CompreFace Gateway timeout on 200k images

Describe the bug After adding 200,000 images through the Python SDK, the recognition service stops working and returns a 504 Gateway Timeout error.

To Reproduce Steps to reproduce the behavior:

Add 200k images
Go to UI and try recognition service

Expected behavior Person is recognised

Desktop (please complete the following information):

Ubuntu 20.04
Chrome
CompreFace 1.0.0
RTX2060
64GB RAM

Additional context Logs (Using external host as files are too big)

https://www.mediafire.com/file/h2gap9jn3cja01i/DB.log/file
https://www.mediafire.com/file/sl76j63by5f0xcl/Admin.log/file
https://www.mediafire.com/file/d4jumgd0b609ssl/FE.log/file
https://www.mediafire.com/file/ibvtf15kdvaq10l/API.log/file
https://www.mediafire.com/file/41z5mttlld1us0n/Core.log/file

May 08 '22 06:05 nagem07

Hi, sorry for the late response, I was on a long vacation. According to your API.log file, Java fails because it runs out of memory. Did you update the .env file? The default configuration contains this option: compreface_api_java_options=-Xmx4g You should raise it, e.g.: compreface_api_java_options=-Xmx16g

May 25 '22 14:05 pospielov

Hey,

Yeah, I did update the .env file and it fixed the issue. However, even with it set to 16 GB, memory usage shoots up to 28 GB on 180k images. Correct me if I am wrong, but if I were to use a million images, would we be looking at 256-384 GB of RAM?
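
For a rough sense of scale, the figure reported above can be extrapolated linearly. This is only a back-of-the-envelope sketch: it assumes memory grows in proportion to the number of images, which nothing in this thread confirms, and the function name `estimate_ram_gb` is hypothetical.

```python
def estimate_ram_gb(observed_gb: float, observed_images: int, target_images: int) -> float:
    """Linearly extrapolate RAM usage from a single observed data point.

    Assumption: memory scales in proportion to the number of images.
    """
    per_image_gb = observed_gb / observed_images
    return per_image_gb * target_images


# Data point reported in this thread: about 28 GB at 180k images.
estimate = estimate_ram_gb(28, 180_000, 1_000_000)
print(f"{estimate:.0f} GB")  # prints "156 GB"
```

Under that assumption, a million images would need roughly 156 GB. That is below the 256-384 GB guess, but still far more than typical hardware offers, and real usage may not scale linearly.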

May 26 '22 01:05 nagem07

Yes, unfortunately CompreFace is not optimized for such huge face collections. Supporting them would require different approaches, which could be too heavy for small collections. So we had to choose which collection size to support, and for now we chose small collections. I believe 100k is the maximum comfortable size for CompreFace. Of course, it will work with bigger collections, but as you mentioned, it would cost too many resources.

May 26 '22 13:05 pospielov

I see you prioritized smaller collections. From your knowledge, what would be the steps to optimize CompreFace for bigger collections? I have also noticed very slow service initialization with 180k images: after a reboot, the system takes at least 30 minutes to load the images back into RAM. Could that be addressed as well?

Jun 01 '22 01:06 nagem07

To support bigger collections, we need to change the architecture. Right now we store face embeddings in Postgres and then load them into RAM on each compreface-api node to calculate face similarities. Ideally, we need a solution that stores the embeddings and calculates similarities in one place. Furthermore, that place should also be scalable. I know of such a solution: the vector database Milvus, which does basically exactly what we need. But it is targeted at enterprise cloud solutions, so its minimum requirement is 16G of RAM, and 8 CPUs with 32G of RAM are recommended. This is not what most of our users expect from us.
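
To illustrate why this design ties memory and lookup time to collection size, here is a minimal sketch of a brute-force nearest-neighbor search over embeddings held in RAM. It is not CompreFace's actual code: the subject names, the 3-dimensional vectors, and the function names are all made up for illustration, and real face embeddings have far more dimensions.

```python
import math
from typing import Dict, List, Tuple

Embedding = List[float]


def euclidean_distance(a: Embedding, b: Embedding) -> float:
    """Euclidean (L2) distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_subject(query: Embedding, collection: Dict[str, Embedding]) -> Tuple[str, float]:
    """Scan every stored embedding and return the closest subject and its distance.

    Every embedding must be in memory, and each lookup touches all of them,
    so both RAM and time grow with the size of the collection.
    """
    name, embedding = min(
        collection.items(),
        key=lambda item: euclidean_distance(query, item[1]),
    )
    return name, euclidean_distance(query, embedding)


# Hypothetical toy collection: subject name -> embedding.
collection = {
    "alice": [0.0, 0.0, 1.0],
    "bob": [1.0, 0.0, 0.0],
}
name, distance = nearest_subject([0.9, 0.1, 0.0], collection)
print(name, round(distance, 3))
```

A vector database would take over both the storage and the search step, so the API nodes would no longer need to hold the whole collection themselves.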

"I have also noticed very slow service initialization with 180k images: after a reboot, the system takes at least 30 minutes to load the images back into RAM. Could that be addressed as well?"

I didn't expect this. For 50k images, loading takes about 1 minute. I'll create a task to check what could be causing it.

Jun 01 '22 09:06 pospielov

Thank you for your answer, and sorry for the delay in getting back to you. Having had a look at Milvus, I understand it would sit in front of Postgres in the architecture: embeddings would be saved in Milvus, while subject information would stay in Postgres. Am I correct?

Having said that, could you advise which scripts would need to change in order to integrate Milvus? I am looking at more than a million images, and the current architecture has a high cost in terms of resources.

Jun 25 '22 06:06 nagem07

I didn't research it deeply. I think yes, it should be similar to your description. You need to replace this class, and probably lots of logic related to it with Milvus logic: https://github.com/exadel-inc/CompreFace/blob/master/java/api/src/main/java/com/exadel/frs/core/trainservice/component/classifiers/EuclideanDistanceClassifier.java

Jun 30 '22 12:06 pospielov

What is the maximum practical number of faces?

Jul 04 '22 19:07 martinenkoEduard

It depends on what you mean by practical. I would recommend using no more than 100k-200k faces. But some people may run CompreFace with more faces, because buying hardware is often cheaper than buying software or paying for custom development.

Jul 05 '22 09:07 pospielov

I didn't research it deeply. I think yes, it should be similar to your description. You need to replace this class, and probably lots of logic related to it with Milvus logic: https://github.com/exadel-inc/CompreFace/blob/master/java/api/src/main/java/com/exadel/frs/core/trainservice/component/classifiers/EuclideanDistanceClassifier.java

Would that be the sole class that requires modification or are there others as well?

Jul 06 '22 03:07 nagem07

Others as well; this class is just a place to start exploring.

Jul 06 '22 16:07 pospielov