Face clustering stops while face recognition finishes

Open JoeHardi opened this issue 11 months ago • 15 comments

Which version of recognize are you using?

8.2.0

Enabled Modes

Face recognition

TensorFlow mode

Normal mode

Downstream App

Memories App

Which Nextcloud version do you have installed?

Nextcloud Hub 9 (30.0.5)

Which Operating system do you have installed?

ZorinOS / NixOS

Which database are you running Nextcloud on?

MySQL

Which Docker container are you using to run Nextcloud? (if applicable)

No response

How much RAM does your server have?

32GB

What processor Architecture does your CPU have?

x86_64

Describe the Bug

Within face recognition, face clustering stops after a while, seemingly at random. Face recognition itself has finished, but face clustering reports 6.652 faces left to cluster; Scheduled background jobs: 0. It remains that way.

Image

Expected Behavior

Face clustering should run to the end and start again when new pictures appear.

Image

To Reproduce

Resets do not solve the problem. After a while, it stops again.

Debug log

No response

JoeHardi avatar Feb 01 '25 12:02 JoeHardi

Hi @JoeHardi Can you check the nextcloud.log file for errors?

marcelklehr avatar Feb 03 '25 16:02 marcelklehr

(Sysadmin of the aforementioned server here)

The nextcloud.log didn't contain anything that caught my eye, so I did some further research and noticed that almost all of the background jobs seem to be missing:

[I] root@aidoskyneen /v/l/nextcloud# nextcloud-occ background-job:list | grep Recognize
| 444912 | OCA\Recognize\BackgroundJobs\MaintenanceJob                       | 2025-02-05T17:47:08+00:00 | null

So the issue probably is that the clustering background job simply doesn't run - and the initial face clustering seems to have happened because we manually ran nextcloud-occ recognize:cluster-faces.
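The check described above can be made explicit by looking for the clustering job by name. This is a minimal sketch with a hypothetical helper, `has_cluster_job`, assuming the clustering job's class name contains `ClusterFacesJob`. It only reads text on standard input and does not call Nextcloud itself.

```shell
#!/bin/sh
# Hypothetical helper: reads `background-job:list` output on stdin and
# succeeds only if a Recognize clustering job appears in it.
# Assumption: the job's class name contains "ClusterFacesJob".
has_cluster_job() {
    grep -q 'ClusterFacesJob'
}

# Example (the occ wrapper name depends on your installation):
#   nextcloud-occ background-job:list | has_cluster_job \
#       && echo "clustering job registered" \
#       || echo "clustering job missing"
```

With the single MaintenanceJob row shown above as input, the helper returns a non-zero status, which matches the observation that no clustering job is registered.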

Is there some way to manually re-register the missing background jobs?

cyclic-pentane avatar Feb 05 '25 18:02 cyclic-pentane

I have the same issue. Just finished classifying a day ago and have 38000 waiting to be clustered. occ recognize:cluster-faces does not do anything either.

Kdubs937 avatar Feb 08 '25 15:02 Kdubs937

mmh

Is there some way to manually re-register the missing background jobs?

That's currently not possible I think.

occ recognize:cluster-faces does not do anything either.

Why is that? What happens when you run it?

marcelklehr avatar Feb 08 '25 16:02 marcelklehr

Command:

sudo docker exec -u www-data nextcloud-app-1 php occ recognize:cluster-faces

Result:

ClusterDebug: Retrieving face detections for user User1
ClusterDebug: Not enough face detections found
Clustering face detections for user User 2
ClusterDebug: Retrieving face detections for user User 2
ClusterDebug: Not enough face detections found
Clustering face detections for user User 3
ClusterDebug: Retrieving face detections for user User 3
ClusterDebug: Found 13340 fresh detections. Adding 0 old detections and 0 sampled detections from already existing clusters. Calculating clusters on 13340 detections

Now that I ran it again it is skipping 2 users. It was not skipping them last night when I ran it and it did classify all of their files. User 1 and User 2 do not have any pictures to scan.

Image

A little background:

This has worked fine for a couple of years. I recently had a database issue where something was opening connections to the database and not closing them, which resulted in Nextcloud crashing after about 12 hours. I traced this to the Recognize app, the Memories app, or the Face Recognition app. I uninstalled all three and dropped their tables from the database (a mistake). After that, Nextcloud would not work at all. I had to manually reinstall all three apps to get the tables back, then removed them using the app store. Then I reinstalled Recognize and got that set up. It scanned and classified all of the files but won't cluster for some reason.

Another question: how do I remove the tables from the database for a complete uninstallation, and which ones?

Kdubs937 avatar Feb 08 '25 17:02 Kdubs937

And it ends with this message? It may be running out of RAM, because there are a lot of face detections to cluster. I always recommend using the cluster-faces command with the batch size parameter set to 10000 or even lower so that it fits in RAM.
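A minimal sketch of that recommendation follows. It assumes the option is spelled `--batch-size` (confirm with `occ recognize:cluster-faces --help`). `run_clustering` and the `OCC` variable are hypothetical names used only for illustration.

```shell
# Sketch: run clustering in smaller batches so each run fits in RAM.
# Assumptions: the option is spelled --batch-size, and OCC holds however
# you invoke occ on your system (default here: "php occ").
run_clustering() {
    batch_size="${1:-10000}"
    # OCC is intentionally unquoted so "php occ" splits into two words.
    ${OCC:-php occ} recognize:cluster-faces --batch-size "$batch_size"
}
```

For the Docker setup above, one might set `OCC="sudo docker exec -u www-data nextcloud-app-1 php occ"` and then call `run_clustering 5000`.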

marcelklehr avatar Feb 08 '25 17:02 marcelklehr

Yes, then nothing further happens. I just tried with batch size 500 and got the same result, except that this time, and I'm not sure why, the video tagging section now shows:

There are queued files but no background job is scheduled to process them.

That did not pop up until I ran the above command, and video tagging was working just this morning. Not sure if they are related or not.

Kdubs937 avatar Feb 08 '25 17:02 Kdubs937

I turned off GPU mode and the clustering started working with that command. The video tagging also started working and scheduling background jobs. No background jobs are scheduled for clustering, though. Is there a way to add that? The others schedule background jobs as soon as a picture or video is uploaded. The GPU ran all of last night and completed the whole classification process, but it does not work this morning.

Kdubs937 avatar Feb 08 '25 17:02 Kdubs937

yes then nothing further happens. I just tried with batch size 500 same result

Well, clustering takes time, how long did you wait?

marcelklehr avatar Feb 08 '25 17:02 marcelklehr

I turned off GPU mode and the clustering started working with that command.

That's peculiar. Clustering doesn't use the GPU.

marcelklehr avatar Feb 08 '25 18:02 marcelklehr

It was about 20 minutes that I waited and nothing changed. But it's working now and faces are showing with GPU off. The weird thing is that now it won't detect my GPU if I turn it back on, which is fine. Now that the initial classification is done, I don't really need to go through the hassle of troubleshooting that. It was hard enough just to get it working the first time. I will give it a couple of days and see if the background job starts working. If it does not, I will just make a cron job that runs that command until I hear whether this is a Recognize issue or just a me issue.
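A rough sketch of such a cron workaround, written as a wrapper that skips a run while the previous one is still going. The function name, lock path, script path, and `OCC` variable are all assumptions for illustration, not part of Recognize.

```shell
#!/bin/sh
# Hypothetical wrapper for a cron job: runs clustering once, but skips the
# run if a previous one still holds the lock. Paths and names are assumptions.
LOCK_DIR="${LOCK_DIR:-/tmp/recognize-cluster.lock}"

cluster_once() {
    # mkdir is atomic, so it doubles as a simple lock.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous clustering run still active, skipping"
        return 0
    fi
    status=0
    # OCC is intentionally unquoted so "php occ" splits into two words.
    ${OCC:-php occ} recognize:cluster-faces || status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}

# Example crontab entry (hourly) for the web server user, assuming this file
# is saved as /usr/local/bin/recognize-cluster.sh and ends with a call to
# cluster_once:
#   0 * * * * /usr/local/bin/recognize-cluster.sh
```

The lock matters because clustering can take a long time, so an hourly schedule could otherwise start overlapping runs. If the script is killed mid-run, the lock directory has to be removed by hand.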

Kdubs937 avatar Feb 08 '25 18:02 Kdubs937

The MaintenanceJob that runs every 12h should create missing ClusterFacesJobs actually.

marcelklehr avatar Feb 08 '25 18:02 marcelklehr

It was about 20 minutes that I waited and nothing changed.

That's likely not long enough, clustering takes quite some time.

marcelklehr avatar Feb 08 '25 18:02 marcelklehr

The MaintenanceJob that runs every 12h should create missing ClusterFacesJobs actually.

@alyaeanyx If it doesn't, you should have errors about that in your nextcloud log.

marcelklehr avatar Feb 08 '25 18:02 marcelklehr

OK, thank you for the help! Just to note: when I run the job manually, I can see the numbers go down in the admin settings almost instantly, if that is what you were referring to.

Kdubs937 avatar Feb 08 '25 18:02 Kdubs937