Multiple instances of photos added to People after Recognize is run multiple times, people are merged, or faces are added
Which version of recognize are you using?
8.1.0
Enabled Modes
Object recognition, Face recognition
TensorFlow mode
Normal mode
Downstream App
Memories App
Which Nextcloud version do you have installed?
30.0.0
Which Operating system do you have installed?
Ubuntu 22.04.5
Which database are you running Nextcloud on?
11.5.2-MariaDB-ubu2404
Which Docker container are you using to run Nextcloud? (if applicable)
No response
How much RAM does your server have?
384GB
What processor Architecture does your CPU have?
x86_64
Describe the Bug
I noticed, after upgrading my Docker-image-based installation to 30.0.0 and manually running classification in Recognize because it didn't seem to be triggered automatically (I might have been a bit impatient too), that multiple instances of the exact same photos were added to People. If I went into People, I would have 4-8 copies of each picture in different series. If I open them, they are the exact same image, as the filename shown in the tab name confirms. I can only get rid of them manually by selecting "Remove from person", because I assume using delete would remove the actual image, and I don't want that, just the duplicate references gone.
I marked on the attached picture which ones are valid and which are just duplicates. In all cases, the filename is exactly the same and the face recognized is also the same (see green square).
EDIT: I have noticed a couple of interesting things since my bug report:
- The bubbles over People's index images, which show how many pictures are inside a "person", don't take the duplicates into account. They only show the number of truly unique pictures.
- If I open a person and remove duplicates manually, they get added to Recognize's clustering queue instead of just disappearing, despite being duplicate references to files that are already in a cluster (person).
Note: Not knowing whose fault this might be, I also filed this with pulsejet/memories.
Expected Behavior
I would expect to see one instance of a photo for each face recognized; even if clustering is launched manually several times, this shouldn't cause severe duplication of images.
To Reproduce
I can't really suggest an exact way to reproduce this; however, the common theme seems to be that any of the following:
- running occ recognize:classify or occ recognize:cluster-faces
- merging people
- adding faces from the unidentified pool to people (where those pictures probably already exist)
causes the "duplication".
Debug log
No response
Hello :wave:
Thank you for taking the time to open this issue with recognize. I know it's frustrating when software causes problems. You have made the right choice to come here and open an issue to make sure your problem gets looked at and if possible solved. I try to answer all issues and if possible fix all bugs here, but it sometimes takes a while until I get to it. Until then, please be patient.

Note also that GitHub is a place where people meet to make software better together. Nobody here is under any obligation to help you, solve your problems or deliver on any expectations or demands you may have, but if enough people come together we can collaborate to make this software better. For everyone. Thus, if you can, you could also look at other issues to see whether you can help other people with your knowledge and experience. If you have coding experience it would also be awesome if you could step up to dive into the code and try to fix the odd bug yourself. Everyone will be thankful for extra helping hands!

One last word: If you feel, at any point, like you need to vent, this is not the place for it; you can go to the forum, to twitter or somewhere else. But this is a technical issue tracker, so please make sure to focus on the tech and keep your opinions to yourself. (Also see our Code of Conduct. Really.)
I look forward to working with you on this issue. Cheers :blue_heart:
What is this supposed to be? Please don't get me wrong, but 1. nobody should just run files dropped in by anyone with zero explanation of what they are, and 2. how is that supposed to help? It looks like a Windows executable and a DLL. I'm on a Linux VM.
Could you check your nextcloud log for errors regarding recognize?
There is actually a guard in place that prevents the same picture from being processed by face detection multiple times. It can be circumvented, I guess, by running multiple classifier processes in parallel. My guess would be that you were indeed too impatient and multiple processes handled the same images in parallel.
There was nothing in the logs, no errors, not even a warning related to Recognize or Memories.
Could you perhaps check with the Memories app which faces it has detected in the image? It could be that it just detected multiple faces in the same picture, possibly false positives that are not even real faces.
If you look at the screenshot I provided, it's the same face over and over. Or did you mean something else?
Ah, indeed, I didn't look closely enough to see that it already contains the face marker. 🤔
For me this happened without running it multiple times.
I cleaned up the oc_recognize_* tables manually, ran the occ commands for cleanup-tags, remove-legacy-tags, reset-face-clusters, reset-faces and clear-background-jobs, and then finally classify, which resulted in the exact same face (same x, y, width, height and vector) being recognized multiple times.
one example:
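If anyone wants to pull such an example straight from the database, looking at the detections stored for a single file should show the duplicated rows directly. This is only a sketch: the file_id value is a placeholder and the table name assumes the default oc_ prefix.

-- List every detection recorded for one file; duplicates show up as rows with
-- identical geometry. Replace 12345 with the file id of an affected photo.
SELECT id, cluster_id, x, y, width, height, threshold
FROM oc_recognize_face_detections
WHERE file_id = 12345
ORDER BY x, y, id;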
I had the same problem as OP, for the same reason.
I also ran this query on my DB to find the duplicates:
SELECT
    main, (ids::jsonb - 0) AS exess, file_id, user_id, x, y, width, height, cluster_id, threshold
FROM (
    SELECT
        count(id) AS count, min(id) AS main, json_agg(id ORDER BY id) AS ids,
        file_id, user_id, x, y, width, height, cluster_id, threshold
    FROM
        oc_recognize_face_detections
    GROUP BY
        x, y, width, height, cluster_id, file_id, user_id, threshold
) AS d
WHERE
    d.count > 1;
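A dialect-neutral variant of the same check, without the JSON functions, should run on both Postgres and MariaDB. This is a sketch only; it reports, per group of identical detections, how many copies exist and the lowest id that would be kept (the aliases are my own):

-- Summarize each group of identical detections: number of copies and the id to keep.
SELECT COUNT(id) AS copies, MIN(id) AS keep_id,
       file_id, user_id, x, y, width, height, cluster_id, threshold
FROM oc_recognize_face_detections
GROUP BY x, y, width, height, cluster_id, file_id, user_id, threshold
HAVING COUNT(id) > 1;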
Then I used this to delete the excess ones (this uses Postgres-only functions!):
DELETE FROM oc_recognize_face_detections
WHERE id IN (
    SELECT
        exess::int
    FROM (
        SELECT
            main, jsonb_array_elements(ids::jsonb - 0) AS exess,
            file_id, user_id, x, y, width, height, cluster_id, threshold
        FROM (
            SELECT
                count(id) AS count, min(id) AS main, json_agg(id ORDER BY id) AS ids,
                file_id, user_id, x, y, width, height, cluster_id, threshold
            FROM
                oc_recognize_face_detections
            GROUP BY
                x, y, width, height, cluster_id, file_id, user_id, threshold
        ) AS d
        WHERE
            d.count > 1
    ) AS excess_list);
I deleted 79529 duplicate matches, and now every file shows up only once 🙂
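For Postgres, a simpler alternative that avoids the JSON functions entirely might also do the job. This is only a sketch: it keeps the lowest id of every group of identical detections and deletes the rest. (MariaDB typically refuses to select from the table it is deleting from, so this variant is Postgres-only.)

-- Keep the minimum id per identical detection group, delete everything else.
DELETE FROM oc_recognize_face_detections
WHERE id NOT IN (
    SELECT MIN(id)
    FROM oc_recognize_face_detections
    GROUP BY x, y, width, height, cluster_id, file_id, user_id, threshold
);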
The next release will have a repair step that removes the duplicates and a guard in place to prevent the creation of duplicates.
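For the curious, such a guard could, purely as an illustration and not necessarily how recognize actually implements it, take the form of a unique index over the detection geometry, so that inserting the same detection twice fails at the database level:

-- Illustrative assumption only, not recognize's actual schema change.
CREATE UNIQUE INDEX recognize_face_detection_dedup_idx
    ON oc_recognize_face_detections (file_id, user_id, x, y, width, height);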
Thank you for the query @beardhatcode
As I'm using MariaDB, I had to modify the query. This one should do the trick for the same issue on MariaDB.
DELETE FROM oc_recognize_face_detections
WHERE id IN (
    SELECT
        JSON_EXTRACT(exess, '$[0][0]')
    FROM (
        SELECT
            main, json_array(ids) AS exess,
            file_id, user_id, x, y, width, height, cluster_id, threshold
        FROM (
            SELECT
                count(id) AS count, min(id) AS main, json_arrayagg(id ORDER BY id) AS ids,
                file_id, user_id, x, y, width, height, cluster_id, threshold
            FROM
                oc_recognize_face_detections
            GROUP BY
                x, y, width, height, cluster_id, file_id, user_id, threshold
        ) AS d
        WHERE
            d.count > 1
    ) AS excess_list);
I got slightly different threshold values, so I re-ran the delete query without grouping on threshold to get rid of those duplicates.
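The corresponding check without threshold in the grouping would look roughly like this (a sketch only; it lists the affected groups rather than deleting them, and the aliases are my own):

-- Same duplicate grouping as above, but ignoring threshold.
SELECT COUNT(id) AS copies, MIN(id) AS keep_id,
       file_id, user_id, x, y, width, height, cluster_id
FROM oc_recognize_face_detections
GROUP BY x, y, width, height, cluster_id, file_id, user_id
HAVING COUNT(id) > 1;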
The latest v8, v9 and v10 releases now incorporate a) a repair step that removes duplicates automatically and b) a guard that prevents the creation of duplicates. Thanks everyone for your cooperation and patience 💙