findimagedupes

Concurrent access to fingerprint DB

Open jinnko opened this issue 5 years ago • 3 comments

Is it safe to run multiple invocations of findimagedupes with each accessing a single fingerprint DB file?

The context is an image store of just over 1 TB, and I'd like to use GNU parallel to generate the hashes across all CPU cores first. For example, something like this:

find /path/to/files/{InstantUpload,Media/Photos} -maxdepth 3 -type d | \
  nice -n 15 \
  parallel -X --max-args 1 \
    --jobs 8 -l 12 \
    -u --tmpdir \
    /path/to/file/tmp \
    findimagedupes -R -f '/path/to/files/.findimagedupes.db' --no-compare '{}'

Is this safe, or should each job slot be using a separate DB file, then merge all the files at the end?

jinnko avatar Jul 24 '20 16:07 jinnko

Unfortunately, no, concurrent DB access is not safe.

Yes, each job should use a separate DB and at the end you can use --merge.
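As a rough sketch of that workflow (not taken from the findimagedupes documentation): give each GNU parallel job slot its own database, then pass all of them back with `-f` and write the combined result with `--merge`. The `DB_DIR` location, the `fp.N.db` naming, and the `merge_args` helper are hypothetical choices made for illustration. `{%}` is GNU parallel's replacement string for the job slot number, so no two concurrently running jobs share a file.

```shell
# Sketch only. DB_DIR and the fp.N.db naming are assumptions for illustration,
# not findimagedupes conventions.
DB_DIR="${DB_DIR:-/path/to/files/.fpdb}"

# Hashing step (shown as a comment, adapt paths before running).
# {%} expands to the parallel job slot number, so each slot writes its own DB:
#   find /path/to/files/{InstantUpload,Media/Photos} -maxdepth 3 -type d |
#     parallel --jobs 8 \
#       findimagedupes -R -f "$DB_DIR/fp.{%}.db" --no-compare '{}'

# merge_args OUT DB... prints, one argument per line, the options for the
# merge step: "-f DB" for every per-slot database, then "--merge OUT".
merge_args() (
  out="$1"
  shift
  for db in "$@"; do
    printf '%s\n' -f "$db"
  done
  printf '%s\n' --merge "$out"
)

# Merge step (bash, shown as a comment):
#   mapfile -t args < <(merge_args "$DB_DIR/merged.db" "$DB_DIR"/fp.*.db)
#   findimagedupes "${args[@]}"
```

Using the slot number rather than the job sequence number keeps the number of database files equal to `--jobs`, which keeps the final merge command short.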

Thanks for the question. I'll update the documentation.

jhnc avatar Jul 24 '20 16:07 jhnc

I should implement file-locking on the fingerprint database.

jhnc avatar Jul 24 '20 16:07 jhnc

Thanks for the quick reply. I'd suggest file-locking is a nice-to-have feature and not essential. A mention in the docs would suffice.

jinnko avatar Jul 24 '20 20:07 jinnko