
How to efficiently query them in a database? Please!

Open leshier12 opened this issue 7 years ago • 39 comments

It would be really nice if you could add an example which shows how the face encodings can be stored in a database and how to efficiently query them.

leshier12 avatar Mar 14 '18 00:03 leshier12

Hello leshier12, I was interested in the exact same thing. Have you found any resources for this? Does it look like this would be a possibility with this API?

mmelatti avatar Mar 14 '18 12:03 mmelatti

What database do you use? For PostgreSQL, see this post: https://stackoverflow.com/questions/23557537/how-to-convert-numpy-array-to-postgresql-list

DeadNumbers avatar Mar 14 '18 12:03 DeadNumbers

I believe the approach I would like to take is storing the 128 measurements (the face encoding of each known face) in some type of database, then querying that database with a basic machine-learning classifier such as an SVM (or kNN?) using an unknown face grabbed from an image. Any notes on how this type of database could be structured? In facerec_from_video_file.py they build an array of known faces and then call compare_faces(known_faces, face_encoding, tolerance). I'd like a system that can scale to a very large number of known faces (possibly one image per known face). In the end, I hope to feed my system a video stream. Thanks for any advice / insight into the performance of compare_faces()!
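For reference, this is roughly the in-memory baseline I am trying to scale beyond; a minimal sketch using the library's compare_faces and face_distance (file names are placeholders):

import face_recognition

# Known faces (one image per person); file names are placeholders.
known_encodings = [
    face_recognition.face_encodings(face_recognition.load_image_file(path))[0]
    for path in ["person_a.jpg", "person_b.jpg"]
]

# Unknown face grabbed from an image or a video frame.
unknown = face_recognition.face_encodings(
    face_recognition.load_image_file("frame.jpg")
)[0]

# Linear scan: fine for small sets, but O(n) comparisons per lookup.
matches = face_recognition.compare_faces(known_encodings, unknown, tolerance=0.6)
distances = face_recognition.face_distance(known_encodings, unknown)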

mmelatti avatar Mar 14 '18 14:03 mmelatti

Having a table of 128 float columns (f1, f2, ..., f128), you can build a composite index on several of them (e.g. (f1, f2, f3, f4)) to optimize the selection, and then query:

SELECT id, POW(f1 - :e1, 2) + POW(f2 - :e2, 2) + ... + POW(f128 - :e128, 2) AS square_distance
FROM encodings
WHERE 
 f1 > :minF1 AND f1 < :maxF1 AND 
 f2 > :minF2 AND f2 < :maxF2 AND 
 ... 
 f128 > :minF128 AND f128 < :maxF128
ORDER BY square_distance ASC LIMIT 1

where

:eX = encodingX
:minFX = encodingX - 0.1 * abs(encodingX)
:maxFX = encodingX + 0.1 * abs(encodingX)

0.1 defines how strict the selection is; 0 is the most strict.

This should bring you the row with the minimal vector distance to the searched encoding. It may also return nothing if the selection is too strict.
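For illustration, here is a rough Python sketch of building and running that query (psycopg2 assumed; the encodings table and f1..f128 columns follow the example above, while the connection string and image file are placeholders):

import face_recognition
import psycopg2

TOLERANCE = 0.1  # 0 is the most strict; larger values widen the pre-filter window

image = face_recognition.load_image_file("unknown.jpg")
encoding = face_recognition.face_encodings(image)[0]  # 128-d numpy array

select_terms = " + ".join(
    "POW(f{i} - %(e{i})s, 2)".format(i=i + 1) for i in range(128)
)
where_terms = " AND ".join(
    "f{i} > %(min{i})s AND f{i} < %(max{i})s".format(i=i + 1) for i in range(128)
)
query = (
    "SELECT id, {select} AS square_distance FROM encodings "
    "WHERE {where} ORDER BY square_distance ASC LIMIT 1"
).format(select=select_terms, where=where_terms)

params = {}
for i, value in enumerate(encoding, start=1):
    v = float(value)
    params["e%d" % i] = v
    params["min%d" % i] = v - TOLERANCE * abs(v)
    params["max%d" % i] = v + TOLERANCE * abs(v)

conn = psycopg2.connect("dbname=faces")  # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute(query, params)
    row = cur.fetchone()  # None if the selection window was too strict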

vearutop avatar Mar 15 '18 04:03 vearutop

Fantastic vearutop. That query is exactly what I'm looking for! I needed that confirmation before moving forward. Thanks so much!

mmelatti avatar Mar 15 '18 15:03 mmelatti

PostgreSQL has the CUBE type; using it will be much easier. Example:

SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1;

https://www.postgresql.org/docs/10/static/cube.html

Remember: "To make it harder for people to break things, there is a limit of 100 on the number of dimensions of cubes. This is set in cubedata.h if you need something bigger." Change cubedata.h to allow 128 dimensions and test.
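As a very rough sketch of how that could look for 128-d face encodings once cube is rebuilt with CUBE_MAX_DIM >= 128 (psycopg2 assumed; table and column names are only illustrative):

import face_recognition
import psycopg2

image = face_recognition.load_image_file("known.jpg")  # placeholder file name
encoding = face_recognition.face_encodings(image)[0]

conn = psycopg2.connect("dbname=faces")  # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS cube")
    # One cube value per face; needs a cube build with CUBE_MAX_DIM >= 128.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS faces "
        "(id serial PRIMARY KEY, name text, encoding cube)"
    )
    cur.execute(
        "INSERT INTO faces (name, encoding) VALUES (%s, CUBE(%s::float8[]))",
        ("someone", [float(x) for x in encoding]),
    )
    # Nearest neighbour by Euclidean distance, same idea as the query above.
    cur.execute(
        "SELECT name, encoding <-> CUBE(%s::float8[]) AS dist "
        "FROM faces ORDER BY dist ASC LIMIT 1",
        ([float(x) for x in encoding],),
    )
    print(cur.fetchone())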

railton avatar Mar 15 '18 16:03 railton

Thanks Railton! This works perfectly. I'm totally new to PostgreSQL. I've used "CREATE EXTENSION cube" to import the data type (works up to 100 dimensions). I can't find "cubedata.h" anywhere in the PostgreSQL binary distribution to change the value from 100 dimensions to 128. Does anyone know where to find this? Should I be importing the cube data type another way? Thanks in advance.


I can only find cube.sql files in share/postgresql/extensions

mmelatti avatar Mar 19 '18 15:03 mmelatti

Use this container, it is already changed to work up to 350 dimensions. https://github.com/oelmekki/postgres-350d

railton avatar Mar 19 '18 15:03 railton

Thanks Again :)

mmelatti avatar Mar 19 '18 15:03 mmelatti

Adding this for anybody trying to set up this type of DB on macOS, and also for my own future reference when I forget how to do this and need to reinstall. (The Docker solution did not work for me, so this is the manual solution that worked.)

Requirements:

  • Need to have Xcode downloaded for Cmake
  • Homebrew

To start, make sure you have PostgreSQL installed so that 'pg_config' is available: $ brew install postgresql

INSTRUCTIONS: Download the PostgreSQL source: https://ftp.postgresql.org/pub/source/v9.6.0/postgresql-9.6.0.tar.bz2 (substitute the correct version number; mine is 10.3)

unzip...

Change contrib/cube/cubedata.h to allow 128 dimensions (128 floats for facial encodings):

#define CUBE_MAX_DIM (100)  ->  #define CUBE_MAX_DIM (128)


Follow the directions in the 'INSTALL' file in the top-level directory, both for installing and for starting the server:

./configure
make
su
make install
adduser postgres
mkdir /usr/local/pgsql/data
chown postgres /usr/local/pgsql/data
su - postgres
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data

*** Start the server: /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data

*** Note: use this command to switch to the postgres user on macOS: $ sudo su - postgres

Now we need to add the extension. Go to the contrib/ directory and follow the directions in the README. We can either run:

make all
make all install

to build every extension, or navigate to contrib/cube/ and run just:

make
make install

for this one extension.


Now you want to go to your database and add the extension. For this I just used my GUI and ran the following: CREATE EXTENSION cube

mmelatti avatar Mar 19 '18 19:03 mmelatti

@mmelatti what if I have PostgreSQL already installed on my system? Should I remove it first, or install the new one alongside it?

xenc0d3r avatar Mar 21 '18 18:03 xenc0d3r

@xenc0d3r when I did it I used the uninstaller to remove the version of postgres I had. Then I downloaded the source for postgres with that link. I also changed the URL and downloaded the current 10.3 version instead of that 9.6 version.

mmelatti avatar Mar 21 '18 18:03 mmelatti

Hello @vearutop, when we encode a photo, the values in the list look like -0.09634063. How can I convert them into float type in Python to store them in a single row?

xenc0d3r avatar Mar 23 '18 15:03 xenc0d3r

@xenc0d3r Are you encoding it with something like base64, or saving the array that the library returns as-is?
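For what it's worth, face_encodings returns a numpy float64 array, so converting to plain Python floats is enough for storage; a minimal sketch (file name is a placeholder):

import face_recognition

image = face_recognition.load_image_file("photo.jpg")   # placeholder file name
encoding = face_recognition.face_encodings(image)[0]    # numpy array of float64

# .tolist() yields plain Python floats, ready to bind as query parameters
# or to serialize to text/JSON for a single row.
values = encoding.tolist()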

railton avatar Mar 23 '18 17:03 railton

@railton In vearutop's post (above) he demos how to add a min/max threshold and return "unknown face". Do you have any links to PostgreSQL documentation for doing something similar?

I am returning closest match: SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1;

I could compute the distance between the current face encoding and the closest match returned from the database, but I'm concerned that this distance may not be a good measure if the resolution of the database entry differs from the resolution of the face encoding we are trying to match.

Can I accomplish thresholding within my query? And if not, what is the best approach with what is returned to my Python program? Thanks!

UPDATE: I believe I've answered my own question (see below). Please feel free to leave feedback for better solutions and more info.

mmelatti avatar Mar 25 '18 15:03 mmelatti

If you want to utilize PostgreSQL's cube, you can use a small trick to do it without patching CUBE_MAX_DIM: split the 128 points into two vectors of 64 points each. This violates the mathematical model a bit, but for the purpose of finding the closest vector it should work fine.

I made a small example of PostgreSQL and face_recognition integration: https://github.com/vearutop/face-postgre
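As a rough sketch of the storage side under that approach (psycopg2 assumed; the vec_low/vec_high column names match the query shown further down in this thread, and the table name is illustrative):

import face_recognition
import psycopg2

image = face_recognition.load_image_file("known.jpg")  # placeholder file name
enc = face_recognition.face_encodings(image)[0]

conn = psycopg2.connect("dbname=faces")  # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS cube")
    # Two 64-d cubes per face, so the stock 100-dimension limit is never hit.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS encodings "
        "(id serial PRIMARY KEY, name text, vec_low cube, vec_high cube)"
    )
    cur.execute(
        "INSERT INTO encodings (name, vec_low, vec_high) "
        "VALUES (%s, CUBE(%s::float8[]), CUBE(%s::float8[]))",
        ("someone", enc[:64].tolist(), enc[64:].tolist()),
    )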

vearutop avatar Mar 26 '18 11:03 vearutop

@vearutop I am using the cube extension and it is good. But if I upload a face which is not in the database, it still returns the most similar face to the uploaded image because of the LIMIT 1 clause. Is there a way of fixing this?

xenc0d3r avatar Mar 26 '18 13:03 xenc0d3r

@mmelatti You did exactly what I did, sorry for my delay.

railton avatar Mar 26 '18 17:03 railton

Can we use a hashing trick to query face images? Has anyone done this?

xxllp avatar Apr 04 '18 04:04 xxllp

@xxllp you will always get slightly different vector values for the same face from different photos, hence you cannot query by equality; you can only look for the vector that is closest (by Euclidean distance).

A hash is only suitable for exact equality comparison, because slightly different vectors would produce completely different hashes, so it is not relevant for this task.

vearutop avatar Apr 04 '18 04:04 vearutop

@mmelatti I run Python on Windows. How can I set the cube max dim to 128 in PostgreSQL on Windows? I used the PostgreSQL installer and I can't find cubedata.h to edit. I set the data type of face_encoding to public.cube in pgAdmin3.

oknoproblem3 avatar Apr 08 '18 00:04 oknoproblem3

@oknoproblem3 you need to recompile PostgreSQL from source: https://www.postgresql.org/docs/10/static/install-windows-full.html

DeadNumbers avatar Apr 08 '18 08:04 DeadNumbers

@oknoproblem3 You don't really have to recompile PostgreSQL to work around the CUBE limitation. With your 128 points split into two vectors (64 + 64, for example), you can calculate the Euclidean distance of the whole vector from the Euclidean distances of the two sub-vectors:

            query = "SELECT id, sqrt(power(CUBE(array[{}]) <-> vec_low, 2) + power(CUBE(array[{}]) <-> vec_high, 2)) as dist FROM encodings ORDER BY dist ASC LIMIT 1".format(
                ','.join(str(s) for s in encodings[0][0:64]),
                ','.join(str(s) for s in encodings[0][64:128]),
            )

dist in this expression will be valid to check against the threshold of 0.6.

EDIT: fixed array slicing to include last elements ([0:63] -> [0:64], [64:127] -> [64:128]).
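To round it out, a minimal sketch of running that query and applying the threshold (psycopg2 and the encodings table described above assumed):

import psycopg2

THRESHOLD = 0.6  # the usual face_recognition match threshold

# `query` is the SQL string built in the snippet above.
conn = psycopg2.connect("dbname=faces")  # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute(query)
    row = cur.fetchone()

if row is not None and row[1] <= THRESHOLD:
    print("best match id:", row[0], "distance:", row[1])
else:
    print("unknown face")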

vearutop avatar May 23 '18 08:05 vearutop

@vearutop

Following your last advice about splitting the 128 points into 64 + 64, one thing confuses me. Since we are just finding the smallest distance,

why don't we use

CUBE(array[{}]) <-> vec_low + CUBE(array[{}]) <-> vec_high

instead of

sqrt(power(CUBE(array[{}]) <-> vec_low, 2) + power(CUBE(array[{}]) <-> vec_high, 2))

it looks like "power" then "sqrt" have no effects on result (distance which biger is more biger and smaller is more smaller)

why2lyj avatar Jun 22 '18 03:06 why2lyj

Euclidean distance between (a1,b1,c1,d1) and (a2,b2,c2,d2) is sqrt((a1-a2)^2+(b1-b2)^2+(c1-c2)^2+(d1-d2)^2).

Mathematically sqrt((a1-a2)^2+(b1-b2)^2+(c1-c2)^2+(d1-d2)^2) != sqrt((a1-a2)^2+(b1-b2)^2) + sqrt((c1-c2)^2+(d1-d2)^2).

If you square the left and right parts you'll have (remember the algebraic identity (a+b)^2 = a^2 + 2ab + b^2):

(a1-a2)^2+(b1-b2)^2+(c1-c2)^2+(d1-d2)^2 != (a1-a2)^2+(b1-b2)^2 + 2*sqrt((a1-a2)^2+(b1-b2)^2)*sqrt((c1-c2)^2+(d1-d2)^2)+ (c1-c2)^2+(d1-d2)^2
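A small made-up numeric example shows that even the ordering can change, not just the absolute value:

import math

# Illustrative sub-distances (vec_low part, vec_high part) for two candidates:
a = (0.5, 0.0)   # true distance sqrt(0.25 + 0.00) = 0.50, plain sum = 0.5
b = (0.3, 0.3)   # true distance sqrt(0.09 + 0.09) ~ 0.42, plain sum = 0.6

for low, high in (a, b):
    print(math.sqrt(low ** 2 + high ** 2), low + high)

# The plain sum ranks candidate a as closer (0.5 < 0.6), while the true
# Euclidean distance ranks candidate b as closer (0.42 < 0.50); only the
# true distance is comparable against the usual 0.6 threshold.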

Sorry for poor math formatting :)

vearutop avatar Jun 22 '18 08:06 vearutop

Do you guys have a sample efficient query for MS SQL?

devmikko avatar Aug 08 '18 11:08 devmikko

What an excellent thread, guys! @railton thank you for pointing out the cube type. I had no idea.

I needed to store text embeddings, so my vectors are 512 items long. For those who need PostgreSQL with a cube extension that supports more than 100 dimensions, I created Docker images with the patched extension, setting the limit to 2048. I created builds for 10.7 and 11.2. Feel free to use them:

https://hub.docker.com/r/expert/postgresql-large-cube/tags https://github.com/unoexperto/docker-postgresql-large-cube

unoexperto avatar Apr 23 '19 15:04 unoexperto

@unoexperto Just out of curiosity, what do you store beyond the 128 points of the face?

And thank you for your contribution.

railton avatar Apr 25 '19 15:04 railton

@railton My use-case is different. I store embeddings that encode the meaning of scientific articles.

unoexperto avatar Apr 25 '19 16:04 unoexperto

@vearutop What do you suggest? Shall I go for a modified Postgres, or can we just go with the 64-point vector split?

jayaraj avatar Jan 06 '20 06:01 jayaraj