How to efficiently query them in a database? Please!
It would be really nice if you could add an example which shows how the face encodings can be stored in a database and how to efficiently query them.
Hello leshier12, I was interested in the exact same thing. Have you found any resources for this? Does it look like this would be a possibility with this API?
What database do you use? For PostgreSQL, look at this post: https://stackoverflow.com/questions/23557537/how-to-convert-numpy-array-to-postgresql-list
I believe the approach I would like to take is storing the 128 measurements (the embedded face of each known face) in some type of database, then querying this database with a basic machine learning classification algorithm like an SVM classifier (or kNN?) using an unknown face grabbed from an image. Any notes on how this type of database could be structured? In facerec_from_video_file.py they build an array of known faces and then call compare_faces(known_faces, face_encoding, tolerance); I'd like a system that can scale to a very large number of known faces (possibly 1 image per known face). In the end, I hope to feed my system a video stream. Thanks for any advice / insight into the performance of compare_faces()!
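For context, my understanding is that compare_faces() boils down to a vectorized Euclidean distance check over all known encodings, so its cost grows linearly with the number of known faces. Roughly something like this sketch (not the library's actual source, just my mental model):

import numpy as np
import face_recognition

def closest_match(known_faces, face_encoding, tolerance=0.6):
    # known_faces: array-like of shape (N, 128); face_encoding: one 128-d vector
    if len(known_faces) == 0:
        return None, None
    distances = face_recognition.face_distance(np.asarray(known_faces), face_encoding)
    best = int(np.argmin(distances))
    if distances[best] <= tolerance:
        return best, float(distances[best])   # index of the best known face
    return None, float(distances[best])       # nothing within tolerance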
Having a table of 128 float columns (f1, f2, ..., f128), you can build a composite index on several of them (e.g. (f1, f2, f3, f4)) to optimize the selection, and then query:
SELECT id, POW(f1 - :e1, 2) + POW(f2 - :e2, 2) + ... + POW(f128 - :e128, 2) AS square_distance
FROM encodings
WHERE
f1 > :minF1 AND f1 < :maxF1 AND
f2 > :minF2 AND f2 < :maxF2 AND
...
f128 > :minF128 AND f128 < :maxF128
ORDER BY square_distance ASC LIMIT 1
where
:eX = encodingX
:minFX = encodingX - 0.1 * abs(encodingX)
:maxFX = encodingX + 0.1 * abs(encodingX)
0.1 defines how strict the selection is; 0 is the strictest.
This should bring you the row with the minimal vector distance to the searched encoding. It may also return nothing if the selection is too strict.
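If you are building this query from Python, a rough sketch with psycopg2 might look like the following (the encodings table with columns f1..f128 and the 0.1 looseness factor are the same assumptions as above; psycopg2-style named placeholders replace the :e1 notation):

import psycopg2

def find_closest(conn, encoding, looseness=0.1):
    # Build the squared-distance expression and the bounding-box filter
    # for a table encodings(id, f1 .. f128), as in the query above.
    distance = " + ".join(
        "POW(f{0} - %(e{0})s, 2)".format(i + 1) for i in range(128)
    )
    bounds = " AND ".join(
        "f{0} > %(min{0})s AND f{0} < %(max{0})s".format(i + 1) for i in range(128)
    )
    sql = ("SELECT id, {} AS square_distance FROM encodings WHERE {} "
           "ORDER BY square_distance ASC LIMIT 1").format(distance, bounds)

    params = {}
    for i, e in enumerate(encoding, start=1):
        e = float(e)
        params["e%d" % i] = e
        params["min%d" % i] = e - looseness * abs(e)
        params["max%d" % i] = e + looseness * abs(e)

    with conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchone()  # (id, square_distance), or None if too strict

# usage: conn = psycopg2.connect("dbname=faces"); print(find_closest(conn, enc))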
Fantastic vearutop. That query is exactly what I'm looking for! I needed that confirmation before moving forward. Thanks so much!
PostgreSQL has the CUBE type; use it, it will be much easier. Ex: SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1; https://www.postgresql.org/docs/10/static/cube.html Remember: "To make it harder for people to break things, there is a limit of 100 on the number of dimensions of cubes. This is set in cubedata.h if you need something bigger." Change cubedata.h to 128 and test.
Thanks Railton! This works perfectly. I'm totally new to PostgreSQL. I've used "CREATE EXTENSION cube" to import the data type (works up to 100 dimensions). I can't find "cubedata.h" anywhere in the PostgreSQL binary install to change the value from 100 dimensions to 128. Does anyone know where to find it? Should I be importing the cube data type another way? Thanks in advance.
I can only find cube.sql files in share/postgresql/extensions
Use this container, it is already changed to work up to 350 dimensions. https://github.com/oelmekki/postgres-350d
Thanks Again :)
Adding this for anybody trying to make this type of DB on macOS, and also for my own future reference when I forget how to do this and need to reinstall. (The Docker solution did not work for me, so this is the manual solution that worked):
Requirements:
- Need to have Xcode downloaded, for CMake
- Homebrew
Starters... make sure you have PostgreSQL installed so that 'pg_config' is available: $ brew install postgresql
INSTRUCTIONS: download the source for PostgreSQL: https://ftp.postgresql.org/pub/source/v9.6.0/postgresql-9.6.0.tar.bz2 (get the correct version number; mine is 10.3)
unzip...
Change /contrib/cube/cubedata.h to allow 128 dimensions (128 floats for the facial encodings): #define CUBE_MAX_DIM (100) -> #define CUBE_MAX_DIM (128)
Follow the directions in the 'INSTALL' file in the top directory for installing and starting the server:
./configure
make
su
make install
adduser postgres
mkdir /usr/local/pgsql/data
chown postgres /usr/local/pgsql/data
su - postgres
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
***Start Server: /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data
***note: use this command to switch to the postgres user on Mac: $ sudo su - postgres
Now we need to add the extension. Go to the /contrib/ directory and follow the directions in the README. We can either run 'make all' and 'make all install' for all the extensions, or navigate to /contrib/cube/ and just run: $ make, then $ make install, for this one extension.
Now you want to go to your database and add the extension. For this I just used my GUI and ran the following: CREATE EXTENSION cube
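Once the extension is in place, a minimal sketch of storing and querying an encoding from Python could look like this (the face_encodings table name, file names, and connection string are just examples, and it assumes the patched cube so 128 dimensions are allowed):

import psycopg2
import face_recognition

conn = psycopg2.connect("dbname=faces user=postgres")   # example connection string
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS cube")
cur.execute("CREATE TABLE IF NOT EXISTS face_encodings ("
            "id serial PRIMARY KEY, name text, encoding cube)")

# store one known face (128 floats) as a cube value
image = face_recognition.load_image_file("known_face.jpg")
encoding = face_recognition.face_encodings(image)[0]
cur.execute("INSERT INTO face_encodings (name, encoding) VALUES (%s, CUBE(%s))",
            ("someone", [float(x) for x in encoding]))

# look up the nearest stored face for an unknown image
probe = face_recognition.face_encodings(
    face_recognition.load_image_file("unknown_face.jpg"))[0]
cur.execute("SELECT name, encoding <-> CUBE(%s) AS dist FROM face_encodings "
            "ORDER BY dist LIMIT 1",
            ([float(x) for x in probe],))
print(cur.fetchone())   # e.g. ('someone', 0.42), or None if the table is empty
conn.commit()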
@mmelatti what if I already have PostgreSQL installed on my system? Should I remove it first, or install this one alongside the other?
@xenc0d3r when I did it I used the uninstaller to remove the version of postgres I had. Then I downloaded the source for postgres with that link. I also changed the URL and downloaded the current 10.3 version instead of that 9.6 version.
Hello @vearutop, when we encode the photo, the values in the list look like -0.09634063. How can I convert them into float type in Python to store them in a single row?
@xenc0d3r are you encoding it with something like base64, or saving the array that the library returns?
@railton in vearutop's post (above) he demos how to add a min/max threshold and return "unknown face". Do you have any links to PostgreSQL documentation for doing something similar?
I am returning closest match: SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1;
I could compute the distance between the current face encoding and the closest match returned from the database, but I'm concerned about this distance not being a good measure if the resolution of the database entry is different from the resolution of the face encoding we are trying to match. Can I accomplish thresholding with my query? And if not, what is the best approach with what gets returned to the Python program? Thanks!
UPDATE: I believe I've answered my own question (see below). Please feel free to leave feedback for better solutions and more info.
If you want to utilize Postgres's cube, you can use a small trick to do it without patching CUBE_MAX_DIM. You can split all the points into two vectors, 64 points each. This violates the mathematical model a bit, but for the purpose of finding the closest vector it should work fine.
I made a small example of PostgreSQL and face_recognition integration: https://github.com/vearutop/face-postgre
@vearutop I am using the cube extension and it is good. But if I upload a face which is not in the database, it returns the most similar face to the uploaded image because of the LIMIT 1 parameter. Is there a way of fixing this?
@mmelatti You did exactly what I did, sorry for my delay.
Can we use a hashing trick to query for a face image? Has anyone done this?
@xxllp you will always have slightly different vector values for the same face from different photos, hence you cannot query by equality; you can only look for the vector that is closest (by Euclidean distance).
A hash is only suitable for exact equality comparison, because slightly different vectors produce completely different hashes, so it is not relevant for this task.
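A quick sketch illustrating this: two encodings that differ by a tiny amount produce completely different digests (file names and values are just an example).

import hashlib
import numpy as np

a = np.full(128, 0.1)
b = a.copy()
b[0] += 1e-6   # tiny change, as you would get from re-encoding the same face

print(hashlib.sha1(a.tobytes()).hexdigest())
print(hashlib.sha1(b.tobytes()).hexdigest())   # completely different digest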
@mmelatti I run Python on Windows. How do I set the cube max dim to 128 in PostgreSQL on Windows? I ran the PostgreSQL installer and I can't find cubedata.h to edit. I set the data type of face_encoding to public.cube in pgAdmin3.
@oknoproblem3 you need to recompile PostgreSQL from source: https://www.postgresql.org/docs/10/static/install-windows-full.html
@oknoproblem3 You don't really have to recompile PostgreSQL to work around the CUBE limitation. If you split your 128 points into two vectors (64 + 64, for example), you can calculate the Euclidean distance of the whole vector from the Euclidean distances of the two sub-vectors:
query = "SELECT id, sqrt(power(CUBE(array[{}]) <-> vec_low, 2) + power(CUBE(array[{}]) <-> vec_high, 2)) as dist FROM encodings ORDER BY dist ASC LIMIT 1".format(
','.join(str(s) for s in encodings[0][0:64]),
','.join(str(s) for s in encodings[0][64:128]),
)
dist in this expression will be valid to check against the threshold of 0.6.
EDIT: fixed array slicing to include last elements ([0:63] -> [0:64], [64:127] -> [64:128]).
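Put together, a rough end-to-end sketch might look like this (not the exact code from face-postgre; an encodings table with vec_low/vec_high cube columns is assumed, 0.6 is the usual face_recognition tolerance, and the string interpolation mirrors the snippet above):

import psycopg2
import face_recognition

THRESHOLD = 0.6   # default tolerance used by compare_faces()

conn = psycopg2.connect("dbname=faces user=postgres")   # example connection string
cur = conn.cursor()

image = face_recognition.load_image_file("unknown.jpg")
enc = face_recognition.face_encodings(image)[0]
low = ','.join(str(float(x)) for x in enc[0:64])
high = ','.join(str(float(x)) for x in enc[64:128])

cur.execute(
    "SELECT id, sqrt(power(CUBE(array[{}]) <-> vec_low, 2) + "
    "power(CUBE(array[{}]) <-> vec_high, 2)) AS dist "
    "FROM encodings ORDER BY dist ASC LIMIT 1".format(low, high)
)
row = cur.fetchone()
if row is None or row[1] > THRESHOLD:
    print("unknown face")
else:
    print("matched id {} at distance {:.3f}".format(row[0], row[1]))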
@vearutop
Following your last advice about splitting the 128 points into 64 + 64, one thing confuses me. Since we are only looking for the smallest distance,
why don't we use
CUBE(array[{}]) <-> vec_low + CUBE(array[{}]) <-> vec_high
instead of
sqrt(power(CUBE(array[{}]) <-> vec_low, 2) + power(CUBE(array[{}]) <-> vec_high, 2))
it looks like "power" then "sqrt" have no effects on result (distance which biger is more biger and smaller is more smaller)
Euclidean distance between (a1,b1,c1,d1) and (a2,b2,c2,d2) is sqrt((a1-a2)^2+(b1-b2)^2+(c1-c2)^2+(d1-d2)^2).
Mathematically sqrt((a1-a2)^2+(b1-b2)^2+(c1-c2)^2+(d1-d2)^2) != sqrt((a1-a2)^2+(b1-b2)^2) + sqrt((c1-c2)^2+(d1-d2)^2).
If you square the left and right parts you'll have (remember the algebraic identity (a+b)^2 = a^2 + 2ab + b^2):
(a1-a2)^2+(b1-b2)^2+(c1-c2)^2+(d1-d2)^2 != (a1-a2)^2+(b1-b2)^2 + 2*sqrt((a1-a2)^2+(b1-b2)^2)*sqrt((c1-c2)^2+(d1-d2)^2)+ (c1-c2)^2+(d1-d2)^2
Sorry for poor math formatting :)
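A quick numeric check of the same point (just a sketch with numpy): the true distance equals sqrt of the sum of squared sub-distances, but not the plain sum of sub-distances.

import numpy as np

rng = np.random.default_rng(0)
a, b = rng.random(128), rng.random(128)

full = np.linalg.norm(a - b)               # distance over the whole 128-d vector
low = np.linalg.norm(a[:64] - b[:64])      # distance over the first 64 points
high = np.linalg.norm(a[64:] - b[64:])     # distance over the last 64 points

print(np.isclose(full, np.sqrt(low**2 + high**2)))   # True
print(np.isclose(full, low + high))                   # False, the sum overestimates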
Do you guys have a sample efficient query for MS SQL?
What an excellent thread, guys! @railton thank you for pointing out the cube type. I had no idea.
I needed to store text embeddings, so my vectors have 512 items. For those who need PostgreSQL with a cube that supports more than 100 items, I created Docker images with the patched extension, setting the limit to 2048. I created builds for 10.7 and 11.2. Feel free to use them:
https://hub.docker.com/r/expert/postgresql-large-cube/tags https://github.com/unoexperto/docker-postgresql-large-cube
@unoexperto Just out of curiosity, what do you store that needs more than the 128 points of a face?
And thank you for your contribution.
@railton My use-case is different. I store embeddings that encode the meaning of scientific articles.
@vearutop What do you suggest? Should I go for a modified Postgres, or can we just go with the vector split into two 64-point cubes?