Help me with my use case, please
Could you please help me with some starter code for my use case?
I want to store key/value pairs in the vector similarity DB, where the key is a sentence ID and the value is a vector. Examples:
id_1 [0.06284283101558685, 0.046207964420318604, 0.0053909290581941605, ...]
id_2 [0.006631242576986551, 0.08234132081270218, -0.0787612572312355, ...]
Then I want to get the IDs of the top n vectors most similar to a given vector.
Sorry for the late reply. If I understand correctly, you want to get the top n similar vectors within the same vector set. If so, just follow the steps below:
- Decide which parameter values you should use
  - the dimension of the vectors; for example, we can assume 10 here
  - the name of the vector set; for example, we can assume 'vector' here
- Setup
> bmk b10 t1 t2 t3 t4 t5 t6 t7 t8 t9 t0
> vmk b10 vector
> rmk vector vector cosinesq
- Fill data
> vadd vector 1 0.11 0.112 0.1123...
> vadd vector 2 0.21 0.212 0.2123...
Note that the numbers 1 and 2 here are the vectors' ids inside simbase; you can set up a map between your own IDs and these ids.
- Retrieve result
You can look up the inner id for your ID via the map (for example, 1234) and then issue the command:
> rrec vector 1234 vector
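The map between your own IDs and the ids inside simbase could be kept in a small helper like the one below. This is only a sketch: the `IdMap` class and its method names are hypothetical and not part of simbase. It assigns consecutive integers starting at 1 and remembers both directions, so that integer ids coming back from the server can be translated to the original sentence IDs.

```python
class IdMap:
    """Hypothetical helper: maps external string IDs to integer ids and back."""

    def __init__(self):
        self._to_inner = {}  # e.g. "id_1" -> 1
        self._to_outer = {}  # e.g. 1 -> "id_1"

    def inner(self, outer_id):
        """Return the integer id for outer_id, assigning the next free one if new."""
        if outer_id not in self._to_inner:
            inner_id = len(self._to_inner) + 1
            self._to_inner[outer_id] = inner_id
            self._to_outer[inner_id] = outer_id
        return self._to_inner[outer_id]

    def outer(self, inner_id):
        """Return the external ID that was registered for inner_id."""
        return self._to_outer[inner_id]


ids = IdMap()
# Argument tuples for the commands shown in the steps above (values shortened).
vadd_args = ('vadd', 'vector', ids.inner('id_1'), 0.11, 0.112, 0.1123)
rrec_args = ('rrec', 'vector', ids.inner('id_1'), 'vector')
```

With a redis client these tuples would presumably be sent as `client.execute_command(*vadd_args)`; the map itself would also need to be persisted somewhere if it has to survive a restart.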
Hope the above instructions help.
The first two steps work, and the others should too, thank you! I want to perform a mass insertion via redis, but something is going wrong. I have looked at the issues regarding redis but have not figured it out so far, so I would very much appreciate your help.
The vectors have 300 dimensions, so I have set up b300. I have a batch file of about 60.7 MB with commands like the following:
vadd vector 1 8.748467856397232E-4 0.008283308086295923 0.014330921694636345 0.02630641683936119 ...
vadd vector 2 0.032103515822779045 0.019140462851448155 0.035745080137117344 0.025860785591331394 ...
I either run this batch file with cat batch.file | redis-cli -p 7654 --pipe
and get this Error writing to the server: Connection reset by peer
or run it with this cat batch.file; sleep 60 | redis-cli -p 7654 --pipe
and get this
All data transferred. Waiting for the last reply...
No replies for 30 seconds: exiting.
errors: 1, replies: 0
The pipe mode of the redis protocol is not implemented, so I think the commands below should work instead:
redis-cli vadd vector 1 8.748467856397232E-4 0.008283308086295923 0.014330921694636345 0.02630641683936119 ...
redis-cli vadd vector 2 0.032103515822779045 0.019140462851448155 0.035745080137117344 0.025860785591331394 ...
Or you can use a Python script, for example:
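import redis

# Connect to the simbase server, which speaks the redis protocol
dest = redis.Redis(host='localhost', port=7654)

with open('csvdatafile.txt') as data:
    for idx, line in enumerate(data):
        # Strip the trailing newline, then split the comma-separated components
        components = line.rstrip('\n').split(',')
        # The line index is used as the vector id inside simbase
        dest.execute_command('vadd', 'vector', idx, *components)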
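The script above assumes a comma-separated data file. If the batch file already contains complete, space-separated `vadd` commands as described earlier, the lines could instead be replayed one command at a time. The following is a sketch under that assumption; `parse_command` and `replay` are hypothetical helper names, and `send` stands for any callable that forwards its arguments to the server.

```python
def parse_command(line):
    """Split one batch-file line into command tokens; None for blank lines."""
    tokens = line.split()
    return tokens or None


def replay(lines, send):
    """Send each parsed command through `send`, one at a time (no pipelining).

    Returns the number of commands sent.
    """
    sent = 0
    for line in lines:
        tokens = parse_command(line)
        if tokens is None:
            continue
        send(*tokens)
        sent += 1
    return sent


# Dry run on an in-memory sample; the lambda stands in for a real client.
sample = [
    "vadd vector 1 8.748467856397232E-4 0.008283308086295923\n",
    "\n",
    "vadd vector 2 0.032103515822779045 0.019140462851448155\n",
]
recorded = []
count = replay(sample, lambda *args: recorded.append(args))
```

With redis-py, `send` would presumably be `dest.execute_command` and `lines` an open file object, so the 60.7 MB file is read line by line rather than loaded at once.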
Great, the Python script helped. I changed it a bit, and it looks like this now:
import redis
dest = redis.Redis(host='localhost', port=7654)
with open('tmpFiles/t300.txt') as t300:
    for idx, line in enumerate(t300):
        line = line[:-1]
        b = line.split(' ')
        print("Setting vector dimensions (b300): " + dest.execute_command('bmk', 'b300', line))
        print("And the name (video) of vector set with b300 dimension: " + dest.execute_command('vmk', 'b300', 'video'))
        print("Setting recommender (video->video): " + dest.execute_command('rmk', 'video', 'video', 'cosinesq'))
with open('tmpFiles/batch2.txt') as data:
    for idx, line in enumerate(data):
        line = line[:-1]
        components = line.split(',')
        print("ID:" + str(idx+1) + ": " + dest.execute_command('vadd', 'video', idx+1, *components))
It executed successfully, but when I then try to get a vector with: vget video 1
I get this: (error) Unknown server error!
Or if I try this: rrec video 1 video
I get this: (empty list or set), although I should get vector ids.
Could you paste the error from the log file? It is in the log directory.
2017-02-27 19:24:35 INFO SimEngineImpl:313 - loading basis[b300]
2017-02-27 19:24:36 ERROR SimEngineImpl:56 - java.lang.ArrayIndexOutOfBoundsException
com.guokr.simbase.errors.SimException: java.lang.ArrayIndexOutOfBoundsException
at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:105)
at com.guokr.simbase.engine.SimEngineImpl$3.invoke(SimEngineImpl.java:322)
at com.guokr.simbase.engine.SimEngineImpl$AsyncSafeRunner.run(SimEngineImpl.java:54)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at gnu.trove.list.array.TFloatArrayList.toArray(TFloatArrayList.java:715)
at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:124)
at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:133)
at com.guokr.simbase.store.Recommendation.<init>(Recommendation.java:58)
at com.guokr.simbase.store.SerializerHelper$RecommendationSerializer.read(SerializerHelper.java:203)
at com.guokr.simbase.store.SerializerHelper.readR(SerializerHelper.java:300)
at com.guokr.simbase.store.SerializerHelper.readRecommendations(SerializerHelper.java:339)
at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:84)
... 5 more
2017-02-27 20:07:13 INFO SimEngineImpl:313 - loading basis[b300]
2017-02-27 20:07:13 ERROR SimEngineImpl:56 - java.lang.ArrayIndexOutOfBoundsException
com.guokr.simbase.errors.SimException: java.lang.ArrayIndexOutOfBoundsException
at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:105)
at com.guokr.simbase.engine.SimEngineImpl$3.invoke(SimEngineImpl.java:322)
at com.guokr.simbase.engine.SimEngineImpl$AsyncSafeRunner.run(SimEngineImpl.java:54)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at gnu.trove.list.array.TFloatArrayList.toArray(TFloatArrayList.java:715)
at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:124)
at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:133)
at com.guokr.simbase.store.Recommendation.<init>(Recommendation.java:58)
at com.guokr.simbase.store.SerializerHelper$RecommendationSerializer.read(SerializerHelper.java:203)
at com.guokr.simbase.store.SerializerHelper.readR(SerializerHelper.java:300)
at com.guokr.simbase.store.SerializerHelper.readRecommendations(SerializerHelper.java:339)
at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:84)
... 5 more
@mountain any guess? :) Could it be that 300 dimensions are too many for a basis, or that the floating-point values in the vectors are too long?
I tried simbase with 10-dimensional vectors and it works fine. But as soon as I use vectors with more than 10 dimensions (for example 11d, and I also tried 13d, 14d, 15d, 25d and 50d), I get the following error:
2017-03-04 20:43:03 INFO SimEngineImpl:385 - basis[b11] created
2017-03-04 20:43:03 INFO SimEngineImpl:460 - vectorset[video] created under basis[b11]
2017-03-04 20:43:03 INFO SimEngineImpl:727 - creating recommendation[video_video] with funcscore[cosinesq]
2017-03-04 20:43:03 INFO SimEngineImpl:740 - recommendation[video_video] created with funcscore[cosinesq]
2017-03-04 20:43:03 ERROR SimEngineImpl:56 -
java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at gnu.trove.list.array.TFloatArrayList.toArray(TFloatArrayList.java:715)
at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:124)
at com.guokr.simbase.store.DenseVectorSet.rescore(DenseVectorSet.java:276)
at com.guokr.simbase.store.Recommendation.processDenseChangedEvt(Recommendation.java:129)
at com.guokr.simbase.store.Recommendation.onVectorAdded(Recommendation.java:208)
at com.guokr.simbase.store.DenseVectorSet.add(DenseVectorSet.java:152)
at com.guokr.simbase.engine.SimBasis.vadd(SimBasis.java:153)
at com.guokr.simbase.engine.SimEngineImpl$14.invoke(SimEngineImpl.java:513)
at com.guokr.simbase.engine.SimEngineImpl$AsyncSafeRunner.run(SimEngineImpl.java:54)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
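To make the 10d versus 11d comparison easy to repeat, the command sequence could be generated for any dimension with a small script. This is a sketch, not a confirmed reproduction: `commands_for_dimension` is a hypothetical helper, the component names t1, t2, ... and the vector values are made up for illustration, and only the bmk, vmk, rmk and vadd commands already shown in this thread are used.

```python
def commands_for_dimension(dim, n_vectors=2, vset='video'):
    """Build bmk/vmk/rmk/vadd argument tuples for a basis with `dim` components."""
    basis = 'b%d' % dim
    # Illustrative component names: t1, t2, ..., t<dim>
    components = tuple('t%d' % i for i in range(1, dim + 1))
    cmds = [
        ('bmk', basis) + components,
        ('vmk', basis, vset),
        ('rmk', vset, vset, 'cosinesq'),
    ]
    for vec_id in range(1, n_vectors + 1):
        # Deterministic dummy values so that runs are comparable.
        values = tuple(round(vec_id / 10.0 + k / 1000.0, 4) for k in range(dim))
        cmds.append(('vadd', vset, vec_id) + values)
    return cmds


cmds = commands_for_dimension(11)
```

Each tuple could then be sent with something like `dest.execute_command(*cmd)`, using a fresh server or a different `vset` name per dimension so that earlier runs do not interfere.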