Help me with my use case plz

Open Sherafgan opened this issue 9 years ago • 9 comments

Could you please help me with some starter code for my use case?

I want to store key-value pairs in the vector similarity DB, where the key is a sentence ID and the value is a vector. Examples:

id_1 [0.06284283101558685, 0.046207964420318604, 0.0053909290581941605, ...]
id_2 [0.006631242576986551, 0.08234132081270218, -0.0787612572312355, ...]

Then, given a vector, I want the IDs of the top n most similar vectors.

Sherafgan avatar Feb 20 '17 14:02 Sherafgan

Sorry for the late reply. If I understand correctly, you want to get the top n similar vectors inside the same vector set. If so, just follow the steps below:

  1. Decide which parameter values to use
  • the dimension of the vectors; for example, assume 10 here
  • the name of the vector set; for example, assume 'vector' here
  2. Set up

> bmk b10 t1 t2 t3 t4 t5 t6 t7 t8 t9 t0
> vmk b10 vector
> rmk vector vector cosinesq

  3. Fill data

> vadd vector 1 0.11 0.112 0.1123...
> vadd vector 2 0.21 0.212 0.2123...

Note that the numbers 1 and 2 here are the vectors' ids inside simbase; you can set up a map between your own IDs and these ids.

  4. Retrieve result

Look up the inner id for your ID via the map (say it is 1234), and then issue the command:

> rrec vector 1234 vector
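The map between your own IDs and the integer ids could be kept on the client side. Below is a minimal sketch in Python; the `IdMap` class and its method names are hypothetical and are not part of simbase. It assumes ids are handed out sequentially starting from 1.

```python
class IdMap:
    """Two-way map between external string IDs and integer ids.

    Hypothetical client-side helper; simbase itself only sees the integers.
    """

    def __init__(self, first_id=1):
        self._next = first_id
        self._to_inner = {}
        self._to_outer = {}

    def inner(self, outer_id):
        """Return the integer id for outer_id, assigning a new one if needed."""
        if outer_id not in self._to_inner:
            self._to_inner[outer_id] = self._next
            self._to_outer[self._next] = outer_id
            self._next += 1
        return self._to_inner[outer_id]

    def outer(self, inner_id):
        """Return the external ID that was registered for an integer id."""
        return self._to_outer[int(inner_id)]


ids = IdMap()
first = ids.inner("id_1")   # id to use in: vadd vector <first> ...
second = ids.inner("id_2")  # id to use in: vadd vector <second> ...
```

Any integer ids that come back from a query can then be translated with `ids.outer(...)`. The map lives only in memory here, so it would need to be persisted separately if the data outlives the script.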

Hope the above instructions help.

mountain avatar Feb 22 '17 15:02 mountain

The first two steps are okay. The others should be too, thank you! I want to perform mass insertion through redis-cli, but something is going wrong. I saw the existing issues regarding Redis, but I have not figured it out so far, so I would very much appreciate your help.

The vectors have 300 dimensions, so I have b300 set. I have a batch file of about 60.7MB with the following commands:

vadd vector 1 8.748467856397232E-4 0.008283308086295923 0.014330921694636345 0.02630641683936119 ...
vadd vector 2 0.032103515822779045 0.019140462851448155 0.035745080137117344 0.025860785591331394 ...

I either run this batch file with

cat batch.file | redis-cli -p 7654 --pipe

and get this:

Error writing to the server: Connection reset by peer

or run it with

cat batch.file; sleep 60 | redis-cli -p 7654 --pipe

and get this:

All data transferred. Waiting for the last reply...
No replies for 30 seconds: exiting.
errors: 1, replies: 0

Sherafgan avatar Feb 22 '17 21:02 Sherafgan

The pipe mode of the Redis protocol is not implemented, so I think sending the commands one at a time, as below, will work.

redis-cli -p 7654 vadd vector 1 8.748467856397232E-4 0.008283308086295923 0.014330921694636345 0.02630641683936119 ...
redis-cli -p 7654 vadd vector 2 0.032103515822779045 0.019140462851448155 0.035745080137117344 0.025860785591331394 ...
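Since the batch file already contains one complete command per line, it could also be replayed one command at a time from a script rather than through `--pipe`. The sketch below is an illustration only: `parse_batch_line` and `replay` are hypothetical helper names, and `client` is assumed to be any object with an `execute_command(*args)` method, such as a redis-py `redis.Redis` instance.

```python
def parse_batch_line(line):
    """Split one batch-file line such as 'vadd vector 1 0.1 0.2' into arguments."""
    return line.split()


def replay(lines, client):
    """Send each non-empty line as its own command; return how many were sent."""
    sent = 0
    for line in lines:
        args = parse_batch_line(line)
        if not args:
            continue  # skip blank lines
        client.execute_command(*args)
        sent += 1
    return sent
```

With redis-py installed, this might be called as `replay(open('batch.file'), redis.Redis(host='localhost', port=7654))`. Each command waits for its reply before the next one is sent, so this is slower than true pipelining but does not depend on pipe mode.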

mountain avatar Feb 23 '17 02:02 mountain

Or you can use Python, for example:

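import redis

# Connect to simbase, which speaks the Redis protocol on port 7654.
dest = redis.Redis(host='localhost', port=7654)
with open('csvdatafile.txt') as data:
    for idx, line in enumerate(data):
        # Strip the trailing newline, then split the comma-separated components.
        components = line.rstrip('\n').split(',')
        # Use the line index as the vector id inside the 'vector' set.
        dest.execute_command('vadd', 'vector', idx, *components)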

mountain avatar Feb 23 '17 02:02 mountain

Great, the Python script helped. I changed it a bit, and it looks like this now:

import redis

dest = redis.Redis(host='localhost', port=7654)
with open('tmpFiles/t300.txt') as t300:
    for idx, line in enumerate(t300):
        line = line[:-1]
        b = line.split(' ')
print("Setting vector dimensions (b300): " + dest.execute_command('bmk', 'b300', line))
print("And the name (video) of vector set with b300 dimension: " + dest.execute_command('vmk', 'b300', 'video'))
print("Setting recommender (video->video): " + dest.execute_command('rmk', 'video', 'video', 'cosinesq'))
with open('tmpFiles/batch2.txt') as data:
    for idx, line in enumerate(data):
        line = line[:-1]
        components = line.split(',')
        print("ID:" + str(idx+1) + ": " + dest.execute_command('vadd', 'video', idx+1, *components))

I executed it successfully, but when I then try to get a vector with

vget video 1

I get this:

(error) Unknown server error!

Or if I try

rrec video 1 video

I get this:

(empty list or set)

although I should get vector ids.
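For reference, the three setup commands from earlier in the thread (`bmk`, `vmk`, `rmk`) can be built for any dimension with each basis component passed as its own argument, as in the `bmk b10 t1 t2 ... t0` example. This is only a sketch for building the arguments, not a diagnosis of the error above. The helper names `setup_commands` and `check_dimension` are hypothetical, and the component names `t1 ... tN` are an assumption chosen to resemble the earlier example.

```python
def setup_commands(dim, vector_set, score="cosinesq"):
    """Build argument tuples for bmk, vmk and rmk for a dim-dimensional basis."""
    basis = "b%d" % dim
    # One name per basis component; the names themselves are placeholders.
    components = tuple("t%d" % i for i in range(1, dim + 1))
    return [
        ("bmk", basis) + components,
        ("vmk", basis, vector_set),
        ("rmk", vector_set, vector_set, score),
    ]


def check_dimension(components, dim):
    """Raise ValueError if a vector does not have exactly dim components."""
    if len(components) != dim:
        raise ValueError(
            "expected %d components, got %d" % (dim, len(components))
        )
    return components
```

Each tuple could then be sent with `dest.execute_command(*args)`, and `check_dimension(components, 300)` could be called before every `vadd` to catch lines that were split on the wrong separator.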

Sherafgan avatar Feb 27 '17 12:02 Sherafgan

Could you paste the error from the log file? It is in the log directory.

mountain avatar Feb 27 '17 15:02 mountain

2017-02-27 19:24:35 INFO  SimEngineImpl:313 - loading basis[b300]
2017-02-27 19:24:36 ERROR SimEngineImpl:56 - java.lang.ArrayIndexOutOfBoundsException
com.guokr.simbase.errors.SimException: java.lang.ArrayIndexOutOfBoundsException
	at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:105)
	at com.guokr.simbase.engine.SimEngineImpl$3.invoke(SimEngineImpl.java:322)
	at com.guokr.simbase.engine.SimEngineImpl$AsyncSafeRunner.run(SimEngineImpl.java:54)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at gnu.trove.list.array.TFloatArrayList.toArray(TFloatArrayList.java:715)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:124)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:133)
	at com.guokr.simbase.store.Recommendation.<init>(Recommendation.java:58)
	at com.guokr.simbase.store.SerializerHelper$RecommendationSerializer.read(SerializerHelper.java:203)
	at com.guokr.simbase.store.SerializerHelper.readR(SerializerHelper.java:300)
	at com.guokr.simbase.store.SerializerHelper.readRecommendations(SerializerHelper.java:339)
	at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:84)
	... 5 more
2017-02-27 20:07:13 INFO  SimEngineImpl:313 - loading basis[b300]
2017-02-27 20:07:13 ERROR SimEngineImpl:56 - java.lang.ArrayIndexOutOfBoundsException
com.guokr.simbase.errors.SimException: java.lang.ArrayIndexOutOfBoundsException
	at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:105)
	at com.guokr.simbase.engine.SimEngineImpl$3.invoke(SimEngineImpl.java:322)
	at com.guokr.simbase.engine.SimEngineImpl$AsyncSafeRunner.run(SimEngineImpl.java:54)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at gnu.trove.list.array.TFloatArrayList.toArray(TFloatArrayList.java:715)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:124)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:133)
	at com.guokr.simbase.store.Recommendation.<init>(Recommendation.java:58)
	at com.guokr.simbase.store.SerializerHelper$RecommendationSerializer.read(SerializerHelper.java:203)
	at com.guokr.simbase.store.SerializerHelper.readR(SerializerHelper.java:300)
	at com.guokr.simbase.store.SerializerHelper.readRecommendations(SerializerHelper.java:339)
	at com.guokr.simbase.engine.SimBasis.bload(SimBasis.java:84)
	... 5 more

Sherafgan avatar Feb 27 '17 17:02 Sherafgan

@mountain any guess? :) Is it that 300 dimensions are too many for a basis, or that the floating-point values of the vectors have too many digits?

Sherafgan avatar Mar 01 '17 11:03 Sherafgan

I tried simbase with 10d vectors and it's OK, but when I try vectors with more than 10 dimensions (e.g. 11d; I also tried 13d, 14d, 15d, 25d, and 50d), I get the following error:

2017-03-04 20:43:03 INFO  SimEngineImpl:385 - basis[b11] created
2017-03-04 20:43:03 INFO  SimEngineImpl:460 - vectorset[video] created under basis[b11]
2017-03-04 20:43:03 INFO  SimEngineImpl:727 - creating recommendation[video_video] with funcscore[cosinesq]
2017-03-04 20:43:03 INFO  SimEngineImpl:740 - recommendation[video_video] created with funcscore[cosinesq]
2017-03-04 20:43:03 ERROR SimEngineImpl:56 - 
java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at gnu.trove.list.array.TFloatArrayList.toArray(TFloatArrayList.java:715)
	at com.guokr.simbase.store.DenseVectorSet.get(DenseVectorSet.java:124)
	at com.guokr.simbase.store.DenseVectorSet.rescore(DenseVectorSet.java:276)
	at com.guokr.simbase.store.Recommendation.processDenseChangedEvt(Recommendation.java:129)
	at com.guokr.simbase.store.Recommendation.onVectorAdded(Recommendation.java:208)
	at com.guokr.simbase.store.DenseVectorSet.add(DenseVectorSet.java:152)
	at com.guokr.simbase.engine.SimBasis.vadd(SimBasis.java:153)
	at com.guokr.simbase.engine.SimEngineImpl$14.invoke(SimEngineImpl.java:513)
	at com.guokr.simbase.engine.SimEngineImpl$AsyncSafeRunner.run(SimEngineImpl.java:54)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Sherafgan avatar Mar 04 '17 17:03 Sherafgan