How do I determine which keys go to which shard for a configured hash, distribution, and server list? (in order to migrate data to new Redis servers)
How does twemproxy distribute keys given the hash and distribution defined in its config? I want to figure out which keys will go to which shard. (Is there any way I can tell beforehand which shard a given key will go to?) My twemproxy config is as follows:
mysql_binlog_replication_master:
  listen: 0.0.0.0:22116
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: false
  redis: true
  server_retry_timeout: 2000
  server_failure_limit: 3
  server_connections: 1
  servers:
   - 127.0.0.1:10020:1 master-shard-1
   - 127.0.0.1:10021:1 master-shard-2
   - 127.0.0.1:10022:1 master-shard-3
   - 127.0.0.1:10023:1 master-shard-4
   - 127.0.0.1:10020:1 master-shard-5
   - 127.0.0.1:10021:1 master-shard-6
   - 127.0.0.1:10022:1 master-shard-7
   - 127.0.0.1:10023:1 master-shard-8
   - 127.0.0.1:10020:1 master-shard-9
   - 127.0.0.1:10021:1 master-shard-10
   - 127.0.0.1:10022:1 master-shard-11
   - 127.0.0.1:10023:1 master-shard-12
   - 127.0.0.1:10020:1 master-shard-13
   - 127.0.0.1:10021:1 master-shard-14
   - 127.0.0.1:10022:1 master-shard-15
   - 127.0.0.1:10023:1 master-shard-16
@architsingla13 It uses ketama (consistent hashing, a DHT): it builds a hash ring from the server names (in this case, master-shard-xx), hashes the key with fnv1a_64, finds the ring point with the nearest value greater than or equal to the key's hash, and stores the data on the server that owns that point.
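For reference, below is a rough Python sketch of that lookup as it appears to work in twemproxy's src/hashkit/nc_ketama.c: ring points come from md5 of "<server-name>-<index>" strings, while the key itself is hashed with twemproxy's fnv1a_64 variant (which does its arithmetic in 32 bits). The shard names are taken from the config above; the rest is a reading of the source, so verify its output against a live twemproxy before relying on it for a migration.

```python
# Rough sketch of twemproxy's ketama continuum + fnv1a_64 key hash (not authoritative).
import bisect
import hashlib

POINTS_PER_SERVER = 160   # KETAMA_POINTS_PER_SERVER in nc_ketama.c
POINTS_PER_HASH = 4

def fnv1a_64(key: bytes) -> int:
    # twemproxy's hash_fnv1a_64() does its arithmetic in uint32_t with the
    # 64-bit FNV constants truncated to 32 bits; this assumes ASCII key bytes
    # (for bytes >= 0x80 the C code sign-extends and would differ from this).
    h = 0xcbf29ce484222325 & 0xffffffff
    prime = 0x100000001b3 & 0xffffffff
    for b in key:
        h ^= b
        h = (h * prime) & 0xffffffff
    return h

def ketama_point(host: bytes, alignment: int) -> int:
    # ketama_hash(): four ring points are cut out of each md5 digest.
    d = hashlib.md5(host).digest()
    return (d[3 + alignment * 4] << 24) | (d[2 + alignment * 4] << 16) \
         | (d[1 + alignment * 4] << 8) | d[0 + alignment * 4]

def build_continuum(server_names):
    # Assumes equal weights and no ejected hosts (auto_eject_hosts: false),
    # in which case every server gets POINTS_PER_SERVER points on the ring.
    continuum = []
    for index, name in enumerate(server_names):
        for point_index in range(POINTS_PER_SERVER // POINTS_PER_HASH):
            host = ("%s-%d" % (name, point_index)).encode()
            for alignment in range(POINTS_PER_HASH):
                continuum.append((ketama_point(host, alignment), index))
    continuum.sort()
    return continuum

def shard_for_key(continuum, server_names, key: bytes) -> str:
    # ketama_dispatch(): first ring point with value >= hash, wrapping around.
    h = fnv1a_64(key)
    pos = bisect.bisect_left([value for value, _ in continuum], h)
    if pos == len(continuum):
        pos = 0
    return server_names[continuum[pos][1]]

if __name__ == "__main__":
    shards = ["master-shard-%d" % i for i in range(1, 17)]
    ring = build_continuum(shards)
    print(shard_for_key(ring, shards, b"some:key"))
```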
@charsyam is there any library through which I can check which server a given key will land on, instead of reinventing the wheel? I tried https://github.com/RJ/ketama, but its libraries gave errors in every language I tried. Is there any library you recommend? Thanks in advance.
Why do you need this? Knowing the use case would make it easier to recommend something.
Actually, in my product I have four shards, as you can see in the twemproxy config, and we have graphs of CPU usage for the Redis instances. We saw that these 4 shards are hitting 100 percent CPU (as Redis is single threaded), so the Redis client holding a connection pool to them gets socket timeouts and restarts. So we want to reshard the data from 4 shards to 8 shards, preferably without downtime. If I know which keys should go to which shard, I can pre-store them on the required shards and then point everything at those 8 shards.
@charsyam any recommendation? I basically want to split my data from 4 shards into 8 shards so as to reduce CPU usage per shard. I also thought of putting another twemproxy in front of the previous 4 shards plus 4 new shards. I would replay the keys from the older shards one by one onto the new twemproxy, and after all shards' keys are done, fire delete queries for those keys on the older twemproxy.
Will this solution work?
@architsingla13 Do you want to move the data while in service, or can you stop it for a while? There are many approaches. 1] Scan each Redis and send the data to a new twemproxy, if you can build a new twemproxy shard set; in that case you don't need to implement any hashing yourself. 2] If you want to use a library, I would just recommend extracting and porting twemproxy's nc_ketama.c code; it is not too hard to extract. I am not sure other libraries behave exactly the same, so it is better to just use twemproxy's own code.
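To make approach 1] concrete, here is a minimal redis-py sketch under a couple of assumptions: a new twemproxy already listening on a hypothetical port 22117, and only string keys copied (hashes, lists, sets, zsets would each need their own handling). Adjust hosts and ports to your environment before running.

```python
# Minimal sketch: scan each old shard directly and write the keys through a
# new twemproxy, which then places them on the new ring.  String keys only.
import redis

OLD_SHARDS = [("127.0.0.1", 10020), ("127.0.0.1", 10021),
              ("127.0.0.1", 10022), ("127.0.0.1", 10023)]
NEW_PROXY = ("127.0.0.1", 22117)          # hypothetical listen port of the new proxy

def copy_shard_to_new_proxy(host, port):
    src = redis.Redis(host=host, port=port)
    dst = redis.Redis(host=NEW_PROXY[0], port=NEW_PROXY[1])
    for key in src.scan_iter(count=1000):   # SCAN, so the source stays responsive
        if src.type(key) != b"string":
            continue                         # non-string types: handle separately
        value = src.get(key)
        if value is None:
            continue                         # key expired/deleted between SCAN and GET
        ttl_ms = src.pttl(key)
        if ttl_ms > 0:
            dst.set(key, value, px=ttl_ms)   # preserve the remaining TTL
        else:
            dst.set(key, value)

for host, port in OLD_SHARDS:
    copy_shard_to_new_proxy(host, port)
```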
@charsyam hey, I won't be able to go with the first approach as I don't have enough space to build 8 new shards. Also, I want to move the data while in service, as the application is critical.
Would my solution mentioned in the comment above work? I will try the twemproxy ketama code for sure.
@architsingla13 How will you move your data? My understanding is: 1] you have 4 nodes; 2] you plan to expand to 8 nodes, but not as a newly built cluster, so you want to build the new shards on the current nodes; 3] so how will you move your current data to the new nodes, given you said you don't have 8 new nodes?
@charsyam I was thinking, after you suggested using the nc_ketama.c code, that I can work out which keys should live on which shard, so I will delete those keys from where they are now and put them on the required shard.
@architsingla13 That's my question: how do you know whether a piece of data is new data (that you inserted recently) or old data?
@charsyam oh sorry, when I start this process I will stop any new data from entering Redis. Only reads of the old data will be served.
First approach - Step 1: read all keys from shard 1, then work out, according to ketama, where each key should go. If the key belongs to the source shard, skip it; otherwise delete it from the current shard and add it to the necessary shard. Step 2: repeat this process with the other shards.
Second approach - set up one new twemproxy over the old four shards plus four new shards. Replay all keys from the old shards one by one onto the new twemproxy; until that finishes, the old twemproxy keeps serving reads. After all keys are done, the four new shards will contain only the keys that belong to them under the new ring, but the old shards will still hold keys that no longer map to them, so delete from the old shards the keys that ketama no longer assigns there.
@charsyam Do these approaches seem right to you?
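For what it's worth, the cleanup step in that second approach could look roughly like the sketch below, reusing build_continuum()/shard_for_key() from the earlier ketama sketch. The 8-shard name list and the old-shard mapping here are placeholders for whatever the new twemproxy config actually says, so keep it as a dry run (printing instead of deleting) until the output looks right.

```python
# Sketch: scan each old shard and flag the keys that the *new* ring no longer
# maps to it.  Assumes build_continuum() and shard_for_key() from the earlier
# sketch are importable; shard names and host/port mapping are placeholders.
import redis

NEW_SHARD_NAMES = ["master-shard-%d" % i for i in range(1, 9)]   # hypothetical new ring
OLD_SHARDS = {                       # old shard name -> (host, port) it lives on
    "master-shard-1": ("127.0.0.1", 10020),
    "master-shard-2": ("127.0.0.1", 10021),
    "master-shard-3": ("127.0.0.1", 10022),
    "master-shard-4": ("127.0.0.1", 10023),
}

continuum = build_continuum(NEW_SHARD_NAMES)

for shard_name, (host, port) in OLD_SHARDS.items():
    conn = redis.Redis(host=host, port=port)
    for key in conn.scan_iter(count=1000):
        owner = shard_for_key(continuum, NEW_SHARD_NAMES, key)
        if owner != shard_name:
            print("would delete", key, "from", shard_name, "now owned by", owner)
            # conn.delete(key)       # uncomment once the dry run looks right
```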
@charsyam any views?
@architsingla13 Have you found a solution for this, or ported the nc_ketama.c code?
The ketama code is compatible with libmemcached, so anything using the same distribution and hash function should map keys to the same servers - ketama implementations are available in memcached clients in many languages. The same goes for the hash function.
- Read the documentation of the clients carefully to see whether certain options change the choice of hash or distribution.
- You can try setting/getting keys with both those clients and twemproxy to see if they get sent to the same servers.
Depending on the use case, https://github.com/sripathikrishnan/redis-rdb-tools may also be useful for deleting outdated keys, listing all old keys, etc.
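As a quick sanity check of that last point, something like the redis-py snippet below (ports taken from the config above, the probe-key pattern is made up) writes probe keys through twemproxy and then asks each backend directly which keys it received; feeding the same keys to a ketama library or an nc_ketama.c port then shows whether they agree.

```python
# Write a few probe keys through twemproxy and see which backend each lands on.
import redis

proxy = redis.Redis(host="127.0.0.1", port=22116)          # twemproxy listen port
backends = {port: redis.Redis(host="127.0.0.1", port=port)
            for port in (10020, 10021, 10022, 10023)}

for i in range(20):
    key = "ketama:probe:%d" % i
    proxy.set(key, "x", ex=60)                              # short TTL so probes clean up
    owners = [port for port, conn in backends.items() if conn.exists(key)]
    print(key, "->", owners)
```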