Change the approach for data structures: list/hash/sortedSet etc
Issue: Current benchmarking for data structures like hash, list, and sorted set is very poor, compared to string (as string DS, we have an on-par performance with Redis).
Current Approach
For example, for the list data structure, when adding a new element to a list key, we have to do the below things:
- Client sends
lpush foo bar - Server gets byte array value from
fookey from the DB - Server encodes that byte array value to
LinkedListjava heap data structure - Server adds new element
barto thatLinkedList - Server decodes that
LinkedListobject to byte array - Server saves that new byte array into
fookey in the DB
As you can see, decoding/encoding data structure from off-heap to on-heap is very costly, this leads to poor performance when the list is extensive, cost more heap space, and increase latency.
Since @yampml @axblueblader @haphananhtuan @the123saurav have worked on these data structures, shall we have a discussion about this?
Proposal
We will store list elements like this:
- Say, we have a
listkey asfoo, with the value being an array[foe, bar, bazz] - Key
foowill store the byte array valueLIST:3(in thatLISTis a prefix byte array to identify a list key) - Elements a stored like this:
foo:0=foe,foo:1=bar,foo:2=bazz
Pros:
- Encoding/Decoding cost is extremely reduced (to equivalent string key encoding/decoding)
- Latency can be reduced also
Cons:
- More complex logic to handle, for example, this LREM command, if we remove index 1, we will need to
reputvalue for almost the whole list (O(N))
Follow up pull request #138