keva Change the approach for data structures: list/hash/sortedSet etc

Issue: Current benchmarking for data structures like hash, list, and sorted set is very poor, compared to string (as string DS, we have an on-par performance with Redis).

Current Approach

For example, for the list data structure, when adding a new element to a list key, we have to do the below things:

Client sends lpush foo bar
Server gets byte array value from foo key from the DB
Server encodes that byte array value to LinkedList java heap data structure
Server adds new element bar to that LinkedList
Server decodes that LinkedList object to byte array
Server saves that new byte array into foo key in the DB

As you can see, decoding/encoding data structure from off-heap to on-heap is very costly, this leads to poor performance when the list is extensive, cost more heap space, and increase latency.

May 20 '22 13:05 tuhuynh27

Since @yampml @axblueblader @haphananhtuan @the123saurav have worked on these data structures, shall we have a discussion about this?

May 20 '22 13:05 tuhuynh27

Proposal

We will store list elements like this:

Say, we have a list key as foo, with the value being an array [foe, bar, bazz]
Key foo will store the byte array value LIST:3 (in that LIST is a prefix byte array to identify a list key)
Elements a stored like this: foo:0=foe, foo:1=bar, foo:2=bazz

Pros:

Encoding/Decoding cost is extremely reduced (to equivalent string key encoding/decoding)
Latency can be reduced also

Cons:

More complex logic to handle, for example, this LREM command, if we remove index 1, we will need to reput value for almost the whole list (O(N))

May 20 '22 13:05 tuhuynh27

Follow up pull request #138

May 31 '22 16:05 tuhuynh27