keva icon indicating copy to clipboard operation
keva copied to clipboard

Change the approach for data structures: list/hash/sortedSet etc

Open tuhuynh27 opened this issue 3 years ago • 3 comments

Issue: Current benchmarking for data structures like hash, list, and sorted set is very poor, compared to string (as string DS, we have an on-par performance with Redis).

Current Approach

For example, for the list data structure, when adding a new element to a list key, we have to do the below things:

  • Client sends lpush foo bar
  • Server gets byte array value from foo key from the DB
  • Server encodes that byte array value to LinkedList java heap data structure
  • Server adds new element bar to that LinkedList
  • Server decodes that LinkedList object to byte array
  • Server saves that new byte array into foo key in the DB

As you can see, decoding/encoding data structure from off-heap to on-heap is very costly, this leads to poor performance when the list is extensive, cost more heap space, and increase latency.

tuhuynh27 avatar May 20 '22 13:05 tuhuynh27

Since @yampml @axblueblader @haphananhtuan @the123saurav have worked on these data structures, shall we have a discussion about this?

tuhuynh27 avatar May 20 '22 13:05 tuhuynh27

Proposal

We will store list elements like this:

  • Say, we have a list key as foo, with the value being an array [foe, bar, bazz]
  • Key foo will store the byte array value LIST:3 (in that LIST is a prefix byte array to identify a list key)
  • Elements a stored like this: foo:0=foe, foo:1=bar, foo:2=bazz

Pros:

  • Encoding/Decoding cost is extremely reduced (to equivalent string key encoding/decoding)
  • Latency can be reduced also

Cons:

  • More complex logic to handle, for example, this LREM command, if we remove index 1, we will need to reput value for almost the whole list (O(N))

tuhuynh27 avatar May 20 '22 13:05 tuhuynh27

Follow up pull request #138

tuhuynh27 avatar May 31 '22 16:05 tuhuynh27