Question: Efficiently using a dict of lists
Hi, thanks for the great project. From what I've observed, objects stored in, say, a RedisDict are "immutable" in the sense that storing a dict in a RedisDict (e.g. to implement a Redis-backed shared dictionary of objects) is fine, but changing a field of a stored object requires something like:
```python
tmp = redis_dict[key]
tmp['field'] = "value"
redis_dict[key] = tmp
```
For modest-sized objects that workaround seems fine, but as objects get larger I wonder about the efficiency.
Things get more awkward as one use case I've got is what amounts to a dict of lists and I'd like that to be backed by Redis so multiple instances of my application can see the same dict of lists.
Now I know that I could keep rewriting the whole list into the redis_dict, and from a functional perspective I think it'd work, but my suspicion is that it'd get horribly inefficient pretty quickly: as the number of items in the list grows, I'd guess each insert ends up paying for an increasingly expensive JSON serialisation of the entire list.
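To make my worry concrete, here's a plain-Python sketch of what I think each insert costs under that scheme. No Redis is involved; `store` and `append_via_full_rewrite` are just made-up stand-ins for the hash that backs a RedisDict:

```python
import json

store = {}  # stand-in for the Redis hash backing a RedisDict

def append_via_full_rewrite(key, item):
    """Append one item when the whole list is stored as a single JSON value."""
    current = json.loads(store.get(key, "[]"))
    current.append(item)
    # The entire list is re-serialised and rewritten on every append,
    # so the cost of each insert grows with the length of the list.
    store[key] = json.dumps(current)

for i in range(3):
    append_via_full_rewrite("a", i)
# store["a"] is now the JSON string "[0, 1, 2]"
```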
As far as I can tell it's not possible for a RedisDict to contain a RedisList (when I tried, it barfed with a JSON error, IIRC).
Do you have any thoughts on how to do this sort of thing efficiently? I'm no Redis expert, and one of the reasons I looked at pottery was to try and avoid having to get too much into the weeds of Redis, but now I'm wondering whether I might as well keep digging :-)
I'm guessing RedisList is implemented as, well, a Redis list with the values JSON-serialised, so appending an item directly to a RedisList should only be as expensive as the serialisation cost for that item. But how to model a dict of those efficiently (and intuitively) kind of escapes me.
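In other words (if my guess about the implementation is right), an append would look more like this sketch, where only the new item is serialised. Again, `backing_list` and `rpush` are just illustrative stand-ins, not pottery's actual internals:

```python
import json

backing_list = []  # stand-in for the underlying Redis list: one entry per element

def rpush(item):
    # Only the new item pays a serialisation cost; existing entries are untouched,
    # so appends stay cheap no matter how long the list gets.
    backing_list.append(json.dumps(item))

rpush({"verse": 1})
rpush({"verse": 2})
```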
I guess from the RedisList example

```python
lyrics = RedisList(redis=redis, key='lyrics')
```

the key is, well, key. So to model something like `{"a": [], "b": [], "c": [], "d": [], ...}`, do I simply have

```python
{
    "a": RedisList(redis=redis, key="a"),
    "b": RedisList(redis=redis, key="b"),
    "c": RedisList(redis=redis, key="c"),
    "d": RedisList(redis=redis, key="d"),
    ...
}
```
Though as the containing dict isn't itself stored in Redis, how would other instances learn about the keys?
So I'm thinking the only way to do this is by dereferencing: I keep a RedisSet holding the keys, and when I want to "look up" a list I first check my local dict. If the key isn't there, I look it up in the RedisSet (because it might have been set by another instance in the cluster), and if the key is in the RedisSet I create a RedisList instance using that key and add it to my local dict.
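Something like this sketch is what I have in mind. The class name and the factory arguments are made up by me, and the defaults use plain Python containers so it runs without Redis; in production I'd pass factories that return the pottery objects instead, e.g. `list_factory=lambda k: RedisList(redis=redis, key=f'list:{k}')` and something similar returning a RedisSet for the registry:

```python
class SharedListDict:
    """Hypothetical wrapper for the dereferencing scheme described above."""

    def __init__(self, set_factory=set, list_factory=lambda key: []):
        self._registry = set_factory()    # shared set of known keys (a RedisSet in production)
        self._list_factory = list_factory # makes one backing list per key
        self._lists = {}                  # local cache of per-key list handles

    def get_list(self, key):
        if key not in self._lists:
            if key not in self._registry:
                raise KeyError(key)
            # Key was created elsewhere (e.g. by another instance in the
            # cluster): attach a handle to its backing list.
            self._lists[key] = self._list_factory(key)
        return self._lists[key]

    def create_list(self, key):
        self._registry.add(key)
        return self._lists.setdefault(key, self._list_factory(key))

    def keys(self):
        return set(self._registry)
```

With the default factories this behaves like an ordinary dict of lists; swapping in Redis-backed factories should give appends that only serialise the new item, at the cost of one extra registry lookup on a cache miss.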
I think that'd work, but it's rather less transparent than the dict-of-lists idiom it's trying to model, and it loses some of the benefit of what's supposed to be a more Pythonic container abstraction.
Do you have any thoughts? Is my observation about right or have I missed something?
To be clear, I'm not being critical; I'm just thinking out loud about how to model a shared dict of lists more efficiently than JSON-serialising the entire list on every insert, and I'd really appreciate thoughts on this problem.
MTIA