
Q: Config for in-memory shared cache

ddorian opened this issue on Jan 16 '24 · 8 comments

Hi,

Since Python lacks a built-in in-memory multiprocess cache, the closest thing seems to be SQLite.

Any best practices for in-memory shared cache between multiple processes?

Some ideas (sketched as a single config below):

  • sqlite_synchronous=0
  • sqlite_mmap_size=0
  • sqlite_cache_size=0
  • directory="/dev/shm/"
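
A minimal sketch combining these settings with diskcache's Cache (the directory name my_cache is an arbitrary placeholder under /dev/shm):

import diskcache

# Sketch of the settings listed above; "my_cache" is a placeholder name.
cache = diskcache.Cache(
    directory="/dev/shm/my_cache",
    sqlite_synchronous=0,  # skip fsync; acceptable for a RAM-backed file
    sqlite_mmap_size=0,    # disable memory-mapped I/O
    sqlite_cache_size=0,   # rely on the OS page cache instead
)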

ddorian · Jan 16 '24

That looks like a good start. Remember that the directory path should be unique so /dev/shm/name is probably desired.

grantjenks · Jan 16 '24

I’m curious how much faster that’ll be. If you have a benchmark, please share the results.

grantjenks · Jan 16 '24

Here is a simple benchmark script that uses Locust. You may need multiple processes, because the whole process is probably blocked by the SQLite lock (Locust runs on gevent).

import time

import diskcache
from locust import User, task


class MyClient(diskcache.Deque):
    @classmethod
    def fromcache(cls, cache, iterable=(), maxlen=None, request_event=None):
        self = super().fromcache(cache, iterable, maxlen)  # forward args instead of dropping them
        self._request_event = request_event
        return self

    def __getattribute__(self, item: str):
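        # Wrap only append(); all other attribute lookups pass through.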
        if item not in ("append",):
            return diskcache.Deque.__getattribute__(self, item)

        func = diskcache.Deque.__getattribute__(self, item)

        def wrapper(*args, **kwargs):
            request_meta = {
                "request_type": "diskcache",
                "name": func.__name__,
                "start_time": time.time(),
                "response_length": 0,
                # calculating this for an xmlrpc.client response would be too hard
                "response": None,
                "context": {},  # see HttpUser if you actually want to implement contexts
                "exception": None,
            }
            start_perf_counter = time.perf_counter()
            try:
                request_meta["response"] = func(*args, **kwargs)
            except Exception as e:
                request_meta["exception"] = e
            response_time = (time.perf_counter() - start_perf_counter) * 1000
            request_meta["response_time"] = response_time
            # This is what makes the request actually get logged in Locust
            self._request_event.events.request.fire(**request_meta)
            return request_meta["response"]

        return wrapper


class BaseActor(User):
    """
    A minimal Locust user class that provides a diskcache-backed client to its subclasses
    """

    host = ""
    abstract = True  # don't instantiate this as an actual user when running Locust
    client: MyClient

    def __init__(self, environment):
        super().__init__(environment)
        self.environment = environment
        self.cache = diskcache.Cache(
            # CONFIG 1
            # directory="/dev/shm/my_index.db",
            # sqlite_journal_mode="OFF",
            # statistics=0,
            # sqlite_synchronous=0,
            #
            # CONFIG 2
            directory="/tmp/my_index.db",
            statistics=0,
            sqlite_synchronous=0,
            sqlite_journal_mode="wal",
            #
            # shared config
            #
            sqlite_cache_size=0,
            sqlite_mmap_size=0,
            # size_limit=10 * (1024**3),
        )

        self.client = MyClient.fromcache(self.cache, request_event=environment)


class SingleInsert(BaseActor):
    @task
    def only_insert(self):
        self.client.append("s")
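
(To run it: something like locust -f bench.py --headless -u 8, where the file name and user count are arbitrary examples; recent Locust versions also offer a --processes flag to spawn multiple workers, which matters here given the SQLite-lock note above.)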

ddorian · Jan 17 '24

What are the results?

grantjenks · Jan 17 '24

On my laptop it was ~1,500 ops/s on disk and ~5,500 ops/s in memory (Deque.append()). There were many bottlenecks: too many transactions, querying for rows before appending, unused indexes (for the deque), triggers (cache count), and so on.

With some relaxed settings (like not checking for fullness on every insert), sharding/fanout, batching, and a faster laptop/server, it should be able to run at least 10x faster.
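
For the sharding/fanout part, diskcache already ships FanoutCache, which spreads keys across several SQLite files. A rough sketch (the shard count and path are arbitrary choices, and note that Deque itself is layered on a single Cache, so a sharded deque would still need hand-rolling):

import diskcache

# FanoutCache shards writes across multiple SQLite files, reducing lock
# contention; 8 shards and the /dev/shm path are arbitrary choices.
cache = diskcache.FanoutCache(
    directory="/dev/shm/my_cache",
    shards=8,
    timeout=0.010,  # give up quickly on a locked shard instead of blocking
    statistics=0,
    sqlite_synchronous=0,
)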

ddorian · Jan 17 '24

Assuming I use it, would you be open down the line to accepting PRs that make some things optional, plus performance fixes?

Some examples:

  1. https://github.com/grantjenks/python-diskcache/blob/323787f507a6456c56cce213156a78b17073fe00/diskcache/core.py#L885 can be a single CTE query instead of two queries issued from Python
  2. https://github.com/grantjenks/python-diskcache/blob/323787f507a6456c56cce213156a78b17073fe00/diskcache/core.py#L533 should probably be a partial index (sketched after this list)
  3. Make the triggers that keep the "total row count" optional https://github.com/grantjenks/python-diskcache/blob/323787f507a6456c56cce213156a78b17073fe00/diskcache/core.py#L544 (the db would do select count(*) underneath).
  4. Delete can just delete, instead of selecting first, and not raise an error (optionally) https://github.com/grantjenks/python-diskcache/blob/323787f507a6456c56cce213156a78b17073fe00/diskcache/core.py#L1349-L1365
  5. The unique index isn't needed for Deque https://github.com/grantjenks/python-diskcache/blob/323787f507a6456c56cce213156a78b17073fe00/diskcache/core.py#L527-L530
  6. Use WITHOUT ROWID for the normal Cache/Index (also sketched below) https://github.com/grantjenks/python-diskcache/blob/323787f507a6456c56cce213156a78b17073fe00/diskcache/core.py#L513
  7. Use AUTOINCREMENT in Deque.append() instead of https://github.com/grantjenks/python-diskcache/blob/323787f507a6456c56cce213156a78b17073fe00/diskcache/core.py#L1446-L1480
  8. There are cases like these in most methods
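
To make 2 and 6 concrete, an illustrative sqlite3 sketch (not diskcache's actual schema) of a partial index and a WITHOUT ROWID table:

import sqlite3

# Illustrative only; not diskcache's real schema. WITHOUT ROWID keys the
# table directly on the cache key (idea 6), and the partial index skips
# rows with no filename (idea 2), so inserts of inline values pay no
# index-maintenance cost for it.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cache ("
    " key BLOB PRIMARY KEY, value BLOB, expire_time REAL, filename TEXT"
    ") WITHOUT ROWID"
)
con.execute(
    "CREATE INDEX cache_filename ON cache (filename)"
    " WHERE filename IS NOT NULL"
)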

ddorian · Jan 18 '24

Sure, I’m open to improvements. But, I have some comments:

  1. How do you get the filenames and do the delete at the same time?
  2. You mean like excluding NULLs? That seems reasonable.
  3. Maybe. len() should really be fast in my mind but I suppose if it’s optional that would be okay. Maybe a count() method would better indicate the slower path.
  4. Again, how to cleanup the file then?
  5. How would you implement that? Deque is layered on top of Cache without specializations today. Specializing Cache could be tricky.
  6. Probably not, the rowid is really helpful in debugging.
  7. Probably not.

I like the partial index ideas best and the specializations for Deque least.

I would propose that you make all the changes you would like in a separate project (like fastdeque or diskdeque or whatever) and then benchmark the deque implementations against each other. Depending on how big the improvements are and how extensive the changes, maybe we could merge it back. Or, I could remove my Deque implementation and recommend yours.

Part of my hesitation is the deque scenario itself which is kind of moving away from the primary cache/index scenario.

grantjenks · Jan 18 '24

How do you get the filenames and do the delete at the same time?

I see you're storing some values as separate files. It won't work in that case. It should work when using incremental BLOB I/O for large values instead.

I wouldn't cache small inline values and large values that end up as separate files in the same cache instance, though.
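
For reference, a sketch of SQLite's incremental BLOB I/O through Python's sqlite3 module (Connection.blobopen, available since Python 3.11); the table and sizes are arbitrary:

import sqlite3

# Incremental BLOB I/O keeps large values inside the database file and
# reads/writes them in chunks, so deleting a row needs no file cleanup.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
con.execute("INSERT INTO blobs (data) VALUES (zeroblob(1048576))")  # reserve 1 MiB
with con.blobopen("blobs", "data", 1) as blob:  # open rowid 1 read-write
    blob.write(b"x" * 4096)  # write the first 4 KiB chunk in place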

You mean like excluding NULLs? That seems reasonable.

Yes, that's less index to maintain.

Maybe. len() should really be fast in my mind but I suppose if it’s optional that would be okay. Maybe a count() method would better indicate the slower path.

Optional. I wouldn't want triggers in the hot path when I will only rarely need the total row count (and without locks, at that).

Again, how to cleanup the file then?

See the BLOB API, as in 1. But still, there should be a specialization for not using files. (Actually the reverse is true: caching big values into separate files is the specialization.)

How would you implement that? Deque is layered on top of Cache without specializations today. Specializing Cache could be tricky.

There's no reason to store the kv-cache and the deque in the same SQLite table, since from a quick look they don't share anything.

Probably not, the rowid is really helpful in debugging.

It's still an extra index to maintain on every insert/delete.

Probably not.

I didn't understand the reasoning here.

I would propose that you make all the changes you would like in a separate project (like fastdeque or diskdeque or whatever) and then benchmark the deque implementations against each other.

It's fairly easy to benchmark: just comment out the triggers, indexes, and transactions. I did some of that and went from 5.5K to 17K ops/s for in-memory Deque.append(), as an example (using the code above).

Depending on how big the improvements are and how extensive the changes, maybe we could merge it back. Or, I could remove my Deque implementation and recommend yours.

Part of my hesitation is the deque scenario itself which is kind of moving away from the primary cache/index scenario.

I picked it as a hard case, consisting only of blocking write operations. The same issues (triggers, indexes, transactions, rowid) apply to the normal cache.


My original question was about config, which was answered.

ddorian · Jan 18 '24