Use ReadySet for increased database read performance
Requirements
- [X] Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support
- [X] Did you check to see if this issue already exists?
- [X] Is this only a feature request? Do not put multiple feature requests in one issue.
- [X] Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.
Is your proposal related to a problem?
Instances are easily overloaded when a specific post is shared over social medias and many people come to see it.
Describe the solution you'd like.
ReadySet is a caching layer that goes on top of your database and make the reads much faster. The cache is automatically updated when a write comes.
Describe alternatives you've considered.
Hand-rolling a cache using a key-value store system, but this is much more work, and brittle at the same time.
Additional context
No response
In general this sounds like an interesting tool. Historically "magic" database caching never worked well for me personally, though seeing the concepts and the source of this solution would make me try again.
One thing to keep in mind: The data-update mechanism seems to be based on logical postgres replication. This means that using the cache would be subject to the same issue as using a readonly follower database: it can take seconds for a change you write to be readable on the follower, or in this case, on the cache (I didn't test this specific to this caching layer, this judgement is purely based on the described architecture).
That being said: having readonly follower databases, or this cached layer, as way of scaling is a very valid approach. Typically this could be implemented as optional secondary database(s), so each task or handler can decide if they need 100% up-to-date data, or can read from the faster but slightly outdated follower (or cache). Every instance admin can set this up as they wish.
It uses PosgreSQL replication features to follow the logs and determine when cached content needs updated. This approach requires the least amount of surgery to Lemmy's Rust code, like you say in the "alternates". It also starts out altering nothing and passes data to the live database, you determine how caching gets enabled.
The lemmy_server code right now is repeating a lot of cacheable queries for things like the local instance's own settings, language codes, who an administrator is, the "trending" communities, etc. It's beyond crisis.
EDIT: revisiting this 24 hours later
Reading here, https://docs.readyset.io/reference/sql-support/ ReadySet is going to double your server storage requirements, and it also runs a second instance of PostgreSQL from what I understand, so your RAM usage is going to go up too. I wonder how you would allocate memory to which PostgreSQL for when queries that aren't cached miss.
@RocketDerp
and it also runs a second instance of PostgreSQL from what I understand, I'm not sure where you got that from.
@syphar
This means that using the cache would be subject to the same issue as using a readonly follower database: it can take seconds for a change you write to be readable on the follower
This is indeed the case
discussion with developers of ReadySet here: https://github.com/readysettech/readyset/discussions/112
looks like I was wrong/confused, ReadySet will interface with the primary postgress database and the differing credentials are explained.
Reading a little more about ReadySet, it wouldn't automatically start caching but only do this for explicitly defined queries.
So we would be back to needed code changes where lemmy needs to tell ReadySet which queries to cache, and which not. Or every instance admin does this on their own.
This sounds like something which can be deployed by instance admins without any changes to Lemmy code. So no need for an issue here. Also see https://github.com/LemmyNet/lemmy/issues/3005.