[Bug]: Old posts get synced with a too high hot_rank
Requirements
- [X] Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
- [X] Did you check to see if this issue already exists?
- [X] Is this only a single bug? Do not put multiple bugs in one issue.
- [X] Is this a UI / front end issue? Use the lemmy-ui repo.
Summary
By default, every row in post_aggregates starts with a hot_rank of 1728. (The same is true for comments.)
This works well for new posts, but when a user searches for an old post on another instance, that post also gets an entry in the post_aggregates table with the same default value. As a result, users can flood the front page with 2-3 year old posts from other instances, as long as those posts have never been synced before.
This issue is compounded by the fact that hot_rank calculation for older posts only runs on server startup, and in fact will very often fail completely due to database deadlocks (#3076).
I propose significantly reducing the default value of hot_rank in post_aggregates (and also comment_aggregates) for any entry where published is older than 24h. Perhaps even reducing it to 0. I can make a PR with this proposed change.
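To illustrate why a fixed default is wrong for old posts, here is a minimal Rust sketch of the rank decay. The formula is an assumption on my part, based on my reading of Lemmy's SQL `hot_rank` function and not confirmed anywhere in this issue: `floor(10000 * log10(max(1, score + 3)) / (hours_old + 2)^1.8)`. It does reproduce the 1728 default for a brand-new post with a score of 1. The `default_hot_rank` helper is hypothetical and only shows the shape of the proposed change.

```rust
/// Hot rank for an item with `score` that was published `hours_old` hours ago.
/// ASSUMPTION: this mirrors Lemmy's SQL hot_rank function; treat it as a sketch.
fn hot_rank(score: i64, hours_old: f64) -> i64 {
    // The +3 offset and the clamp to 1 keep the logarithm non-negative.
    let numerator = 10_000.0 * ((score + 3).max(1) as f64).log10();
    // The +2 offset avoids dividing by zero for items published just now.
    let denominator = (hours_old.max(0.0) + 2.0).powf(1.8);
    (numerator / denominator).floor() as i64
}

/// HYPOTHETICAL helper: the default proposed above for a new aggregates row.
/// Recent items keep the current default; anything older than 24h starts at 0.
fn default_hot_rank(hours_old: f64) -> i64 {
    const CURRENT_DEFAULT: i64 = 1728;
    if hours_old > 24.0 {
        0
    } else {
        CURRENT_DEFAULT
    }
}
```

Under this assumed formula, `hot_rank(1, 0.0)` gives 1728, while a two-year-old post (`hot_rank(1, 17_520.0)`) gives 0. So a default of 0 for old rows would match what a correct recalculation produces anyway.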
Steps to Reproduce
- Sync an old post from any other instance
- Check its `hot_rank` in the `post_aggregates` table
Technical Details
N/A
Version
0.17.4
Lemmy Instance URL
No response
Yes this makes sense, seems like we missed an edge case there. You can find the relevant code in https://github.com/LemmyNet/lemmy/pull/2952. I think any post/comment db writes from crates/apub/ need to calculate the rank manually.
@sunaurus I'd prefer this edge-case be handled in two ways, and not through editing or messing with SQL triggers:
- Since this is only an issue for federated posts and comments, then in the apub code, on receive, update the relevant `post_aggregates` `hot_rank` and `hot_rank_active` columns, for only that item. (Same for `comment_aggregates`.) Check `scheduled_tasks.rs` for how this is already done in diesel.
  - I like this the best, because it only changes that row.
- Another hacky solution would be for `scheduled_tasks.rs` to set all historical rows older than a week to zero, but I don't like that as much.