lemmy icon indicating copy to clipboard operation
lemmy copied to clipboard

[Bug]: Old posts get synced with a too high hot_rank

Open sunaurus opened this issue 2 years ago • 2 comments

Requirements

  • [X] Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • [X] Did you check to see if this issue already exists?
  • [X] Is this only a single bug? Do not put multiple bugs in one issue.
  • [X] Is this a UI / front end issue? Use the lemmy-ui repo.

Summary

By default, every row in post_aggregates starts with a hot_rank of 1728. (Same is true for comments)

This works well for new posts, but when a user searches for an old post on another instance, this older post will also get an entry in the post_aggregates table, with the same default value. The result of this is that users can flood the front page with 2-3Y old posts from other instances, as long as those posts had never been synced before.

This issue is compounded by the fact that hot_rank calculation for older posts only runs on server startup, and in fact will very often fail completely due to database deadlocks (#3076).

I propose significantly reducing the default value of hot_rank in post_aggregates (and also comment_aggregates) for any entry where published is older than 24h. Perhaps even reducing it to 0. I can make a PR with this proposed change.

Steps to Reproduce

  1. Sync an old post from any other instance
  2. Check it's hot_rank in the post_aggregates table

Technical Details

N/A

Version

0.17.4

Lemmy Instance URL

No response

sunaurus avatar Jun 15 '23 09:06 sunaurus

Yes this makes sense, seems like we missed an edge case there. You can find the relevant code in https://github.com/LemmyNet/lemmy/pull/2952. I think any post/comment db writes from crates/apub/ need to calculate the rank manually.

Nutomic avatar Jun 15 '23 10:06 Nutomic

@sunaurus I'd prefer this edge-case be handled in two ways, and not through editing or messing with SQL triggers:

  • Since this is only an issue for federated posts and comments, then in the apub code, on receive, update the relevant post_aggregates hot_rank and hot_rank_active columns, for only that item. (Same for comment_aggregates) Check the scheduled_tasks.rs for how to do this already in diesel.
    • I like this the best, because it only changes that row.
  • Another hacky solution, would be for the scheduled_tasks.rs to set all historical rows older than a week to zero, but I don't like that as much.

dessalines avatar Jun 15 '23 16:06 dessalines