Schedule cron job on Heroku to remove issues older than a year
We could run it weekly. The command to run is ./manage.py cleanup_issues --nb-days=365.
https://devcenter.heroku.com/articles/scheduler
Why don't we do this as a stored procedure into the pgsql db? W could use https://github.com/citusdata/pg_cron as an extension to pgsql 10+ and mandate it to run a simple cleanup issue once every week.
That might work too, but I don't know if we can install extensions in Heroku's pgsql.
It looks like this is already done:
We just need to look into the DB to make sure the script is indeed working and old issues are no longer present.
Service desk pointed me towards the heroku-members admins so I can gain access to the code-review apps running there.
Then it's just a matter of running a query on the db or through the django admin shell
I got access to the heroku console, and ran a few queries on production
>>> Issue.objects.all().aggregate(min=Min('created'))
{'min': datetime.datetime(2021, 9, 19, 10, 30, 17, 550164, tzinfo=datetime.timezone.utc)}
>>> Issue.objects.count()
75644841
The cleanup is not working as expected: there are issues from 3 years ago in the DB, totalling 75M issues.
This explains the bad performance of the DB when querying or inserting issues that @Archaeopteryx noticed.
Probably a bug in the cleanup code: when running the command from the scheduler, I get the following
Didn't find any old revision to delete.
INFO:code_review_backend.issues.management.commands.cleanup_issues:Didn't find any old revision to delete.
The cleanup code only works on revision based and headed on autoland & mozilla-central.
I think this filter is irrevelant now, we can simply remove it and run the cleanup on all revisions.
BTW the current configuration cleans all revisions older than 30 days: we should also increase that to a year (or configure the scheduler)
WDYT @marco-c
Removing the filter seems reasonable to me.
Regarding increasing the expiration date, we could try to increase it but we need to make sure the number of issues doesn't increase too much and makes the DB too slow.
I made the quick patch to remove the filters #2290
I'm also making a postgres backup using heroku tool (last one was 10 months old). That way i'll be able to test the changes locally on real data
I've cleaned up testing and run VACUUM and VACUUM FULL on the testing DB.
Next up is prod, and then we can close this issue.
I've cleaned up production and run VACUUM and VACUUM FULL on the production DB.