code-review icon indicating copy to clipboard operation
code-review copied to clipboard

Schedule cron job on Heroku to remove issues older than a year

Open marco-c opened this issue 2 years ago • 2 comments

We could run it weekly. The command to run is ./manage.py cleanup_issues --nb-days=365.

https://devcenter.heroku.com/articles/scheduler

marco-c avatar Feb 09 '23 15:02 marco-c

Why don't we do this as a stored procedure into the pgsql db? W could use https://github.com/citusdata/pg_cron as an extension to pgsql 10+ and mandate it to run a simple cleanup issue once every week.

abpostelnicu avatar Feb 09 '23 16:02 abpostelnicu

That might work too, but I don't know if we can install extensions in Heroku's pgsql.

marco-c avatar Feb 10 '23 10:02 marco-c

It looks like this is already done: immagine

We just need to look into the DB to make sure the script is indeed working and old issues are no longer present.

marco-c avatar Jul 23 '24 09:07 marco-c

Service desk pointed me towards the heroku-members admins so I can gain access to the code-review apps running there.

Then it's just a matter of running a query on the db or through the django admin shell

La0 avatar Jul 30 '24 09:07 La0

I got access to the heroku console, and ran a few queries on production

>>> Issue.objects.all().aggregate(min=Min('created'))
{'min': datetime.datetime(2021, 9, 19, 10, 30, 17, 550164, tzinfo=datetime.timezone.utc)}
>>> Issue.objects.count()
75644841

The cleanup is not working as expected: there are issues from 3 years ago in the DB, totalling 75M issues.

La0 avatar Jul 31 '24 08:07 La0

This explains the bad performance of the DB when querying or inserting issues that @Archaeopteryx noticed.

marco-c avatar Jul 31 '24 08:07 marco-c

Probably a bug in the cleanup code: when running the command from the scheduler, I get the following

Didn't find any old revision to delete.
INFO:code_review_backend.issues.management.commands.cleanup_issues:Didn't find any old revision to delete.

La0 avatar Jul 31 '24 08:07 La0

The cleanup code only works on revision based and headed on autoland & mozilla-central.

I think this filter is irrevelant now, we can simply remove it and run the cleanup on all revisions.

BTW the current configuration cleans all revisions older than 30 days: we should also increase that to a year (or configure the scheduler)

WDYT @marco-c

La0 avatar Jul 31 '24 09:07 La0

Removing the filter seems reasonable to me.

Regarding increasing the expiration date, we could try to increase it but we need to make sure the number of issues doesn't increase too much and makes the DB too slow.

marco-c avatar Jul 31 '24 10:07 marco-c

I made the quick patch to remove the filters #2290

I'm also making a postgres backup using heroku tool (last one was 10 months old). That way i'll be able to test the changes locally on real data

La0 avatar Jul 31 '24 12:07 La0

I've cleaned up testing and run VACUUM and VACUUM FULL on the testing DB.

Next up is prod, and then we can close this issue.

marco-c avatar Aug 05 '24 09:08 marco-c

I've cleaned up production and run VACUUM and VACUUM FULL on the production DB.

marco-c avatar Aug 06 '24 10:08 marco-c