Expiration background job for activities is not using an index
The query is quite heavy and there is no index on the timestamp column alone, so the delete does not use any index and therefore takes quite some time:
MariaDB [oc]> EXPLAIN DELETE FROM `oc_activity` WHERE `timestamp` < 1234567;
+------+-------------+-------------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table       | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-------------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | oc_activity | ALL  | NULL          | NULL | NULL    | NULL | 6673188 | Using where |
+------+-------------+-------------+------+---------------+------+---------+------+---------+-------------+
There is no index on timestamp, but adding one would speed up the housekeeping cron job for activities. Potentially check back with Joas.
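For reference, a minimal sketch of the index that would help here (the index name is just an example; on MariaDB the table carries the oc_ prefix):
CREATE INDEX activity_timestamp ON `oc_activity` (`timestamp`);
With such an index in place, the EXPLAIN above should list it under possible_keys instead of scanning all rows.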
I run on Kubernetes, and I don't necessarily keep or analyse logs from cron jobs.
So my monitoring tells me the database is going to reach 5 GB soon. This instance has a bit more than 40 members, and the activity table is 3.6 GB.
I changed the activity_expire_days setting (in config.php) to 60 days, without much effect. I guess the expiry job times out, silently for me.
I manually deleted some entries:
delete from activity where timestamp < 1627779661
It was a lot of trial and error to pick the right timestamp, as the database would otherwise time out.
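One way around those timeouts would be to delete in smaller batches. A sketch for PostgreSQL (the cutoff timestamp and the batch size of 10000 are just examples); since Postgres has no DELETE ... LIMIT, it goes through ctid:
DELETE FROM activity
WHERE ctid IN (
    SELECT ctid FROM activity
    WHERE timestamp < 1627779661
    LIMIT 10000
);
-- repeat the statement until it affects 0 rows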
I created the index:
CREATE INDEX activity_timestamp ON public.activity(timestamp);
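On a live instance it may be worth building it with CREATE INDEX CONCURRENTLY instead, so writes to the table are not blocked while the index is built (note that CONCURRENTLY cannot run inside a transaction block):
CREATE INDEX CONCURRENTLY activity_timestamp ON public.activity (timestamp);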
Now queries like this one:
select count(*) from activity where timestamp < 1654045261 ;
  count
---------
 2104123
(1 row)
are a lot faster.
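To double-check that the planner actually uses the new index, the query can be prefixed with EXPLAIN (a sketch; if the index is picked up, the plan shows an index or bitmap scan on activity_timestamp rather than a sequential scan over the whole table):
EXPLAIN SELECT count(*) FROM activity WHERE timestamp < 1654045261;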
The oldest activity currently in the database is from this date: Wednesday, December 1, 2021 7:33:21 AM
I hope the cron job will run this weekend so that I can report back.
It worked, so we just need to add an index to fix it :)
Do you manage these files manually, or do you have some CLI to template them? I could try to open a PR.
Would you mind sharing how big the index is and how big your table is?
From what I understand, Postgres doesn't return freed space to the operating system automatically.
So when I ran the commands, I got the following (the first command returns the table size alone, and the second the table size plus all its indexes):
nextcloud=# SELECT pg_size_pretty(pg_relation_size('activity'));
pg_size_pretty
----------------
2173 MB
(1 row)
nextcloud=# SELECT pg_size_pretty( pg_total_relation_size('activity') );
pg_size_pretty
----------------
3337 MB
(1 row)
That covers 1st December until now, so pretty much 8 months.
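As far as I understand, a plain VACUUM only marks dead rows as reusable within the existing files, while VACUUM FULL rewrites the table and actually shrinks it on disk. Before rewriting, something like this should show how much of the table is dead rows (a sketch using the standard pg_stat_user_tables view):
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'activity';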
Then I decided to reclaim the space:
vacuum full;
And then the result was:
nextcloud=# SELECT pg_size_pretty(pg_relation_size('activity'));
pg_size_pretty
----------------
241 MB
(1 row)
nextcloud=# SELECT pg_size_pretty( pg_total_relation_size('activity') );
pg_size_pretty
----------------
323 MB
(1 row)
Is that enough info? If you need more, let me know!
On another Nextcloud instance, before and after the index creation:
nextcloud=# SELECT pg_size_pretty(pg_relation_size('activity'));
pg_size_pretty
----------------
2203 MB
(1 row)
nextcloud=# SELECT pg_size_pretty( pg_total_relation_size('activity') );
pg_size_pretty
----------------
3150 MB
(1 row)
nextcloud=# CREATE INDEX activity_timestamp ON public.activity(timestamp);
CREATE INDEX
nextcloud=# SELECT pg_size_pretty( pg_total_relation_size('activity') );
pg_size_pretty
----------------
3158 MB
(1 row)
So the index in this case takes 8 MB.
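That figure comes from the difference of the two totals; it should also be possible to query the index size directly (a sketch, assuming the index was created with the name activity_timestamp as above):
SELECT pg_size_pretty(pg_relation_size('activity_timestamp'));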
Okay, that sounds acceptable. We have a performance hackweek in 10 days. I will bring it up there and discuss it.
How can I add this index in a migration? I'm willing to PR it, but I've read through the files without much success. If you could help a bit, I think I can open a PR :)
This needs to be done via the db:add-missing-indices command in the server, because otherwise there is a big issue during the upgrade.
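For reference, admins typically run that command through occ; the exact invocation depends on the setup, but a common example is:
sudo -u www-data php occ db:add-missing-indices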
With truncate, you don't need to vacuum afterward \o/
https://www.postgresql.org/docs/9.1/sql-truncate.html
truncate activity;