activity icon indicating copy to clipboard operation
activity copied to clipboard

Expiration background job for activities is not using an index

Open nickvergessen opened this issue 4 years ago • 10 comments

The query is quite heavy and there is no index on the timestamp column alone, so the delete is not using any index and therefor takes quite some time:


MariaDB [oc]> EXPLAIN DELETE FROM `oc_activity` WHERE `timestamp` < 1234567 
    -> ;
+------+-------------+-------------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table       | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-------------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | oc_activity | ALL  | NULL          | NULL | NULL    | NULL | 6673188 | Using where |
+------+-------------+-------------+------+---------------+------+---------+------+---------+-------------+


nickvergessen avatar Oct 25 '21 10:10 nickvergessen

timestamp isn't an index but that would speed-up the housekeeping cron job for activities. Potentially check back with Joas.

AndyScherzinger avatar Nov 04 '21 08:11 AndyScherzinger

I run on kubernetes. And I don't necessarily keep or analyses logs from cron jobs.

So, I get the monitoring telling me the database is going to reach 5GB soon. This instance has a bit more than 40 members. And the activity table is 3.6GB.

I changed the variable activity_expire_days to 60 days without much effect then. I guess it times out, silently for me.

I deleted manually some entries:

delete from activity where timestamp < 1627779661

It was a lot of trial error to pick the right timestamp, as the database would timeout.

I created the index:

CREATE INDEX activity_timestamp ON public.activity(timestamp);

Now these queries:

select count(*) from activity where timestamp < 1654045261 ;
  count  
---------
 2104123
(1 row)

Are a lot faster.

The last activity in the database is currently from this date: Wednesday, December 1, 2021 7:33:21 AM

I hope the cron will work this weekend so that I can report back.

pierreozoux avatar Jul 29 '22 12:07 pierreozoux

It worked, so we just need to add an index to fix it :)

Do you manage these files manually? Or do you have some cli to template them? I could try to PR.

pierreozoux avatar Aug 02 '22 15:08 pierreozoux

Mind to share how big the index is and how big your table is?

nickvergessen avatar Aug 02 '22 15:08 nickvergessen

From what I understand, postgres doesn't reclaim space automatically.

So when I run the commands, I got: (I think first command returns table size, and second, table size plus all the indexes)

nextcloud=# SELECT pg_size_pretty(pg_relation_size('activity'));
 pg_size_pretty 
----------------
 2173 MB
(1 row)

nextcloud=# SELECT pg_size_pretty( pg_total_relation_size('activity') );
 pg_size_pretty 
----------------
 3337 MB
(1 row)

From 1st December until now, so pretty much 8 months.

Then I decided to reclaim the space:

vacuum full;

And then the result was:

nextcloud=# SELECT pg_size_pretty(pg_relation_size('activity'));
 pg_size_pretty 
----------------
 241 MB
(1 row)

nextcloud=# SELECT pg_size_pretty( pg_total_relation_size('activity') );
 pg_size_pretty 
----------------
 323 MB
(1 row)

is it enough info? If you need more, let me know!

pierreozoux avatar Aug 02 '22 15:08 pierreozoux

On an other Nextcloud instance: Before and after the index creation:

nextcloud=# SELECT pg_size_pretty(pg_relation_size('activity'));
 pg_size_pretty 
----------------
 2203 MB
(1 row)

nextcloud=# SELECT pg_size_pretty( pg_total_relation_size('activity') );
 pg_size_pretty 
----------------
 3150 MB
(1 row)

nextcloud=# CREATE INDEX activity_timestamp ON public.activity(timestamp);
CREATE INDEX
nextcloud=# SELECT pg_size_pretty( pg_total_relation_size('activity') );
 pg_size_pretty 
----------------
 3158 MB
(1 row)

so the index in this case takes 8MB.

pierreozoux avatar Aug 03 '22 15:08 pierreozoux

Okay, that sounds acceptable. We have a performance hackweek in 10 days. I will bring it up there and discuss it.

nickvergessen avatar Aug 04 '22 17:08 nickvergessen

How can I add this index in a migration? I'm willing to PR it, but I've read the files, without much success, if you could help a bit, I think I can PR :)

pierreozoux avatar Sep 19 '22 13:09 pierreozoux

This needs to be done via the db:add-missing-indices command in the server, because otherwise there is a big issue on the upgrade.

nickvergessen avatar Sep 19 '22 13:09 nickvergessen

With truncate, you don't need to vacuum afterward \o/

https://www.postgresql.org/docs/9.1/sql-truncate.html

truncate activity;

pierreozoux avatar Nov 20 '23 16:11 pierreozoux