pg_cron icon indicating copy to clipboard operation
pg_cron copied to clipboard

Segmentation fault error in version 1.6

Open slothyrulez opened this issue 1 year ago • 9 comments

Hi!

We are running postgresql 14.10 on AWS RDS, we have installed recently the pg_cron 1.6 extension, following this guide: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_pg_cron.html

We only have one job configured, to launch every day out of peak hours, a VACUUM VERBOSE ANALYZE on the whole database.

I have found we have 2 crashes related to pg_cron:


	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:LOG: background worker "pg_cron" (PID 24839) was terminated by signal 11: Segmentation fault
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:DETAIL: Failed process was running: VACUUM VERBOSE ANALYZE
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:LOG: terminating any other active server processes /etc/rds/dbbin/pgscripts/rds_wal_archive: line 29: 29367 Quit (core dumped) usleep 50000
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[2788]:FATAL: archive command failed with exit code 131
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[2788]:DETAIL: The failed archive command was: /etc/rds/dbbin/pgscripts/rds_wal_archive pg_wal/0000000100004C6600000027 skipTimeCheckForWalSegmentAndReadyFile
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:LOG: all server processes terminated; reinitializing
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[29368]:LOG: database system was interrupted; last known up at 2024-07-24 07:16:46 UTC
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC:10.2.0.249(58010):rdsrepladmin@[unknown]:[29369]:FATAL: the database system is in recovery mode

and

2024-07-21 07:18:59 UTC::@:[723]:LOG: cron job 1: INFO: vacuuming "pg_toast.pg_toast_5855542"
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:LOG: background worker "pg_cron" (PID 30526) was terminated by signal 11: Segmentation fault
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:DETAIL: Failed process was running: VACUUM VERBOSE ANALYZE
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:LOG: terminating any other active server processes
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:LOG: all server processes terminated; reinitializing
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: database system was interrupted; last known up at 2024-07-21 07:18:13 UTC
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC:10.2.0.249(39436):rdsrepladmin@[unknown]:[2583]:FATAL: the database system is in recovery mode
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: database system was not properly shut down; automatic recovery in progress
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: redo starts at 4C55/52048718
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: invalid record length at 4C55/5956D5B0: wanted 24, got 0
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: redo done at 4C55/5956D588 system usage: CPU: user: 0.19 s, system: 0.06 s, elapsed: 0.27 s
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: checkpoint starting: end-of-recovery immediate
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC:172.31.21.40(48308)::[2584]:FATAL: the database system is in recovery mode

Do you have any hints ?

slothyrulez avatar Jul 24 '24 15:07 slothyrulez

That's an interesting one. It's possible that the VACUUM VERBOSE ANALYZE crashes itself (e.g. due to another extension), though I wonder if the amount of notices produced by VERBOSE is causing an issue with the background worker mode used by RDS. Can you try to run without VERBOSE to see whether the issue goes away? (which is a strong indication that pg_cron is indeed the problem)

marcoslot avatar Jul 25 '24 08:07 marcoslot

Ey!, thanks for your time.

Yes, I think is possible that the cause is not pg_cron itself, but VACUUM. Ok, I'll reconfigure the job to run without the VERBOSE, and let you know.

Thanks again.

slothyrulez avatar Jul 25 '24 12:07 slothyrulez

@marcoslot update here: we have been running VACUUM ANALYZE 5 days without any issue for the moment. imagen

slothyrulez avatar Jul 30 '24 14:07 slothyrulez

Checked again and running as expected, I don't know if i can help you somehow.

slothyrulez avatar Aug 02 '24 16:08 slothyrulez

Hi, Me too getting this issue in aurora postgreSQL RDS database. while executing a materialized view via pg_cron.

2024-08-21 06:47:00 UTC::@:[554]:LOG: background worker "pg_cron" (PID 22225) was terminated by signal 11: Segmentation fault
2024-08-21 06:47:00 UTC::@:[554]:DETAIL: Failed process was running: SELECT ingress.refresh_interactions_materialized_view();
2024-08-21 06:47:00 UTC::@:[554]:LOG: terminating any other active server processes
2024-08-21 06:47:00 UTC::@:[554]:FATAL: Aurora Runtime process unexpectedly exited
2024-08-21 06:47:00 UTC::@:[554]:LOG: database system is shut down

When I manually executes this, it will take around 4 minutes to finish.

Thanks.

rinshadka avatar Aug 21 '24 07:08 rinshadka

@bdrouvotAWS do you happen to have any way of getting a stack trace?

marcoslot avatar Aug 22 '24 10:08 marcoslot

@rinshadka @slothyrulez could you please ping me at [email protected]?

bdrouvotAWS avatar Aug 23 '24 04:08 bdrouvotAWS

I'm having a similar issue using postgreSQL RDS

2024-10-02 13:00:00.000,"2024-10-02 13:00:00 UTC::@:[554]:LOG:  background worker ""pg_cron"" (PID 25883) was terminated by signal 11: Segmentation fault",xxx-xxx-xxx-xxxx-xxxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,"2024-10-02 13:00:00 UTC::@:[554]:DETAIL:  Failed process was running: Select * from dbo.""archiveFormInstances""(2* 370, 25000)",xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,2024-10-02 13:00:00 UTC::@:[554]:LOG:  terminating any other active server processes,xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,2024-10-02 13:00:00 UTC::@:[554]:FATAL:  Aurora Runtime process unexpectedly exited,xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,2024-10-02 13:00:00 UTC::@:[554]:LOG:  database system is shut down,xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql

Anyone had any luck debugging issues like that?

Politta avatar Oct 02 '24 23:10 Politta

I am having similar issue on Postgres 16, pg_cron 1.6 while creating index - it crashes writing node.

2025-07-10 21:27:00 UTC::@:[559]:LOG: background worker "pg_cron" (PID 22493) was terminated by signal 11: Segmentation fault 2025-07-10 21:27:00 UTC::@:[559]:DETAIL: Failed process was running: CREATE INDEX index ... 2025-07-10 21:27:00 UTC::@:[559]:LOG: terminating any other active server processes 2025-07-10 21:27:00 UTC::@:[559]:FATAL: Aurora Runtime process unexpectedly exited 2025-07-10 21:27:00 UTC::@:[559]:LOG: database system is shut down

creating index manually takes about a minute. Any help?

sans17 avatar Jul 10 '25 22:07 sans17