plugins icon indicating copy to clipboard operation
plugins copied to clipboard

acme-client: haproxy still serves old cert after renewal

Open aque opened this issue 1 year ago • 4 comments

Important notices Before you add a new report, we ask you kindly to acknowledge the following:

  • [x] I have read the contributing guide lines at https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md
  • [x] I have searched the existing issues, open and closed, and I'm convinced that mine is new.
  • [x] The title contains the plugin to which this issue belongs

Describe the bug HAProxy continues to serve old Let's Encrypt certificates after it was renewed and updated. ACME Client > Automations has the "Restart HAProxy (OPNsense plugin)" configured and enabled. Restarting HAProxy manually thru the GUI loads the updated certificates.

To Reproduce Steps to reproduce the behavior:

  1. Configure HAProxy to serve Let's Encrypt certificates
  2. Configure ACME Client > Automations to restart HAProxy on renewals
  3. Wait for certificate to renew
  4. Note the certificate served by HAProxy

Expected behavior ACME Client automation that restarts HAProxy loads the updated certificate.

Additional context

  1. I confirmed that ACME client has the new certificate in the /var/etc/acme-client/cert-home/6029e36f2fa175.26395931/host.domain.tld_ecc/host.domain.tld.cer file.
  2. I also confirmed this same new certificate is one of the files listed in /tmp/haproxy/ssl/60294d9f6fa932.93592251.certlist. (same serial number)
  3. openssl s_client confirms that HAProxy still serves the older certificate.
  4. I suspect that HAProxy needs a full restart instead of a SIGHUP.

Environment OPNsense 24.7.2-amd64 FreeBSD 14.1-RELEASE-p3 OpenSSL 3.0.14

aque avatar Aug 27 '24 10:08 aque

I found the following entry in /var/log/acmeclient/acmeclient_20240825.log:

<14>1 2024-08-25T02:38:14-05:00 gw acme.sh 5916 - [meta sequenceId="43"] [Sun Aug 25 02:38:14 CDT 2024] Your cert is in: /var/etc/acme-client/cert-home/6029e36f2fa175.26395931/host.domain.tld_ecc/host.domain.tld.cer

Below is from /var/log/haproxy/haproxy_20240825.log around that time. /tmp/haproxy/ssl/6029e37fc87ca.pem is the problem certificate for that day.

<134>1 2024-08-25T02:38:22-05:00 gw haproxy 71632 - [meta sequenceId="20"] -:- [25/Aug/2024:02:38:22.029] <OCSP-UPDATE> /tmp/haproxy/ssl/60295e5d400a1.pem 1 "Update successful" 0 1
<134>1 2024-08-25T02:38:22-05:00 gw haproxy 71632 - [meta sequenceId="21"] -:- [25/Aug/2024:02:38:22.128] <OCSP-UPDATE> /tmp/haproxy/ssl/6029e37fc87ca.pem 1 "Update successful" 0 1

I also get the same OCSP-UPDATE lines on a manual haproxy restart. At this point, I do not know the difference between "Restart HAProxy (OPNsense plugin)" cron job versus a manual restart. If there is no difference, I wonder if sync is needed to flush pending disk writes before haproxy is restarted.

aque avatar Aug 29 '24 12:08 aque

Well, I'd suggest inspecting Settings - Service options, reading the help there and experimenting with those options.

doktornotor avatar Aug 29 '24 20:08 doktornotor

I went through the HAProxy settings. I think these are the relevant settings to this issue, and below are their values. I do not see anything that would cache the certificate.

I can also try manually running the automation next time this happens again, but I don't know where to find that. I found /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json in Cron actions_haproxy.conf but I am not sure it is equivalent to the acme automation. I can try that next time though to see if there is a difference.

Service

  • Graceful stop: unchecked
  • Seamless reload: unchecked

Global settings > SSL settings - I believe these are the defaults.

  • Automatic OCSP updates: checked
  • Minimum OCSP Interval: 300
  • Maximum OCSP Interval: 3600

Cache

  • Cache enabled: unchecked

aque avatar Aug 30 '24 20:08 aque

I see the same behaviour - the automation does not restart HAproxy properly.

It seems that actions_haproxy.conf has both reload and restart options, and in the ACME configuration, only a HAproxy restart is offered. However I am not so sure that the very action from the cron definitions is taken when the automations run, because the cron action has slightly different wording from the ACME automation.

The code doing this is rather complex with interfaces, factories and all the good stuff that makes things clearer (tm)...

meyergru avatar Oct 17 '24 22:10 meyergru

I believe I know what is happening. The certificate updated this morning and I have not restarted yet - still on 24.7.10_2. Here are my supporting documentations.

Acme Logs show when the cert was updated, and I confirmed the file had today's date:

<14>1 2024-12-23T02:38:06-06:00 gw acme.sh 99011 - [meta sequenceId="19"] [Mon Dec 23 02:38:06 CST 2024] Installing cert to: /var/etc/acme-client/certs/6029e36f2fa175.26395931/cert.pem

# openssl x509 -noout -in /var/etc/acme-client/certs/6029e36f2fa175.26395931/cert.pem -serial -dates -subject
serial=038121C688432012BF2BFDED363F6DEAACF9
notBefore=Dec 23 07:39:35 2024 GMT
notAfter=Mar 23 07:39:34 2025 GMT
subject=CN = host.domain.tld

I also confirmed HAProxy has the updated certificate, and there are no pending HAProxy cert syncs. Uptime was less than 1 day so the script properly restarted the daemon with /usr/local/etc/rc.d/haproxy hardstop using SIGTERM.

# openssl x509 -noout -in /tmp/haproxy/ssl/6029e37fc87ca.pem -serial -dates -subject
serial=038121C688432012BF2BFDED363F6DEAACF9
notBefore=Dec 23 07:39:35 2024 GMT
notAfter=Mar 23 07:39:34 2025 GMT
subject=CN = host.domain.tld

# /usr/local/opnsense/scripts/OPNsense/HAProxy/syncCerts.py sync --output json
{"modified": [], "deleted": [], "add_count": 0, "remove_count": 0, "update_count": 0, "del_count": 0}

# echo 'show info' | socat stdio /var/run/haproxy.socket | grep Uptime
Uptime: 0d 13h23m47s
Uptime_sec: 48227

However, HAProxy reports it is using the older cert.

# echo 'show ssl cert /tmp/haproxy/ssl/6029e37fc87ca.pem' | socat stdio /var/run/haproxy.socket
Filename: /tmp/haproxy/ssl/6029e37fc87ca.pem
Status: Used
Serial: 037BA494BD21DC21034C0941EB7DE0655CF3
notBefore: Oct 24 06:39:56 2024 GMT
notAfter: Jan 22 06:39:55 2025 GMT
Subject Alternative Name: DNS:host.domain.tld
Algorithm: EC384
SHA1 FingerPrint: [deleted]
Subject: /CN=host.domain.tld
Issuer: /C=US/O=Let's Encrypt/CN=E6
Chain Subject: /C=US/O=Let's Encrypt/CN=E6
Chain Issuer: /C=US/O=Internet Security Research Group/CN=ISRG Root X1
OCSP Response Key: [deleted]

The system log shows AcmeClient: running automation (configd): Restart HAProxy happened on 02:38:06.

<13>1 2024-12-23T02:38:06-06:00 gw opnsense 63863 - [meta sequenceId="9"] AcmeClient: successfully issued/renewed certificate: host.domain.tld
<13>1 2024-12-23T02:38:06-06:00 gw opnsense 63863 - [meta sequenceId="10"] AcmeClient: updated ACME X.509 certificate: host.domain.tld
<173>1 2024-12-23T02:38:06-06:00 gw config 63863 - [meta sequenceId="11"] AcmeClient: running automations for certificate: host.domain.tld
<173>1 2024-12-23T02:38:06-06:00 gw config 63863 - [meta sequenceId="12"] AcmeClient: running automation (configd): Restart HAProxy
<13>1 2024-12-23T02:38:07-06:00 gw configctl 73680 - [meta sequenceId="13"] event @ 1734943086.71 msg: Dec 23 02:38:06 gw config[63863]: config-event: new_config /conf/backup/config-1734943086.6812.xml 
<13>1 2024-12-23T02:38:07-06:00 gw configctl 73680 - [meta sequenceId="14"] event @ 1734943086.71 exec: system event config_changed response: OK  
<13>1 2024-12-23T02:38:13-06:00 gw configctl 73680 - [meta sequenceId="15"] event @ 1734943093.42 msg: Dec 23 02:38:13 gw config[63863]: config-event: new_config /conf/backup/config-1734943093.3932.xml 
<13>1 2024-12-23T02:38:13-06:00 gw configctl 73680 - [meta sequenceId="16"] event @ 1734943093.42 exec: system event config_changed response: OK  

But file mtime and ctime both show the new certificate was written to disk after HAProxy restarted. btime is my last reboot so I believe this file is regenerated on every boot.

# stat -f '%N%nbtime: %SB%nctime: %Sc%nmtime: %Sm' /tmp/haproxy/ssl/6029e37fc87ca.pem
/tmp/haproxy/ssl/6029e37fc87ca.pem
btime: Dec  5 17:48:35 2024
ctime: Dec 23 02:38:13 2024
mtime: Dec 23 02:38:13 2024

There is maybe a race condition, or flushing dirty buffers to disk is needed before restarting HAProxy. We can test the latter by adding sync before the hardstop line in /usr/local/opnsense/scripts/OPNsense/HAProxy/rc-wrapper.sh. I plan on doing that after my upgrade to 24.7.11_2.

aque avatar Dec 23 '24 22:12 aque

I have a second cert served by HAProxy and flushing disk cache did not fix the problem. Both cert file ctime and mtime are after HAProxy was restarted.

<173>1 2024-12-26T02:38:07-06:00 gw config 18411 - [meta sequenceId="11"] AcmeClient: running automation (configd): Restart HAProxy

# echo 'show info' | socat stdio /var/run/haproxy.socket | grep Uptime
Uptime: 0d 6h19m08s
Uptime_sec: 22748

# stat -f '%N%nbtime: %SB%nctime: %Sc%nmtime: %Sm' /tmp/haproxy/ssl/60295e5d400a1.pem
/tmp/haproxy/ssl/60295e5d400a1.pem
btime: Dec 24 01:49:20 2024
ctime: Dec 26 02:38:23 2024
mtime: Dec 26 02:38:23 2024

# echo 'show ssl cert /tmp/haproxy/ssl/60295e5d400a1.pem' | socat stdio /var/run/haproxy.socket
Filename: /tmp/haproxy/ssl/60295e5d400a1.pem
Status: Used
Serial: 03BDFF1BE4E7D378CEF7FEBA93CF7D9F2BF3
notBefore: Oct 26 15:02:02 2024 GMT
notAfter: Jan 24 15:02:01 2025 GMT
Subject Alternative Name: DNS:host.domain.tld
Algorithm: EC384
SHA1 FingerPrint: [deleted]
Subject: /CN=host.domain.tld
Issuer: /C=US/O=Let's Encrypt/CN=E6
Chain Subject: /C=US/O=Let's Encrypt/CN=E6
Chain Issuer: /C=US/O=Internet Security Research Group/CN=ISRG Root X1
OCSP Response Key: [deleted]

aque avatar Dec 26 '24 15:12 aque

@meyergru are you still having issues? Do you see a similar error in /var/log/configd/*.log? This might be related.

<13>1 2024-12-21T02:42:00-06:00 gw configd.py 432 - [meta sequenceId="6593"] [2c5d1499-3214-435a-85d1-5fb1d5f73a7a] Sync ssl certificates into HAProxy memory for all frontends
<13>1 2024-12-21T02:42:00-06:00 gw configd.py 432 - [meta sequenceId="6594"] [da07d1d4-70dc-4933-be24-ae159dc42743] generate template OPNsense/HAProxy
<11>1 2024-12-21T02:42:00-06:00 gw configd.py 432 - [meta sequenceId="6595"] [da07d1d4-70dc-4933-be24-ae159dc42743] Inline action failed with not all arguments converted during string formatting at
 Traceback (most recent call last):
   File "/usr/local/opnsense/service/modules/actions/inline.py", line 44, in execute
     act_parameters = self.parameters % tuple(parameters)
                      ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
 TypeError: not all arguments converted during string formatting 
<14>1 2024-12-21T02:42:00-06:00 gw configd.py 432 - [meta sequenceId="6596"] message 2c5d1499-3214-435a-85d1-5fb1d5f73a7a [] returned b'{"modified": [], "deleted": [], "add_count": 0, "remove_count": 0, "update_count": 0, "del_count": 0' 

aque avatar Jan 09 '25 22:01 aque

I did not check in a while. It does not really matter because I restart my box quite often.

meyergru avatar Jan 09 '25 22:01 meyergru

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/plugins/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.

OPNsense-bot avatar Feb 23 '25 10:02 OPNsense-bot