
Bug? http: TLS handshake error from ... local error: tls: bad record MAC

Open benatsb opened this issue 3 years ago • 14 comments

Fleet version: 4.15.0

Operating system: Windows 11

Web browser: Edge and Chrome (latest)


🧑‍💻  Expected behavior

Build a windows agent, deploy the windows agent, connect to the Fleet server with no errors.

💥  Actual behavior

Windows 11 device will connect to the Fleet server, but only after I build the agent using the "insecure" flag. The server logs show the following:

level=info ts=2022-06-03T18:56:16.880740187Z component=http path=/api/latest/fleet/device/e7c7dac7-df3e-41dc-91ff-83d6317d2b40 internal="authentication error: invalid device authentication token" err=": Authentication required"
2022/06/03 18:57:11 http: TLS handshake error from local_ip:46872: local error: tls: bad record MAC
2022/06/03 18:59:12 http: TLS handshake error from local_ip:46878: local error: tls: bad record MAC
2022/06/03 18:59:12 http: TLS handshake error from local_ip:46882: local error: tls: bad record MAC
2022/06/03 18:59:12 http: TLS handshake error from local_ip:46884: local error: tls: bad record MAC
2022/06/03 18:59:13 http: TLS handshake error from local_ip:46886: local error: tls: bad record MAC
2022/06/03 18:59:13 http: TLS handshake error from local_ip:46890: local error: tls: bad record MAC
2022/06/03 18:59:13 http: TLS handshake error from local_ip:46892: local error: tls: bad record MAC
2022/06/03 18:59:21 http: TLS handshake error from local_ip:46894: local error: tls: bad record MAC
2022/06/03 18:59:21 http: TLS handshake error from local_ip:46896: local error: tls: bad record MAC
2022/06/03 18:59:22 http: TLS handshake error from local_ip:46898: local error: tls: bad record MAC
2022/06/03 18:59:22 http: TLS handshake error from local_ip:46900: local error: tls: bad record MAC
2022/06/03 18:59:26 http: TLS handshake error from local_ip:46902: local error: tls: bad record MAC
2022/06/03 18:59:26 http: TLS handshake error from local_ip:46904: local error: tls: bad record MAC
2022/06/03 18:59:36 http: TLS handshake error from local_ip:46906: local error: tls: bad record MAC
2022/06/03 18:59:36 http: TLS handshake error from local_ip:46910: local error: tls: bad record MAC
2022/06/03 19:01:50 http: TLS handshake error from local_ip:46930: remote error: tls: bad certificate

More info

Fleet server is on a fresh Ubuntu server 22.04 machine. I used certbot and the "certonly" module there to generate a LetsEncrypt certificate for the server. Copied the certificates over to the fleet installation directory at /etc/fleetdm/

Set permissions for the .key to 600.

Running server for testing with /etc/fleetdm/fleet serve --config /etc/fleetdm/fleet.yml

fleet.yml

mysql:
  address: 127.0.0.1:3306
  database: fleet
  username: fleetadmin
  password: 'password'
redis:
  address: 127.0.0.1:6379
server:
  address: 0.0.0.0:443
  tls_compatibility: modern
  cert: /etc/fleetdm/server.cert
  key: /etc/fleetdm/server.key
  keepalive: true
logging:
  json: true
vulnerabilities:
  current_instance_checks: yes
  databases_path: /etc/fleetdm/vulns
  periodicity: 1h
  #https://nvd.nist.gov/vuln/data-feeds
  #cve_database_url:
logging:
    error_retention_period: 168h
osquery:
    detail_update_interval: 30m
    status_log_plugin: filesystem
#filesystem:
#    status_log_file: /var/log/osquery/status.log
#    result_log_file: /var/log/osquery/result.log
#    enable_log_rotation: true

Built the installer for Windows using the 4.15.0 fleetctl on the same Windows Machine with no osquery or orbit installed. Docker is installed though.

.\fleetctl.exe package --type=msi --fleet-desktop --fleet-url=https://fleettest --enroll-secret=SECRET --insecure

I tried without the "--insecure" flag, but that never connected. After a reboot and installing the package built with the flag, it connects, but the TLS error still occurs server-side.

benatsb avatar Jun 03 '22 19:06 benatsb

Hey @benatsb sorry you're experiencing this issue.

I'm bringing this issue to the Fleet team. This way, the team can provide follow-up questions and potential next steps to resolve the issue.

noahtalerman avatar Jun 07 '22 15:06 noahtalerman

Hey @benatsb the following "Why aren't my osquery agents connecting to Fleet?" section of the docs includes a "Common problems" section: https://fleetdm.com/docs/deploying/faq#why-arent-my-osquery-agents-connecting-to-fleet

bad record MAC: When generating your certificate for your Fleet server, ensure you set the hostname to the FQDN or the IP of the server. This error is common when setting up Fleet servers and accepting defaults when generating certificates using openssl.

I pulled the above from the docs because it looks like you're seeing bad record MAC entries in your logs.

Please let me know if these instructions don't help in successfully resolving your issue.

noahtalerman avatar Jun 07 '22 15:06 noahtalerman

@benatsb I'm going to close this issue for now. If you are still encountering issues please feel free to re-open this ticket with any new information about the problem. Thank you!

xpkoala avatar Aug 12 '22 20:08 xpkoala

Hi, I am facing the same problem described in this thread.

SERVER centos stream 9 fleet version 4.38.1

CLIENTS macOS 13 Ventura + 12 Monterey

Certificate from Let's Encrypt, renewed with Dehydrated

Browsers: Firefox 115 ESR + Chrome 117

My client repeatedly logs this:

W1025 15:16:03.459451 1334582912 tls_enroll.cpp:101] Failed enrollment request to https://my-fleet-server.com:8080/api/v1/osquery/enroll (Request error: certificate verify failed) retrying...

and my server logs this:

Oct 25 15:18:38 my-fleet-server fleet[1062]: 2023/10/25 15:18:38 http: TLS handshake error from 129.13.171.194:50805: local error: tls: bad record MAC

Despite these logs, my Fleet client runs and shows up in the Fleet server UI, but only with its hostname and serial number, nothing more. The client appears online for a short time, then goes offline and never connects successfully again.

[Screenshot]

Last fetched almost 54 years ago (that is a lot of time!)

If I run the client's "add host" command with --insecure, everything works, but the errors still appear in the server logs.

xastherion avatar Oct 25 '23 13:10 xastherion

I encountered this issue with Windows clients while setting up a test environment on Ubuntu 22.04 LTS with fleetdm version 4.49.2, following (or rather adapting) the installation guide for CentOS. What made my deployment unusual is that I used a TLS certificate issued by an internal certification authority belonging to a public key infrastructure dedicated to testing. Although I maintained proper full-chain certificates and keys on the server side, the server log showed these errors, which point to client-side TLS validation failures, starting right after client installation and continuing indefinitely. Meanwhile, the clients were registered but displayed as "offline". I therefore took a closer look at the installed Orbit client and found that its root directory contains a file called "certs.pem": a collection of Base64-encoded root CA certificates, headed by the comment "Bundle of CA Root Certificates" from Mozilla.

As an experiment, I inserted my own CA certificate into this file and restarted the Orbit client. The error immediately stopped appearing in the logs and the client was displayed as "online" in the web UI. Data could be fetched and, so far, I have seen no functional restrictions in the free version. I therefore think that the Orbit client does not pick up custom CAs that are installed system-wide. Thus far I can only speculate about this, given that on Windows devices the CA certificate is installed in the machine-wide Windows certificate store.

One could speculate that this might also happen while utilizing self signed certificates.
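
The workaround described above (adding a custom CA to the client's certs.pem) can be sketched as a small script. This is only an illustration of the manual step, not a Fleet-provided tool: the function name and both paths are hypothetical, and the comparison is a plain text match, so a certificate that is already in the bundle with different line endings or whitespace would be appended again.

```python
from pathlib import Path


def append_ca_to_bundle(bundle_path: str, ca_path: str) -> bool:
    """Append the PEM text in ca_path to bundle_path unless it is already there.

    Returns True if the bundle file was modified, False otherwise.
    Hypothetical helper: it does not parse or validate the certificates.
    """
    bundle = Path(bundle_path)
    ca_pem = Path(ca_path).read_text().strip()
    existing = bundle.read_text() if bundle.exists() else ""

    # Plain substring check; identical PEM text means "already present".
    if ca_pem in existing:
        return False

    # Make sure the new block starts on its own line.
    separator = "" if (not existing or existing.endswith("\n")) else "\n"
    bundle.write_text(existing + separator + ca_pem + "\n")
    return True
```

After changing the bundle, the Orbit client has to be restarted, as noted in the comment above. Keep a backup of the original file, since a later reinstall or update may replace it.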

@noahtalerman I have some followup questions:

  1. Is this expected behaviour? Is there any workaround or fix?
  2. Is there a configuration option on the client side that would be more appropriate than editing certs.pem?
  3. If not, can clients built by fleetctl be configured to automatically include custom CAs in their configuration? I have not found a parameter for this so far.

N0rthg4t3 avatar May 03 '24 14:05 N0rthg4t3

Was able to reproduce it, this time with a TLS certificate that should be publicly trusted through validatable intermediate CAs. However, the error persisted until the root CA and all intermediate CAs were added either to the certificate on the server (effectively full-chaining it) or to the client's certs.pem file.

N0rthg4t3 avatar May 06 '24 15:05 N0rthg4t3

Thanks @N0rthg4t3!

Heads up @xpkoala, re-opening this issue now that we have a lead on repro.

noahtalerman avatar May 07 '24 20:05 noahtalerman

Thanks! It's on my radar.

xpkoala avatar May 07 '24 21:05 xpkoala

Estimating at 5 as it may be hard to reproduce.

sharon-fdm avatar May 13 '24 17:05 sharon-fdm

@N0rthg4t3 Can you give details on full-chaining the server certificate? Does order matter? I'm currently using a certificate file that contains the actual server certificate, then the intermediates below it and the root cert at the very bottom. However, I'm still seeing the errors. Do I maybe need to call fleet prepare with a specific argument in order for Fleet to accept it?

DasFaultier avatar May 15 '24 11:05 DasFaultier

@DasFaultier When full-chaining certificates, order does matter, specifically where the server certificate goes relative to the intermediate and root CA certificates. Depending on the system, I have been fine adhering to the order Server Certificate > Intermediate 1 > Intermediate 2 > [...] > Root CA certificate. At least that is how I understand section 3.2 of RFC 5280, which details the X.509 profile (https://datatracker.ietf.org/doc/html/rfc5280#section-3.2).
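
A minimal sketch of assembling a full-chain file in the order described above (server certificate first, then the intermediates, then the root). The function names are hypothetical. The code only extracts and concatenates PEM blocks; it does not check that each certificate was actually issued by the next one, so it cannot detect a wrong or incomplete chain.

```python
import re
from typing import List

# Matches one PEM certificate block, including its BEGIN/END lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem(bundle_text: str) -> List[str]:
    """Return the certificate blocks found in bundle_text, in file order."""
    return PEM_CERT.findall(bundle_text)


def build_fullchain(leaf_pem: str, intermediate_pems: List[str], root_pem: str = "") -> str:
    """Concatenate PEM blocks as leaf > intermediates (in the given order) > root."""
    blocks = split_pem(leaf_pem)
    for pem in intermediate_pems:
        blocks.extend(split_pem(pem))
    blocks.extend(split_pem(root_pem))  # the root is optional in this sketch
    return "\n".join(blocks) + "\n"
```

Running split_pem on an existing chain file is also a quick way to count how many certificates it really contains before pointing the server's cert setting at it.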

N0rthg4t3 avatar May 15 '24 12:05 N0rthg4t3

@xpkoala I unassigned you so we do not miss this when we have capacity. Still need to reproduce.

sharon-fdm avatar May 15 '24 21:05 sharon-fdm

Hi folks!

Was able to reproduce it, this time with a TLS certificate that should be publicly trusted through validatable intermediate CAs. However, the error persisted until the root CA and all intermediate CAs were added either to the certificate on the server (effectively full-chaining it) or to the client's certs.pem file.

I performed the following tests with fake certificates and can confirm the above.

Tests

Dummy test certificates:

  • CA root (ca.cert.pem)
  • intermediate
  • leaf server certificate
  • root+intermediate bundle (ca-chain.cert.pem)

They were generated using the following guide.

Using fullchain in Fleet server and root CA only client side

  • curl connect to Fleet with --cacert set to the ca.cert.pem
  • built fleetd using fleetctl package --fleet-certificate=ca.cert.pem

Using fullchain in Fleet server and root+intermediate bundle client side

  • curl connect to Fleet with --cacert set to the ca-chain.cert.pem
  • built fleetd using fleetctl package --fleet-certificate=ca-chain.cert.pem

Using leaf cert in Fleet server and root+intermediate bundle client side

  • curl connect to Fleet with --cacert set to the ca-chain.cert.pem
  • built fleetd using fleetctl package --fleet-certificate=ca-chain.cert.pem

Using leaf cert + intermediate bundle in Fleet server and root CA only client side

  • curl connect to Fleet with --cacert set to the ca.cert.pem
  • built fleetd using fleetctl package --fleet-certificate=ca.cert.pem

Using leaf cert in Fleet server and root CA only client side

  • curl connect to Fleet with --cacert set to the ca.cert.pem
  • built fleetd using fleetctl package --fleet-certificate=ca.cert.pem ❌

The errors were of the following form server side:
2024/07/05 15:03:52 http: TLS handshake error from 127.0.0.1:55182: remote error: tls: bad certificate
2024/07/05 15:03:53 http: TLS handshake error from 127.0.0.1:55183: local error: tls: bad record MAC

and client side:

586 2024-07-05T15:04:52-03:00 DBG get config error="POST /api/fleet/orbit/config: Post \"https://fleet.example.com/api/fleet/orbit/config\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
[...]
W0705 15:16:44.739495 1251102656 init.cpp:760] Error reading config: Request error: certificate verify failed

This is expected to fail because fleetd/osquery doesn't know of the intermediate certificate so it requires the server to send it.
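
The reasoning behind the five scenarios can be illustrated with a toy model. It involves no real cryptography: the certificate names and the function are hypothetical, and each "certificate" is just a name mapped to its issuer. The client accepts the server if it can walk from the leaf to something in its own bundle, using only certificates the server sent to bridge gaps. The results for the first four scenarios are what this model predicts; the tests above explicitly mark only the last scenario as failing.

```python
from typing import Dict, Iterable

# Toy PKI: certificate name -> issuer name. The root is self-issued.
ISSUER: Dict[str, str] = {
    "leaf": "intermediate",
    "intermediate": "root",
    "root": "root",
}


def client_can_verify(server_sends: Iterable[str], client_trusts: Iterable[str]) -> bool:
    """Return True if a path from the leaf to a client-trusted certificate exists."""
    sent = set(server_sends)
    trusted = set(client_trusts)
    if "leaf" not in sent:
        return False

    current = "leaf"
    seen = set()
    while current not in seen:
        seen.add(current)
        issuer = ISSUER[current]
        if issuer in trusted:
            return True   # reached something in the client's bundle
        if issuer not in sent:
            return False  # gap: issuer is neither trusted nor sent by the server
        current = issuer
    return False          # reached an untrusted self-issued root


# (server sends, client trusts) for the five scenarios, in the order listed above.
SCENARIOS = [
    (["leaf", "intermediate", "root"], ["root"]),
    (["leaf", "intermediate", "root"], ["root", "intermediate"]),
    (["leaf"], ["root", "intermediate"]),
    (["leaf", "intermediate"], ["root"]),
    (["leaf"], ["root"]),  # the failing case: nobody supplies the intermediate
]

if __name__ == "__main__":
    for sends, trusts in SCENARIOS:
        print(sends, trusts, client_can_verify(sends, trusts))
```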

Next steps

  1. Document that root CA + intermediates must be present in the bundled certificate in fleetd. A default bundle is embedded in fleetctl (when built) and may not contain intermediate certificates present in your server certificate.
  2. Discuss with product team if we can do a TLS connection check to the provided --fleet-url using the certificate (default or provided) during the fleetctl package execution. This will help everyone catch issues during package generation instead of during deploy. We have an existing command, fleetctl debug connection, to do connection checks to a Fleet URL, but users may not be aware of it (e.g. fleetctl debug connection --fleet-certificate /opt/orbit/certs.pem https://fleet.example.com). /cc @noahtalerman @rachaelshaw.

For (2) I've created https://github.com/fleetdm/fleet/issues/20142.
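
For readers who want to try the idea behind (2) outside of fleetctl, the following is a rough standalone sketch of a TLS connection check against a CA bundle, using only Python's standard library. It is not how fleetctl debug connection is implemented; the function name is hypothetical. When cafile is given, only that bundle is trusted (similar to curl --cacert); when it is omitted, the system defaults are used.

```python
import socket
import ssl
from typing import Optional, Tuple


def check_tls(host: str, port: int = 443, cafile: Optional[str] = None,
              timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a TLS handshake with host:port and report whether it verified."""
    # With cafile set, only the certificates in that file are trusted.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname enables SNI and hostname verification.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"handshake ok ({tls.version()})"
    except ssl.SSLCertVerificationError as err:
        # Covers unknown issuers, hostname mismatches, expired certificates, etc.
        return False, f"certificate verification failed: {err.verify_message}"
    except (ssl.SSLError, OSError) as err:
        # Other TLS failures, refused connections, timeouts, DNS errors.
        return False, f"connection failed: {err}"
```

For example, check_tls("fleet.example.com", 443, cafile="/opt/orbit/certs.pem") would exercise the same bundle as the fleetctl command quoted above, assuming the server is reachable from where the script runs.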

lucasmrod avatar Jul 01 '24 22:07 lucasmrod

I forgot to thank @N0rthg4t3 for your feedback here! (it helped me reproduce the issue)

lucasmrod avatar Jul 01 '24 22:07 lucasmrod

@xpkoala @PezHub I've added QA notes to the description.

lucasmrod avatar Jul 11 '24 17:07 lucasmrod

The above scenarios were run with the certs provided and I received the expected success / fail states outlined in the steps.

xpkoala avatar Jul 15 '24 16:07 xpkoala

In a secure cloud city, TLS handshake finds harmony, Fleet's code, more trustworthy.

fleet-release avatar Jul 17 '24 23:07 fleet-release