kola: AWS/GCP platforms getting afterburn SSH key when running the coreos.ignition.ssh.key test
If I look at the the logs for the kola AWS and GCP runs for the coreos.ignition.ssh.key test I see in the console output:
ens4: 10.128.0.13 fe80::4001:aff:fe80:d
Ignition: user provided config was applied
Ignition: wrote ssh authorized keys file for user: core
Afterburn: wrote ssh authorized keys file for user: core
Where on openstack (now that I'm running openstack tests) the test is actually failing because the [email protected] systemd unit fails.
Two questions:
-
Should AWS and GCP platforms show
Afterburn: wrote ssh authorized keys file for user: corewhen this test is run? My understanding is "no" because theNoSSHKeyInMetadataflag is passed. -
Should the
[email protected]hard fail on openstack (or ever)? It's valid for a user to not provide an SSH key via the metadata service IIUC.
Here's the failure:
kola --denylist-test fcos.internet --denylist-test podman.workflow --output-dir tmp/kola run -p openstack --openstack-config-file=/srv/clouds.yaml --openstack-flavor v1-standard-4 --openstack-network=private --openstack-floating-ip-network=public --openstack-image fedora-coreos-32.20201101.2.0-openstack.x86_64 coreos.ignition.ssh.key
=== RUN coreos.ignition.ssh.key
--- FAIL: coreos.ignition.ssh.key (121.47s)
harness.go:955: Cluster failed starting machines: machine "768c2c5e-70c2-4883-b452-b753ae6337c7" failed basic checks: some systemd units failed:
[email protected] loaded failed failed Afterburn (SSH Keys)
FAIL, output in tmp/kola
Error: harness: test suite failed
2020-11-04T16:20:34Z cli: harness: test suite failed
In the logs of the machine:
afterburn[984]: Nov 04 16:20:15.362 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #10
afterburn[984]: Nov 04 16:20:15.808 INFO Failed to fetch: 404 Not Found
afterburn[984]: Nov 04 16:20:20.808 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #11
afterburn[984]: Nov 04 16:20:21.025 INFO Failed to fetch: 404 Not Found
afterburn[984]: Error: failed to run
afterburn[984]: Caused by: writing ssh keys
afterburn[984]: Caused by: maximum number of retries (10) reached
afterburn[984]: Caused by: failed to fetch: 404 Not Found
systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
systemd[1]: [email protected]: Failed with result 'exit-code'.
systemd[1]: Failed to start Afterburn (SSH Keys).
So my understanding of the afterburn code on OpenStack is the following:
- If configdrive exists then it will check for
openstack/latest/meta_data.json, attempt to parse it for thepublic_keysfield adding any present to the SSH keys - If configdrive did not exist it will fall back to the metadata service querying
http://169.254.169.254/latest/meta-data/public-keyswith up to 10 retries. If no SSH keys are provided this endpoint should still return but just be empty.
The failure here seems to be that both the configdrive didn't exist and the metadata service wasn't available for query (or at least that endpoint failed).
cc @lucab for sanity checking
If no SSH keys are provided this endpoint should still return but just be empty.
I think that might be where things are differing, at least in this one case.
That's my understanding too. How does that OpenStack setup looks like? Does it have a config-drive OR a metadata endpoint?
If no SSH keys are provided this endpoint should still return but just be empty.
I think that might be where things are differing, at least in this one case.
I'd be very surprised if the openstack metadata server actually doesn't stand up that endpoint when no SSH keys are provided. Knowing which metadata provider was used for the instance (could check the Ignition logs to see how it grabbed it's config) might be relevant. This could also be just a case of network not coming up fast enough if it is the metadata endpoint.
[core@d3 ~]$ curl http://169.254.169.254/latest/meta-data/public-keys
<html>
<head>
<title>404 Not Found</title>
</head>
<body>
<h1>404 Not Found</h1>
The resource could not be found.<br /><br />
</body>
</html>
[core@d3 ~]$
[core@d3 ~]$
[core@d3 ~]$ systemctl status [email protected]
● [email protected] - Afterburn (SSH Keys)
Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2020-11-04 17:44:01 UTC; 33s ago
Process: 961 ExecStart=/usr/bin/afterburn ${AFTERBURN_OPT_PROVIDER} --ssh-keys=core (code=exited, status=1/FAILURE)
Main PID: 961 (code=exited, status=1/FAILURE)
Nov 04 17:43:56 d3 afterburn[961]: Nov 04 17:43:56.730 INFO Failed to fetch: 404 Not Found
Nov 04 17:44:01 d3 afterburn[961]: Nov 04 17:44:01.730 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #11
Nov 04 17:44:01 d3 afterburn[961]: Nov 04 17:44:01.939 INFO Failed to fetch: 404 Not Found
Nov 04 17:44:01 d3 afterburn[961]: Error: failed to run
Nov 04 17:44:01 d3 afterburn[961]: Caused by: writing ssh keys
Nov 04 17:44:01 d3 afterburn[961]: Caused by: maximum number of retries (10) reached
Nov 04 17:44:01 d3 afterburn[961]: Caused by: failed to fetch: 404 Not Found
Nov 04 17:44:01 d3 systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Nov 04 17:44:01 d3 systemd[1]: [email protected]: Failed with result 'exit-code'.
Nov 04 17:44:01 d3 systemd[1]: Failed to start Afterburn (SSH Keys).
[core@d3 ~]$
[core@d3 ~]$
[core@d3 ~]$ journalctl -u [email protected]
-- Logs begin at Wed 2020-11-04 17:42:46 UTC, end at Wed 2020-11-04 17:44:12 UTC. --
Nov 04 17:43:07 d3 systemd[1]: Starting Afterburn (SSH Keys)...
Nov 04 17:43:14 d3 afterburn[961]: Nov 04 17:43:14.398 WARN failed to locate config-drive, using the metadata service API instead
Nov 04 17:43:14 d3 afterburn[961]: Nov 04 17:43:14.428 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #1
Nov 04 17:43:15 d3 afterburn[961]: Nov 04 17:43:15.612 INFO Failed to fetch: 404 Not Found
Nov 04 17:43:16 d3 afterburn[961]: Nov 04 17:43:16.613 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #2
Nov 04 17:43:16 d3 afterburn[961]: Nov 04 17:43:16.832 INFO Failed to fetch: 404 Not Found
Nov 04 17:43:18 d3 afterburn[961]: Nov 04 17:43:18.833 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #3
Nov 04 17:43:19 d3 afterburn[961]: Nov 04 17:43:19.018 INFO Failed to fetch: 404 Not Found
Nov 04 17:43:23 d3 afterburn[961]: Nov 04 17:43:23.018 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #4
Nov 04 17:43:23 d3 afterburn[961]: Nov 04 17:43:23.230 INFO Failed to fetch: 404 Not Found
Nov 04 17:43:28 d3 afterburn[961]: Nov 04 17:43:28.230 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #5
Nov 04 17:43:28 d3 afterburn[961]: Nov 04 17:43:28.440 INFO Failed to fetch: 404 Not Found
Nov 04 17:43:33 d3 afterburn[961]: Nov 04 17:43:33.440 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #6
Nov 04 17:43:34 d3 afterburn[961]: Nov 04 17:43:34.681 INFO Failed to fetch: 404 Not Found
Nov 04 17:43:39 d3 afterburn[961]: Nov 04 17:43:39.681 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #7
Nov 04 17:43:39 d3 afterburn[961]: Nov 04 17:43:39.903 INFO Failed to fetch: 404 Not Found
Nov 04 17:43:44 d3 afterburn[961]: Nov 04 17:43:44.904 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #8
Nov 04 17:43:45 d3 afterburn[961]: Nov 04 17:43:45.102 INFO Failed to fetch: 404 Not Found
Nov 04 17:43:50 d3 afterburn[961]: Nov 04 17:43:50.102 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #9
Nov 04 17:43:51 d3 afterburn[961]: Nov 04 17:43:51.517 INFO Failed to fetch: 404 Not Found
Nov 04 17:43:56 d3 afterburn[961]: Nov 04 17:43:56.517 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #10
Nov 04 17:43:56 d3 afterburn[961]: Nov 04 17:43:56.730 INFO Failed to fetch: 404 Not Found
Nov 04 17:44:01 d3 afterburn[961]: Nov 04 17:44:01.730 INFO Fetching http://169.254.169.254/latest/meta-data/public-keys: Attempt #11
Nov 04 17:44:01 d3 afterburn[961]: Nov 04 17:44:01.939 INFO Failed to fetch: 404 Not Found
Nov 04 17:44:01 d3 afterburn[961]: Error: failed to run
Nov 04 17:44:01 d3 afterburn[961]: Caused by: writing ssh keys
Nov 04 17:44:01 d3 afterburn[961]: Caused by: maximum number of retries (10) reached
Nov 04 17:44:01 d3 afterburn[961]: Caused by: failed to fetch: 404 Not Found
Nov 04 17:44:01 d3 systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Nov 04 17:44:01 d3 systemd[1]: [email protected]: Failed with result 'exit-code'.
Nov 04 17:44:01 d3 systemd[1]: Failed to start Afterburn (SSH Keys).
If you run curl http://169.254.169.254/latest does anything respond?
The above output was on an instance that wasn't started with --key-name (i.e. no key via metadata service). On an instance with --key-name I do get a non 404 valid result from http://169.254.169.254/latest/meta-data/public-keys.
If you run curl http://169.254.169.254/latest does anything respond?
[core@d3 ~]$ curl http://169.254.169.254/latest; echo
meta-data/
user-data
[core@d3 ~]$
Seems like we need to update https://github.com/coreos/afterburn/blob/v4.5.3/src/providers/openstack/network.rs#L20 to let client = retry::Client::try_new()?.return_on_404(true); similar to what is being done in AWS & GCP
Seems like we need to update https://github.com/coreos/afterburn/blob/v4.5.3/src/providers/openstack/network.rs#L20 to
let client = retry::Client::try_new()?.return_on_404(true);similar to what is being done in AWS & GCP
+1 we can probably transfer this issue to the afterburn repo..
One more question to answer first:
- Should AWS and GCP platforms show Afterburn: wrote ssh authorized keys file for user: core when this test is run? My understanding is "no" because the NoSSHKeyInMetadata flag is passed.
@dustymabe does the Afterburn fragment under authorized_keys exist there? If yes then it's likely a bug in the tests, if not then something may be off in FCOS.
Should AWS and GCP platforms show Afterburn: wrote ssh authorized keys file for user: core when this test is run? My understanding is "no" because the NoSSHKeyInMetadata flag is passed.
I think on AWS we might still be shoving in an unused garbage key; I don't think so for GCP. Would have to read through the mantle code again to make sure on that assumption though. I think irrespective of the log messaging though they should be passing the key being used to SSH into the instance via Ignition only and so the test can still be valid regardless of if afterburn is fetching a key.
https://github.com/coreos/afterburn/pull/513 adds return_on_404 to both openstack & cloudstack network providers
I broke out the afterburn issue (question 2. from the original description in this issue) into https://github.com/coreos/fedora-coreos-tracker/issues/662 to raise awareness.
I just looked through the console logs of recent kola-aws and kola-gcp jobs and I couldn't find any Afterburn: wrote ssh authorized keys file for user: core messages. I wonder if this is still an issue.
I just looked through the console logs of recent kola-aws and kola-gcp jobs
specifically ones for the coreos.ignition.ssh.key test?
Yes, in that test I'm only seeing Ignition: wrote ssh authorized keys file for user: core messages
Thanks. Ok I looked into this a bit more too.
Most likely fixed by https://github.com/coreos/afterburn/pull/631