RootCAProvider error on management server with L3-network config

Open • mosys0815 opened this issue 8 months ago • 11 comments

I am currently preparing a cluster and have installed a new management server (the first one). That server's network is a fully routed L3 network. When the cloudstack-management service starts, it fails to create the server certificate from the root CA, with the following error:

2025-06-10 08:28:57,082 DEBUG [o.a.c.f.c.i.ConfigDepotImpl] (main:[]) (logid:) Retrieving keys from RootCAProvider
2025-06-10 08:28:58,589 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:[]) (logid:) Registering extension [RootCAProvider] in [Ca Providers Registry]
2025-06-10 08:28:58,589 DEBUG [o.a.c.s.l.r.RegistryLifecycle] (main:[]) (logid:) Registered org.apache.cloudstack.ca.provider.RootCAProvider@611f82a8
2025-06-10 08:28:58,589 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle] (main:[]) (logid:) Configuring CloudStack Components
2025-06-10 08:28:58,589 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle] (main:[]) (logid:) configuring bean RootCAProvider.
2025-06-10 08:28:59,037 DEBUG [c.c.u.s.Script] (main:[]) (logid:) Executing command [/bin/bash -c ip route show default 0.0.0.0/0 | head -1 | awk '{print $5}' ].
2025-06-10 08:28:59,043 DEBUG [c.c.u.s.Script] (main:[]) (logid:) Successfully executed process [105870] for command [/bin/bash -c ip route show default 0.0.0.0/0 | head -1 | awk '{print $5}' ].
2025-06-10 08:28:59,127 ERROR [o.a.c.s.l.CloudStackExtendedLifeCycle] (main:[]) (logid:) Error on configuring bean RootCAProvider - Cannot invoke "java.net.NetworkInterface.getInterfaceAddresses()" because "nic" is null java.lang.NullPointerException: Cannot invoke "java.net.NetworkInterface.getInterfaceAddresses()" because "nic" is null
  • These are the active network interfaces: "hostip" is a virtual interface carrying the BGP-announced IP, and eth1* are the Ethernet interfaces.
~# ip -4 -br a | egrep '(eth1|hostip)'
eth1a            UP             10.72.44.198/30
eth1b            UP             10.72.45.198/30
hostip           UNKNOWN        10.72.44.3/32
  • CloudStack did retrieve the correct IP from the hostip interface:
~# ip route show default 0.0.0.0/0 | head -1 | awk '{print $5}'
10.72.44.3
  • Certificate check:
~# openssl s_client -connect 10.72.44.3:9090 </dev/null 2>/dev/null | openssl x509 -noout -ext subjectAltName
Could not read certificate from <stdin>
4087895B58700000:error:1608010C:STORE routines:ossl_store_handle_load_result:unsupported:../crypto/store/store_result.c:151:
Unable to load certificate

For testing, I installed two virtual machines with cloudstack-management, plus a new MySQL database on one of them. Both instances are connected to a layer 2 network. Here the management servers (started one after the other) came up with fully functional certificates, and both servers see each other as peers in the management server overview of the CloudStack UI.

  • Certificate SAN on one of the test instances:
~# openssl s_client -connect 10.65.254.48:9090 </dev/null 2>/dev/null | openssl x509 -noout -ext subjectAltName
X509v3 Subject Alternative Name:
    IP Address:FE80:0:0:0:1C00:B1FF:FE00:164, IP Address:10.65.254.48, DNS:<redacted> DNS:cloudstack.internal

I then reinstalled the original server from scratch and connected it to the database of my test setup. I got the same error as above.

I suspect the problem lies in how cloudstack-management retrieves network information on an L3-routed network.

Any idea how to proceed here?

mosys0815 • Jun 10 '25 09:06

What's the output of the command "ip route show default 0.0.0.0/0"?

weizhouapache • Jun 10 '25 09:06

~# ip route show default 0.0.0.0/0
default proto bird src 10.72.44.3 metric 32
	nexthop via inet6 fe80::429e:a4ff:fe79:f50f dev eth1b weight 1
	nexthop via inet6 fe80::429e:a4ff:fe7b:8f0f dev eth1a weight 1
default via 10.72.44.197 dev eth1a proto static src 10.72.44.198 metric 1024 onlink
default via 10.72.45.197 dev eth1b proto static src 10.72.45.198 metric 1024 onlink

mosys0815 • Jun 10 '25 09:06

Maybe we should filter for "default via" (or at least "via") to make sure the command picks up the default device, like:

ip route show default 0.0.0.0/0 | grep '^default via ' | head -1 | awk '{print $5}'
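To see why the current pipeline yields an IP on this host, the route table posted above can be replayed through the candidate filters. This is a minimal sketch: the route text is copied from the earlier output, and the pick_dev_* functions are hypothetical helpers for illustration, not CloudStack code.

```shell
# Route table as posted above (ip route show default 0.0.0.0/0 on the L3 server).
ROUTES=$(cat <<'EOF'
default proto bird src 10.72.44.3 metric 32
	nexthop via inet6 fe80::429e:a4ff:fe79:f50f dev eth1b weight 1
	nexthop via inet6 fe80::429e:a4ff:fe7b:8f0f dev eth1a weight 1
default via 10.72.44.197 dev eth1a proto static src 10.72.44.198 metric 1024 onlink
default via 10.72.45.197 dev eth1b proto static src 10.72.45.198 metric 1024 onlink
EOF
)

# Current logic: fifth whitespace-separated field of the first line.
pick_dev_current() { head -1 | awk '{print $5}'; }
# Filter on " via " first; the indented nexthop lines match as well.
pick_dev_via() { grep ' via ' | head -1 | awk '{print $5}'; }
# Only consider lines that start with "default via".
pick_dev_default_via() { grep '^default via ' | head -1 | awk '{print $5}'; }

printf '%s\n' "$ROUTES" | pick_dev_current        # prints 10.72.44.3 (an address, not a device)
printf '%s\n' "$ROUTES" | pick_dev_via            # prints dev (taken from a nexthop line)
printf '%s\n' "$ROUTES" | pick_dev_default_via    # prints eth1a
echo 'default via 10.72.44.3 dev hostip' | pick_dev_current   # prints hostip
```

A bare ' via ' filter also matches the indented nexthop lines, so on this host it would print the literal word dev. Anchoring the match to lines starting with "default via" returns eth1a, i.e. one of the two failover links rather than hostip.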

weizhouapache • Jun 10 '25 09:06

What exactly is the device used for? In our setup we explicitly do not want to use a fixed interface, since the interfaces are supposed to fail over to each other.

mosys0815 • Jun 10 '25 10:06

@mosys0815 The command gets the default device, and the certificate covers all IPs of that default device.
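The behaviour described here (the certificate covers every IP of the default device) can be sketched in shell. This is only an illustration, assuming the device's IPv4 addresses are what end up in the SAN; addrs_of_dev is a hypothetical helper, and the sample lines are the ip -4 -br a output posted at the top of this issue.

```shell
# Hypothetical helper: print the IPv4 addresses (without prefix length) of one
# device, reading `ip -4 -br addr` output (name, state, then addr/prefix ...).
addrs_of_dev() {
  awk -v dev="$1" '$1 == dev { for (i = 3; i <= NF; i++) { split($i, a, "/"); print a[1] } }'
}

# Sample input: the interface listing from the L3 server above.
BRIEF=$(cat <<'EOF'
eth1a            UP             10.72.44.198/30
eth1b            UP             10.72.45.198/30
hostip           UNKNOWN        10.72.44.3/32
EOF
)

printf '%s\n' "$BRIEF" | addrs_of_dev hostip       # prints 10.72.44.3
printf '%s\n' "$BRIEF" | addrs_of_dev 10.72.44.3   # prints nothing: no device has that name

# On a live host (assumes iproute2 is installed):
#   ip -4 -br addr | addrs_of_dev eth1a
```

Passing the IP that the current pipeline returns as if it were a device name matches nothing, which fits the empty interface lookup seen in the error log.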

Which IP would you like to use primarily? 10.72.44.3?

By the way, is the management server still running, or did it stop after "Error on configuring bean RootCAProvider"?

weizhouapache • Jun 10 '25 10:06

Ah, thanks for the clarification :)

Yes, I'd expect the IP 10.72.44.3 to be added to the certificate; in fact, your current ip route command already returns the correct IP.

The server is up and running, but https://10.72.44.3:9090/clusterservice is not available because the certificate is missing. As a result, many certificate errors appear in the logs, and neither of the two virtual test management servers can communicate with this server.

mosys0815 • Jun 10 '25 10:06

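Well, the command is meant to return the default device name, but here it returns the IP, which is unexpected: the first default route is the BIRD multipath route ("default proto bird src 10.72.44.3 metric 32"), so its fifth field is the source address rather than a device name.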

To support this special use case, I think some simple code changes are required.
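The Java code behind the NullPointerException is not shown in this thread; the log only says that a java.net.NetworkInterface reference named nic was null. One plausible reading, and it is an assumption, is that the script output is treated as an interface name: NetworkInterface.getByName returns null when no interface has the given name, and "10.72.44.3" is not one. A shell-side guard for that case could look like this sketch, where is_iface and check_default_dev are hypothetical names.

```shell
# Hypothetical guard: succeed only if "$1" names an existing network interface.
# On Linux each interface has a directory /sys/class/net/<name> containing an
# ifindex file, so an IP address (or an empty string) fails the test.
is_iface() {
  [ -n "$1" ] && [ -e "/sys/class/net/$1/ifindex" ]
}

# Hypothetical wrapper: report whether the detected "default device" is usable.
check_default_dev() {
  if is_iface "$1"; then
    echo "ok: $1"
  else
    echo "not an interface name: '$1'" >&2
    return 1
  fi
}

# On the L3 server this would report the problem, because the pipeline
# returns 10.72.44.3 instead of a device name:
#   check_default_dev "$(ip route show default 0.0.0.0/0 | head -1 | awk '{print $5}')"
```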

weizhouapache • Jun 10 '25 12:06

Yes, I thought so. A one-liner using structured (JSON) output could look like this:

ip -j a | jq -r '.[] | .addr_info | map(select(.local == "'"$(ip -j r s default | jq -r '.[0] | .prefsrc')"'")) | .[].label'

output on my L2-network test virtual machine:

~# ip -j a | jq -r '.[] | .addr_info | map(select(.local == "'"$(ip -j r s default | jq -r '.[0] | .prefsrc')"'")) | .[].label'
ens3

output on L3-network server:

~# ip -j a | jq -r '.[] | .addr_info | map(select(.local == "'"$(ip -j r s default | jq -r '.[0] | .prefsrc')"'")) | .[].label'
hostip

Feel free to use and adapt :)
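For hosts without jq, the same idea (take the preferred source address of the first default route, then find the device that carries it) can be sketched with awk alone. Like the jq version, this assumes the first default route has a src attribute. first_default_src and dev_for_addr are hypothetical helpers, and the sample input is an excerpt of the route table and address listing posted earlier in this issue. Note that it prints the device name from ip -br output, whereas the jq version prints the address label, which can differ for aliased addresses.

```shell
# Hypothetical helper: print the value after "src" on the first line of
# `ip route show default` output, i.e. the preferred source address.
first_default_src() {
  awk 'NR == 1 { for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Hypothetical helper: print the device carrying IPv4 address "$1",
# reading `ip -4 -br addr` output (name, state, then addr/prefix ...).
dev_for_addr() {
  awk -v want="$1" '{ for (i = 3; i <= NF; i++) { split($i, a, "/"); if (a[1] == want) { print $1; exit } } }'
}

ROUTES=$(cat <<'EOF'
default proto bird src 10.72.44.3 metric 32
default via 10.72.44.197 dev eth1a proto static src 10.72.44.198 metric 1024 onlink
EOF
)
BRIEF=$(cat <<'EOF'
eth1a            UP             10.72.44.198/30
eth1b            UP             10.72.45.198/30
hostip           UNKNOWN        10.72.44.3/32
EOF
)

src=$(printf '%s\n' "$ROUTES" | first_default_src)   # 10.72.44.3
printf '%s\n' "$BRIEF" | dev_for_addr "$src"          # prints hostip

# On a live host (assumes iproute2 is installed):
#   ip -4 -br addr | dev_for_addr "$(ip route show default 0.0.0.0/0 | first_default_src)"
```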

But for now, is there anything I can do to get the certificate, even manually?

mosys0815 • Jun 10 '25 14:06

I do not think so, unless the command returns results like

$ ip route show default 0.0.0.0/0
default via 10.72.44.3 dev hostip

weizhouapache • Jun 11 '25 07:06

OK, thanks. I'll have a look at the source and try to get it running :)

mosys0815 • Jun 12 '25 15:06