CIS sessions are reaching the maximum, causing cluster deployments to fail
Bug description When deploying a new workload cluster, the deployment times out before the cluster is fully deployed. The cluster itself does not appear to be fully deployed.
After this, it is no longer possible to connect to the vCenter server.
Command tanzu cluster create --file workload.yaml is issued, resulting in the following output:
Error: workload cluster configuration validation failed: vSphere config validation failed: failed to get VC client: failed to create vc client: failed to login to vSphere: cannot login to vc: POST https://{vcenter-server}/rest/com/vmware/cis/session: 503 Service Unavailable
On further inspection, manually creating a session through the API shows that the maximum session count has been reached:
"default_message": "User session count is limited to 550. Existing session count is 550 for user user@domain",
"id": "com.vmware.vapi.endpoint.failedToLoginMaxUserSessionCountReached"
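For anyone who wants to repeat the manual check, here is a minimal Python sketch of it. It posts to the same session endpoint that appears in the error above and looks for the failedToLoginMaxUserSessionCountReached message in the response. The function names (try_login, find_messages, session_limit_info) are made up for illustration, and the exact JSON envelope around the message is an assumption, which is why the parser searches the whole body instead of relying on a fixed path.

```python
import base64
import json
import re
import urllib.error
import urllib.request

# Message id copied from the error output in this report.
SESSION_LIMIT_ID = "com.vmware.vapi.endpoint.failedToLoginMaxUserSessionCountReached"


def find_messages(node):
    """Yield every dict that has both 'id' and 'default_message', at any depth."""
    if isinstance(node, dict):
        if "id" in node and "default_message" in node:
            yield node
        for value in node.values():
            yield from find_messages(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_messages(item)


def session_limit_info(body):
    """Return (limit, existing) if the body reports the session limit, else None."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # e.g. a plain-text or HTML 503 page
    pattern = r"limited to (\d+)\. Existing session count is (\d+)"
    for msg in find_messages(payload):
        if msg["id"] == SESSION_LIMIT_ID:
            match = re.search(pattern, str(msg["default_message"]))
            if match:
                return int(match.group(1)), int(match.group(2))
    return None


def try_login(host, user, password):
    """POST to the session endpoint with basic auth; return (status, body).

    Note: a successful call opens one more session on vCenter.
    """
    url = f"https://{host}/rest/com/vmware/cis/session"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()
```

A call such as `status, body = try_login(host, user, password)` followed by `session_limit_info(body)` would return the limit and the current count when the limit message is present, and None otherwise. The default TLS verification is kept, so a vCenter with a self-signed certificate would need its CA trusted on the bootstrapping VM first.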
This also blocks deleting the partially created cluster.
Rebooting vCenter clears the sessions and allows operation again. However, a new deployment again does not succeed.
Expected behavior The cluster is deployed successfully without the session count limit being reached.
Steps to reproduce the bug / Relevant debug output
- Have a standalone management cluster running using TKG version 2.5.0 (OVA 1.28.4)
- dev type, medium control plane, large worker.
- Otherwise default settings
- Deploy a workload cluster using management cluster config as base
Output of tanzu version
version: v1.0.0
buildDate: 2023-08-08
sha: 006d0429
Environment where the bug was observed (cloud, OS, etc)
- vSphere version 8
- Tanzu CLI 1.0.0
- TKG plugins group version 2.5.0
- TKG Photon v5 Kubernetes v1.28.4 OVA image
Tanzu CLI is running on a bootstrapping VM:
- Alpine Extended version 3.19.1
- Kubectl version:
Client Version: v1.28.7+vmware.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.4+vmware.1
If any more information or input is needed, I'll do my best to provide it.
Thanks for reaching out.
Please note that cluster lifecycle functionality (such as the cluster create command mentioned in the issue) is supported through TKGm and vCenter product groups. Please open a support ticket with the VMware GSS team to help triage and resolve.