Fix AKS Scaling (Cluster Autoscaler)
We're investigating the following options for AKS scaling:
1. ACI VIRTUAL NODES
Status
Currently BLOCKED.
Where
- The code changes are in the `mithun/hpa2` branch (see PR microsoft/ContosoTraders#78).
- The deployment is in the contoso-traders test subscription.
Change Description
- Redeployed the AKS cluster via the bicep template from the `mithun/hpa2` branch, which uses the `Azure CNI` network plugin (instead of the default `kubenet` plugin).
- Had to manually modify the AKS VNet to create a new subnet, `aci-subnet`, with address space `10.255.0.0/16`.
- Attached it to the existing AKS cluster using `az aks enable-addons` (full instructions here).
- Applied the `Deployment.yaml` manifest from the `mithun/hpa2` branch, which has the `nodeSelector` and `tolerations` changes that configure the pods to run only on virtual nodes.
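
Because `aci-subnet` was carved out of the VNet by hand, it is worth confirming that it lies inside the VNet address space and does not overlap the AKS node subnet or the cluster's service CIDR (10.0.0.0/16 unless overridden). The sketch below does that with Python's standard `ipaddress` module. Only `10.255.0.0/16` comes from the notes above; the VNet prefix and in-use CIDRs in the example are hypothetical placeholders to be replaced with the values from the bicep template.

```python
import ipaddress


def check_aci_subnet(candidate: str, vnet_prefixes: list[str], in_use: list[str]) -> list[str]:
    """Return a list of problems with `candidate`; an empty list means it looks usable."""
    subnet = ipaddress.ip_network(candidate)
    problems = []

    # The subnet must sit entirely inside one of the VNet's address prefixes.
    vnets = [ipaddress.ip_network(p) for p in vnet_prefixes]
    if not any(subnet.subnet_of(v) for v in vnets):
        problems.append(f"{subnet} is outside the VNet address space {vnet_prefixes}")

    # It must not overlap any subnet or CIDR that is already in use.
    for cidr in in_use:
        other = ipaddress.ip_network(cidr)
        if subnet.overlaps(other):
            problems.append(f"{subnet} overlaps {other}")
    return problems


if __name__ == "__main__":
    # HYPOTHETICAL values: replace with the real VNet prefix, AKS node subnet and service CIDR.
    vnet = ["10.0.0.0/8"]
    in_use = ["10.240.0.0/16", "10.0.0.0/16"]
    print(check_aci_subnet("10.255.0.0/16", vnet, in_use) or "OK")
```

An empty result only rules out address-space mistakes; it says nothing about NSGs or the subnet delegation the add-on needs.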
Issue Details
The pods (configured to run on ACI virtual nodes) are stuck in a waiting state.

The logs only show that no active endpoint is detected for the services / ingress.
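
A Service only lists a pod as a ready endpoint address once the pod matches the Service selector and reports Ready, so pods stuck before becoming Ready would produce exactly this log message. A minimal sketch for confirming that, assuming the Endpoints object has been dumped with `kubectl get endpoints <service> -o json` and parsed with `json.loads`; the sample object below is made up for illustration.

```python
def summarize_endpoints(endpoints: dict) -> tuple[int, int]:
    """Count (ready, not_ready) addresses in a core/v1 Endpoints object."""
    ready = not_ready = 0
    # `subsets` may be missing entirely when there are no endpoint addresses at all.
    for subset in endpoints.get("subsets") or []:
        ready += len(subset.get("addresses") or [])
        not_ready += len(subset.get("notReadyAddresses") or [])
    return ready, not_ready


if __name__ == "__main__":
    # MADE-UP sample: one pod matched the selector but never became Ready.
    sample = {"kind": "Endpoints", "subsets": [{"notReadyAddresses": [{"ip": "10.240.0.5"}]}]}
    ready, not_ready = summarize_endpoints(sample)
    print(f"ready={ready} not_ready={not_ready}")
```

`(0, 0)` points at a selector or scheduling problem (no pod matched), while `(0, n)` points at pods that were scheduled but never became Ready.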

Hypothesis
- Could have something to do with the switch to the `Azure CNI` network plugin instead of the default `kubenet` plugin.
- Could have something to do with the `nodeSelector` and `tolerations` changes made in the `Deployment.yaml` file to configure the pods to run only on virtual nodes.
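
To test the second hypothesis in isolation, the sketch below re-implements the two scheduling checks those manifest fields affect: every `nodeSelector` entry must match a node label, and every `NoSchedule`/`NoExecute` taint on the node must be tolerated. It ignores node affinity and resource fit. The virtual-node labels and taint in the example are assumptions modelled on the AKS virtual-nodes documentation and should be checked against `kubectl describe node <virtual-node>`; the pod spec is illustrative, not copied from `Deployment.yaml`.

```python
def toleration_matches(tol: dict, taint: dict) -> bool:
    """Kubernetes toleration/taint matching, limited to key, operator, value and effect."""
    # A toleration with an effect only covers taints with that same effect.
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if not tol.get("key"):
        # An empty key is only valid with operator Exists, and then tolerates every key.
        return tol.get("operator") == "Exists"
    if tol["key"] != taint["key"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        return True
    return tol.get("value", "") == taint.get("value", "")


def can_schedule(pod_spec: dict, node_labels: dict, node_taints: list[dict]) -> list[str]:
    """Return the reasons a pod cannot land on the node; an empty list means it can."""
    reasons = []
    for key, value in (pod_spec.get("nodeSelector") or {}).items():
        if node_labels.get(key) != value:
            reasons.append(f"nodeSelector {key}={value} not matched (node has {node_labels.get(key)!r})")
    for taint in node_taints:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft taint: the scheduler avoids the node but is not blocked
        if not any(toleration_matches(t, taint) for t in pod_spec.get("tolerations") or []):
            reasons.append(f"taint {taint['key']}={taint.get('value', '')}:{taint['effect']} not tolerated")
    return reasons


if __name__ == "__main__":
    # ASSUMED virtual-node labels and taint; verify against the real node.
    vn_labels = {"type": "virtual-kubelet", "kubernetes.io/role": "agent", "kubernetes.io/os": "linux"}
    vn_taints = [{"key": "virtual-kubelet.io/provider", "value": "azure", "effect": "NoSchedule"}]
    # ILLUSTRATIVE pod spec, not the one in Deployment.yaml.
    pod = {
        "nodeSelector": {"type": "virtual-kubelet", "kubernetes.io/os": "linux"},
        "tolerations": [
            {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
            {"key": "azure.com/aci", "effect": "NoSchedule"},
        ],
    }
    print(can_schedule(pod, vn_labels, vn_taints) or "schedulable")
```

If this reports the pod as schedulable with the real labels, taints and spec, the `nodeSelector`/`tolerations` hypothesis becomes less likely and the networking hypothesis more so.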
2. CLUSTER AUTOSCALER
Status
Currently INVESTIGATING
Where
- The changes are in the [mithun/enable-autoscal](mithunshanbhag:mithun/cluster-autoscaler) branch in my fork (see PR microsoft/ContosoTraders#81).
- Deployed in Jithin's MSDN subscription.
Change Description
- Enabled autoscaling with `minCount: 1` and `maxCount: 10`.
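
With these settings the cluster autoscaler keeps the node pool between 1 and 10 nodes and adds a node only when pods are `Pending` because their resource *requests* do not fit on the existing nodes; actual CPU usage does not trigger it. The real autoscaler simulates the scheduler. The sketch below is a much cruder first-fit-decreasing estimate on CPU and memory requests, useful only as a back-of-the-envelope check of whether the load test could ever need more than one node. The node size and pod requests in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeSize:
    cpu_m: int   # allocatable CPU per node, in millicores
    mem_mi: int  # allocatable memory per node, in MiB


def estimate_nodes(pod_requests: list[tuple[int, int]], node: NodeSize,
                   min_count: int = 1, max_count: int = 10) -> int:
    """Rough node count for (cpu_m, mem_mi) requests, clamped to [min_count, max_count]."""
    free: list[tuple[int, int]] = []  # remaining (cpu, mem) on each simulated node
    for cpu, mem in sorted(pod_requests, reverse=True):
        if cpu > node.cpu_m or mem > node.mem_mi:
            raise ValueError(f"a pod requesting {cpu}m/{mem}Mi can never fit on this node size")
        for i, (free_cpu, free_mem) in enumerate(free):
            if cpu <= free_cpu and mem <= free_mem:
                free[i] = (free_cpu - cpu, free_mem - mem)
                break
        else:
            # No existing node has room: open a new one.
            free.append((node.cpu_m - cpu, node.mem_mi - mem))
    # The autoscaler never goes below minCount or above maxCount.
    return max(min_count, min(max_count, len(free)))


if __name__ == "__main__":
    # HYPOTHETICAL: 8 replicas requesting 500m / 512Mi on nodes with 1900m / 5000Mi allocatable.
    print(estimate_nodes([(500, 512)] * 8, NodeSize(cpu_m=1900, mem_mi=5000)))
```

When the unclamped count exceeds `maxCount`, the surplus pods simply stay `Pending`; when pods declare no requests at all, nothing ever becomes unschedulable for resource reasons and the pool never grows.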
Issue Details
- The load test has a high failure rate. This issue is being tracked separately in microsoft/Contoso-Traders-Cloud-Testing#3.
- The pods are also not scaling out (this could be related to the issue above).
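
The cluster autoscaler adds nodes, not pods; the replica count is driven by a Horizontal Pod Autoscaler (whether one is deployed in this setup is not stated here) or by a fixed `replicas` value. The HPA's documented core rule is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no change when the ratio is within a tolerance of 1.0 (0.1 by default), clamped to `minReplicas`/`maxReplicas`. The sketch reproduces only that arithmetic, ignoring readiness handling and stabilization windows, so the expected replica count under load can be compared with what is observed. The metric values and replica bounds in the example are made up; they are not the node pool's `minCount`/`maxCount`.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """HPA core formula: ceil(current * current_metric / target_metric), clamped."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # within tolerance: the HPA leaves the replica count alone
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # MADE-UP numbers: 2 replicas averaging 90% CPU against a 50% utilization target.
    print(desired_replicas(2, 90, 50, min_replicas=1, max_replicas=10))
```

If the HPA cannot read the metric at all (for example, CPU utilization targets on containers with no CPU requests, or no metrics server), it takes no action, which would also look like "not scaling out".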

Hypothesis
Currently none, still investigating.
Misc Notes
The ingress controller was stuck in the PENDING state for a few minutes after provisioning, then automatically moved to the OK state.
