AKS: NPM (Azure Network Policy Manager) not enforcing policies properly
What happened:
Azure Network Policy Manager does not enforce defined network policies for traffic toward the local node.
For example, if you define a network policy that filters out all egress traffic from a pod, the traffic going toward the private IP of the node hosting that pod (not its public one, if any) won't be filtered out.
Consequently, any service listening on that private IP can be connected to (containerd, kubelet, ssh…).
What you expected to happen:
All specified traffic to be filtered out properly with no exceptions (other than the ones explicitly allowed by the policy)
How to reproduce it:
- Spawn an AKS cluster with Azure CNI + Azure Network Policy Manager for policy enforcement
- Once the cluster is up, connect to it and apply the two following manifests
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: np1
  namespace: default
spec:
  egress:
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
    to:
    - ipBlock:
        cidr: 0.0.0.0/0
  - ports:
    - port: 80
      protocol: TCP
    - port: 443
      protocol: TCP
    - port: 22
      protocol: TCP
    - endPort: 65535
      port: 1024
      protocol: TCP
    - endPort: 65535
      port: 1024
      protocol: UDP
    to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8
        - 172.16.0.0/12
        - 192.168.0.0/16
        - 169.254.169.254/32
  podSelector:
    matchExpressions:
    - key: test
      operator: Exists
  policyTypes:
  - Egress
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: "true"
  name: test
  namespace: default
spec:
  containers:
  - image: ubuntu:latest
    imagePullPolicy: Always
    command:
    - sleep
    - infinity
    name: main
  terminationGracePeriodSeconds: 0
- Get the private IP of the node hosting the pod
$ k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 9m28s 10.224.0.110 aks-agentpool-31351106-vmss000000 <none> <none>
$ k get node aks-agentpool-31351106-vmss000000 -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-agentpool-31351106-vmss000000 Ready agent 13m v1.28.9 10.224.0.4 20.231.2.119 Ubuntu 22.04.4 LTS 5.15.0-1064-azure containerd://1.7.15-1
- Go into the pod and verify that traffic toward the local node's private IP is let through
$ k exec -it test -- /bin/bash
# apt-get update && apt-get install -y curl netcat-openbsd
# curl --insecure https://10.224.0.4:10250/pods
Unauthorized
# curl --insecure https://10.224.0.4:10250
404 page not found
# nc 10.224.0.4 22
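The manual curl/nc checks above can be scripted from inside the pod. A minimal sketch, assuming Python is installed in the container and using the node private IP 10.224.0.4 observed earlier (`can_connect` is a hypothetical helper, not part of the original repro):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable, and timeout-style errors.
        return False

# With the egress policy applied, connections to private IPs should be
# dropped. The bug reported here is that the local node's private IP is
# still reachable, e.g.:
#   can_connect("10.224.0.4", 10250)  # kubelet on the local node: succeeds
# while the same probe against another private IP is correctly filtered.
```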
- Reproduce the same steps with the Calico network policy plugin instead, to verify the policy is well defined and correctly filters egress traffic
Kubernetes Version:
The one proposed with AKS by default, at the time of reporting the issue (1.28 or so)
Kernel (e.g. uname -a):
The default one of Azure AKS nodes
Hi @oOraph, thanks for authoring this issue. In general, NPM does enforce policies properly, but it sounds like you discovered an edge case with NPM. Trying to decipher this scenario: it seems like we can reduce the problem to a NetworkPolicy allowing egress to all IPs/ports except your Node's private IP? So the NetworkPolicy should drop traffic from the Pod to its Node? Please let me know if I misinterpreted.
You're right. Allowing anything except something related to the local node will show the issue (the policy won't be enforced for pods deployed on that node; for pods on other nodes, filtering is effective).
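Following that reduction, a minimal policy triggering the issue might look like the following (a sketch, not tested as-is; `np-minimal` is a hypothetical name, and 10.224.0.4/32 stands for the private IP of the node hosting the pod, as in the repro above):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: np-minimal
  namespace: default
spec:
  podSelector:
    matchExpressions:
    - key: test
      operator: Exists
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.224.0.4/32

With NPM, a pod scheduled on that node can still reach 10.224.0.4 despite the except block; a pod on any other node is filtered as expected.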
This issue is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days
comment anti-stale
Hi @oOraph, would you be able to validate your scenario on an AKS cluster with Cilium? If that solves your problem, we would recommend using Cilium to enforce your network policies going forward.
@huntergregory I tested with Calico and did not reproduce. I did not test Cilium, but I would bet it is not affected either, as many people use it with Kubernetes for policy enforcement. Also note that switching the network policy manager on an existing AKS cluster is not possible: one needs to remove it first (leaving the cluster with no policy enforcement during the migration), then select the new one, with a node pool rolling upgrade, causing workload downtime...
Also note that switching the np manager on an existing aks cluster is not possible
Please reference this documentation: Upgrade an existing cluster to Azure CNI Powered by Cilium
I should have specified "not possible without workload downtime and, worse, policy enforcement downtime" (see the note and warning in the doc page you point to).
This issue is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days
Issue closed due to inactivity.