azure-container-networking

AKS: npm not enforcing policies properly

Open oOraph opened this issue 1 year ago • 4 comments

What happened:
Azure Network Policy Manager (NPM) does not enforce network policies for traffic from a pod to the node it runs on.

For example, if you define a network policy that filters out all egress traffic from a pod, traffic toward the local node's private IP (not its public IP, if any) is not filtered.

Consequently, any service listening on the node's private IP can be reached from the pod (containerd, kubelet, SSH…).
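
One way to check which node services are reachable is a small TCP probe run from inside the affected pod. The following is a minimal sketch, not part of the original report: the function names `can_connect` and `probe` are illustrative, and it assumes a Python 3.9+ interpreter has been installed in the pod.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. dropped packets)
        # and unreachable hosts.
        return False


def probe(host: str, ports: list[int], timeout: float = 2.0) -> dict[int, bool]:
    """Map each port to whether a TCP connection could be opened."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling `probe("10.224.0.4", [22, 10250])` from the pod would be expected to return `False` for both ports if the policy were enforced; the behavior described in this issue corresponds to `True`.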

What you expected to happen:

All specified traffic should be filtered out, with no exceptions other than those requested by the customer.

How to reproduce it:

  • Create an AKS cluster with Azure CNI and Azure Network Policy Manager for policy enforcement
  • Once the cluster is up, connect to it and apply the following two manifests
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: np1
  namespace: default 
spec:
  egress:
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
    to:
    - ipBlock:
        cidr: 0.0.0.0/0
  - ports:
    - port: 80
      protocol: TCP
    - port: 443
      protocol: TCP
    - port: 22
      protocol: TCP
    - endPort: 65535
      port: 1024
      protocol: TCP
    - endPort: 65535
      port: 1024
      protocol: UDP
    to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8
        - 172.16.0.0/12
        - 192.168.0.0/16
        - 169.254.169.254/32
  podSelector:
    matchExpressions:
    - key: test
      operator: Exists
  policyTypes:
  - Egress
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: "true"
  name: test 
  namespace: default
spec:
  containers:
  - image: ubuntu:latest
    imagePullPolicy: Always
    command:
    - sleep
    - infinity
    name: main
  terminationGracePeriodSeconds: 0
  • Get the node's private IP
$ k get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE     IP             NODE                                NOMINATED NODE   READINESS GATES
test   1/1     Running   0          9m28s   10.224.0.110   aks-agentpool-31351106-vmss000000   <none>           <none>
$ k get node aks-agentpool-31351106-vmss000000 -o wide
NAME                                STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP    OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-agentpool-31351106-vmss000000   Ready    agent   13m   v1.28.9   10.224.0.4    20.231.2.119   Ubuntu 22.04.4 LTS   5.15.0-1064-azure   containerd://1.7.15-1
  • Open a shell in the pod and verify that traffic toward the local node's private IP is let through
$ k exec -it test -- /bin/bash
# apt-get update && apt-get install -y curl netcat-openbsd
# curl --insecure https://10.224.0.4:10250/pods
Unauthorized
# curl --insecure https://10.224.0.4:10250
404 page not found
# nc 10.224.0.4 22
  • Repeat the same steps on a cluster that uses the Calico network policy plugin instead, to confirm that the policy is well formed and filters egress correctly there
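
To make the expected result of these steps explicit, the sketch below evaluates the egress rules of `np1` against a destination, following the standard NetworkPolicy semantics (a destination is allowed if any rule matches both the port and the `ipBlock`, minus its `except` ranges). It models the intended behavior only and is not how NPM or Calico implement enforcement; `RULES` and `egress_allowed` are names invented for this illustration.

```python
from ipaddress import ip_address, ip_network

# Egress rules transcribed from np1. Each rule is (ports, cidr, excepts);
# each port entry is (protocol, first_port, last_port).
RULES = [
    ([("UDP", 53, 53), ("TCP", 53, 53)], "0.0.0.0/0", []),
    (
        [("TCP", 80, 80), ("TCP", 443, 443), ("TCP", 22, 22),
         ("TCP", 1024, 65535), ("UDP", 1024, 65535)],
        "0.0.0.0/0",
        ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.169.254/32"],
    ),
]


def egress_allowed(dest: str, port: int, protocol: str = "TCP") -> bool:
    """Return True if at least one rule allows traffic to dest:port."""
    addr = ip_address(dest)
    for ports, cidr, excepts in RULES:
        port_ok = any(p == protocol and lo <= port <= hi for p, lo, hi in ports)
        ip_ok = addr in ip_network(cidr) and not any(
            addr in ip_network(e) for e in excepts
        )
        if port_ok and ip_ok:
            return True
    return False
```

Under this model, `egress_allowed("10.224.0.4", 10250)` and `egress_allowed("10.224.0.4", 22)` are both `False`, because the node's private IP falls inside the excepted `10.0.0.0/8` range. The successful `curl` calls above are therefore not what the policy specifies. DNS on port 53 remains allowed to any address, including the node, through the first rule.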

Kubernetes Version:

The default version offered by AKS at the time of reporting (the node output above shows v1.28.9)

Kernel (e.g. uname -a):

The stock kernel of AKS nodes (the node output above shows 5.15.0-1064-azure)

oOraph • Jun 13 '24 14:06

Hi @oOraph, thanks for authoring this issue. In general, NPM does enforce policies properly, but it sounds like you discovered an edge case with NPM. Trying to decipher this scenario: it seems like we can reduce the problem to a NetworkPolicy allowing egress to all IPs/ports except your Node's private IP? So the NetworkPolicy should drop traffic from the Pod to its Node? Please let me know if I misinterpreted.

huntergregory • Jun 14 '24 21:06

You're right. Allowing everything except something related to the local node will show the issue: the policy is not enforced for pods deployed on that node, while for pods on other nodes the filtering is effective.
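
The reduced scenario agreed on here can be sketched as a generated manifest: a policy that allows egress everywhere except the node's private IP. This is an illustration under assumptions, not a manifest from the thread. The policy name `deny-node-egress` and the helper names are hypothetical; the pod selector and namespace are copied from `np1`.

```python
import json


def reduced_policy(node_ip: str, name: str = "deny-node-egress") -> dict:
    """Build a NetworkPolicy allowing all egress except to node_ip."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            # Same selector as np1: any pod carrying a "test" label.
            "podSelector": {
                "matchExpressions": [{"key": "test", "operator": "Exists"}]
            },
            "policyTypes": ["Egress"],
            # No "ports" key, so the rule applies to every port and protocol.
            "egress": [
                {"to": [{"ipBlock": {"cidr": "0.0.0.0/0",
                                     "except": [f"{node_ip}/32"]}}]}
            ],
        },
    }


def to_manifest(policy: dict) -> str:
    """Serialize the policy as JSON, which kubectl accepts like YAML."""
    return json.dumps(policy, indent=2)
```

Printing `to_manifest(reduced_policy("10.224.0.4"))` and piping the result to `kubectl apply -f -` should give a policy that, per the report, is not enforced for pods scheduled on the node with that IP.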

oOraph • Jun 17 '24 07:06

This issue is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] • Jul 02 '24 00:07

comment anti-stale

oOraph • Jul 02 '24 08:07

Hi @oOraph, would you be able to validate your scenario on an AKS cluster with Cilium? If that solves your problem, we would recommend using Cilium to enforce your network policies going forward.

huntergregory • Jul 10 '24 21:07

@huntergregory I tested with Calico and could not reproduce the issue. I did not test with Cilium, but I would bet it is not affected either, as many people use it with Kubernetes for policy enforcement. Also note that switching the np manager on an existing aks cluster is not possible. One needs to remove the current one first (leaving the cluster with no policy enforcement during the migration), then select the new one, with no node pool rolling upgrade, which causes workload downtime...

oOraph • Jul 25 '24 09:07

> Also note that switching the np manager on an existing aks cluster is not possible

Please reference this documentation: Upgrade an existing cluster to Azure CNI Powered by Cilium

huntergregory • Jul 26 '24 18:07

> > Also note that switching the np manager on an existing aks cluster is not possible
>
> Please reference this documentation: Upgrade an existing cluster to Azure CNI Powered by Cilium

I should have specified "not possible without workload downtime and, worse, policy enforcement downtime" (see the note and the warning on the doc page you point to).

oOraph • Jul 26 '24 20:07

This issue is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] • Aug 10 '24 00:08

This issue is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] • Aug 27 '24 00:08

This issue is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

github-actions[bot] • Sep 13 '24 00:09

Issue closed due to inactivity.

github-actions[bot] • Sep 20 '24 00:09