Deploy webhook conditionally in falcon-kac

Open maxsokolovsky opened this issue 1 year ago • 4 comments

This PR introduces a new setting admissionControl.enabled (bool). By default, it's set to true. When users deploy a falcon-kac release with it set to false, Helm will do the following:

  1. Exclude the ValidatingWebhookConfiguration definition from rendered resources.
  2. Exclude the Service definition from rendered resources.
  3. Lower the CPU & Memory resource requests and limits of the falcon-client container in the Deployment. Without the webhook configuration present in the cluster, the container is expected to see a significantly lower load.
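
For readers unfamiliar with how Helm expresses this, here is a minimal sketch of the pattern, not the PR's actual diff. Only admissionControl.enabled and falconClientNoWebhookResources come from the description above. The file names, the metadata name, and the default resources key (.Values.resources) are assumptions made for illustration.

```yaml
{{- /* templates/webhook.yaml (hypothetical file name): rendered only when admission control is on */ -}}
{{- if .Values.admissionControl.enabled }}
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: {{ .Release.Name }}-webhook   # placeholder name, not the chart's
webhooks: []                          # real webhook entries omitted in this sketch
{{- end }}

# Fragment of the Deployment template, inside the falcon-client container spec.
# .Values.resources is an assumed name for the chart's default resource settings.
          resources:
            {{- if .Values.admissionControl.enabled }}
            {{- toYaml .Values.resources | nindent 12 }}
            {{- else }}
            {{- toYaml .Values.falconClientNoWebhookResources | nindent 12 }}
            {{- end }}
```

The Service template would presumably be wrapped in the same `if`/`end` guard, so that `helm template --set admissionControl.enabled=false` renders neither resource.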

The PR also adds a validation check (may need a test) that prevents users from disabling both admission control and visibility. At least one must be enabled. Visibility counts as enabled if at least one of the following is enabled: clusterVisibility.resourceSnapshots, clusterVisibility.resourceWatcher.
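
A check like this is commonly written with the `fail` template function. The sketch below shows one way it could look and is not the code in this PR. The helper name falcon-kac.validateModes is hypothetical. It also assumes each visibility setting has an `.enabled` flag and that all three keys are defined in values.yaml, since a missing parent key would raise a nil-pointer error during rendering.

```yaml
{{- /* Hypothetical helper, e.g. in templates/_helpers.tpl */ -}}
{{- define "falcon-kac.validateModes" -}}
{{- /* Visibility is on if either sub-feature is on (assumed .enabled flags) */ -}}
{{- $visibility := or .Values.clusterVisibility.resourceSnapshots.enabled .Values.clusterVisibility.resourceWatcher.enabled -}}
{{- if not (or .Values.admissionControl.enabled $visibility) -}}
{{- fail "At least one of admissionControl, clusterVisibility.resourceSnapshots, or clusterVisibility.resourceWatcher must be enabled" -}}
{{- end -}}
{{- end -}}
```

Calling `{{ include "falcon-kac.validateModes" . }}` from any rendered template would make `helm install` and `helm template` abort with that message before anything is applied to the cluster.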

A couple of notes for reviewers:

  1. The final values for CPU & Memory requests and limits may need to be tweaked for when the webhook is missing.
  2. Perhaps the name of Values.falconClientNoWebhookResources could be changed to something similar to the setting's name.

maxsokolovsky avatar Feb 26 '25 16:02 maxsokolovsky

I tested this with the KAC version 7.22.0-2001 but the containers failed to start up.

admissionControl:
  enabled: false
k get pods -A
NAMESPACE            NAME                                               READY   STATUS             RESTARTS       AGE
falcon-kac-testing   falcon-kac-5869f9c4f5-fqq9g                        1/3     CrashLoopBackOff   11 (46s ago)   3m59s

Logs

k logs falcon-kac-5869f9c4f5-fqq9g --all-containers -n falcon-kac-testing
I0228 19:41:57.703510     149 leaderelection.go:254] attempting to acquire leader lease falcon-kac-testing/falcon-kac-lock...
I0228 19:41:57.708480     149 leaderelection.go:268] successfully acquired lease falcon-kac-testing/falcon-kac-lock
I0228 19:41:58.035022     149 leaderelection.go:254] attempting to acquire leader lease falcon-kac-testing/falcon-kac-lock...
I0228 19:41:58.048709     149 leaderelection.go:268] successfully acquired lease falcon-kac-testing/falcon-kac-lock
E0228 19:41:58.368289    149] "Failed to read webhook metadata" "error"="KAC webhook configuration not found"

Events

Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  4m15s                   default-scheduler  Successfully assigned falcon-kac-testing/falcon-kac-5869f9c4f5-fqq9g to falcon-kac-control-plane
  Normal   Pulling    4m14s                   kubelet            Pulling image "<internal_registry_url>/falcon-kac:7.22.0-2001"
  Normal   Pulled     4m14s                   kubelet            Successfully pulled image "<internal_registry_url>/falcon-kac:7.22.0-2001" in 642ms (642ms including waiting). Image size: 42666343 bytes.
  Normal   Pulled     4m13s                   kubelet            Successfully pulled image "<internal_registry_url>/falcon-kac:7.22.0-2001" in 481ms (481ms including waiting). Image size: 42666343 bytes.
  Normal   Started    4m13s                   kubelet            Started container falcon-watcher
  Normal   Pulling    4m13s                   kubelet            Pulling image "<internal_registry_url>/falcon-kac:7.22.0-2001"
  Normal   Pulled     4m13s                   kubelet            Successfully pulled image "<internal_registry_url>/falcon-kac:7.22.0-2001" in 596ms (596ms including waiting). Image size: 42666343 bytes.
  Normal   Created    4m13s                   kubelet            Created container: falcon-watcher
  Normal   Started    4m12s                   kubelet            Started container falcon-ac
  Normal   Created    4m12s                   kubelet            Created container: falcon-ac
  Normal   Pulled     4m10s                   kubelet            Successfully pulled image "<internal_registry_url>/falcon-kac:7.22.0-2001" in 459ms (459ms including waiting). Image size: 42666343 bytes.
  Normal   Pulled     3m52s                   kubelet            Successfully pulled image "<internal_registry_url>/falcon-kac:7.22.0-2001" in 611ms (611ms including waiting). Image size: 42666343 bytes.
  Warning  Unhealthy  3m51s (x3 over 4m11s)   kubelet            Startup probe failed: Get "https://10.244.0.6:4443/startz": dial tcp 10.244.0.6:4443: connect: connection refused
  Warning  Unhealthy  3m45s (x14 over 4m11s)  kubelet            Startup probe failed: Get "https://10.244.0.6:4443/startz-kac": dial tcp 10.244.0.6:4443: connect: connection refused
  Warning  BackOff    3m45s (x8 over 4m8s)    kubelet            Back-off restarting failed container falcon-client in pod falcon-kac-5869f9c4f5-fqq9g_falcon-kac-testing(bd91ac2f-b6c9-4edc-bcdd-6d2101e40057)
  Normal   Pulling    3m20s (x4 over 4m15s)   kubelet            Pulling image "<internal_registry_url>/falcon-kac:7.22.0-2001"
  Normal   Created    3m19s (x4 over 4m14s)   kubelet            Created container: falcon-client
  Normal   Started    3m19s (x4 over 4m14s)   kubelet            Started container falcon-client
  Normal   Pulled     3m19s                   kubelet            Successfully pulled image "<internal_registry_url>/falcon-kac:7.22.0-2001" in 633ms (633ms including waiting). Image size: 42666343 bytes.
  Normal   Killing    3m13s                   kubelet            Container falcon-ac failed startup probe, will be restarted

I did some testing a few days ago by modifying the chart to start only the falcon-watcher container, but that didn't work either (Discussion).

r3motecontrol avatar Feb 28 '25 19:02 r3motecontrol

@r3motecontrol, this is expected: a change to KAC itself is on the way to support this new setting.

maxsokolovsky avatar Feb 28 '25 19:02 maxsokolovsky

Ah, I see. I didn't realize you're from CrowdStrike.

as there is change on the way to KAC itself to support this new setting.

I am excited to hear about it. Thanks for the clarification.

r3motecontrol avatar Feb 28 '25 19:02 r3motecontrol

We need this functionality. When do you think it will be released?

skoskie-olo avatar Apr 01 '25 16:04 skoskie-olo

We received the Extension of End of Support email for KPA yesterday.

On June 1st - Falcon KPA will stop collecting cluster visibility data and only clusters monitored with Falcon KAC will show new data in the Falcon console.

Can we expect this functionality before then?

On December 12, 2024, we announced our intention to end support for the Falcon Kubernetes Protection Agent (KPA) on April 1st, 2025. Starting today, new customers cannot download or deploy Falcon KPA. Instead, they will use the Falcon Kubernetes Admission Controller (KAC), version 7.20 or later, which offers Kubernetes visibility equivalent to the Falcon KPA.

We know that you’re working towards the transition to Falcon KAC and we want to make sure that you have enough time to complete this without disruption to your business operations. As such, we’re extending your End of Service (EoS) date for Falcon KPA from March 31, 2025 to May 31, 2025. 

April 1, 2025: Extended Falcon KPA support period begins
Falcon KPA installer is not available for download.

May 31, 2025: Extended Falcon KPA support period ends
End of all maintenance of the Falcon KPA, including security updates.
For up to 7 more days (depending on when KPA last sent data to the CrowdStrike), Container Security APIs will continue to serve existing KPA data from the CrowdStrike cloud. No new KPA data will be added to the CrowdStrike cloud. Once the existing data expires no KPA data will be returned from the APIs.
You must complete your transition to Falcon KAC version 7.20 or later, we recommend using the latest version of Falcon KAC.
Falcon KPA will stop collecting cluster visibility data and only clusters monitored with Falcon KAC will show new data in the Falcon console.

r3motecontrol avatar Apr 09 '25 17:04 r3motecontrol

LGTM

pflanno avatar May 01 '25 10:05 pflanno

We deployed KAC v7.25 using this branch with admissionControl disabled. The pod and all of its containers are up and running, and the validating webhook wasn't deployed. However, we noticed errors in the logs. I think it's a KAC issue, unrelated to the Helm chart. When can we expect these changes to be merged into main?

k get pods -n falcon-kac 
NAME                          READY   STATUS    RESTARTS   AGE
falcon-kac-7fb896977d-xgfqn   3/3     Running   0          87m
k logs falcon-kac-7fb896977d-xgfqn -n falcon-kac -c falcon-watcher
I0514 21:18:20.118305      38 leaderelection.go:254] attempting to acquire leader lease falcon-kac/falcon-kac-lock...
I0514 21:18:20.129638      38 leaderelection.go:268] successfully acquired lease falcon-kac/falcon-kac-lock
E0514 21:18:20.445182     38] "visibility/podWatcher: Watch error" "error"="too old resource version: 1002008929 (1002015397)"
E0514 22:01:13.844746     38] "comms: Failed to create packet for event" "error"="Message is too large" "seq"=475 "event"="K8SResourceK8SV2 (0x81000cf1)" "len"=90282
E0514 22:01:13.844773     38] "visibility/dedupSender: Failed to send K8SResource event" "error"="Message is too large" "pod"="<redacted>/<redacted>"
E0514 22:01:33.732334     38] "comms: Failed to create packet for event" "error"="Message is too large" "seq"=481 "event"="K8SResourceK8SV2 (0x81000cf1)" "len"=90770
E0514 22:01:33.732360     38] "visibility/dedupSender: Failed to send K8SResource event" "error"="Message is too large" "pod"="<redacted>/<redacted>"
E0514 22:02:26.844584     38] "comms: Failed to create packet for event" "error"="Message is too large" "seq"=494 "event"="K8SResourceK8SV2 (0x81000cf1)" "len"=90696
E0514 22:02:26.844611     38] "visibility/dedupSender: Failed to send K8SResource event" "error"="Message is too large" "pod"="<redacted>/<redacted>"

r3motecontrol avatar May 14 '25 22:05 r3motecontrol

@r3motecontrol You're correct, the error messages are unrelated to the Helm change. It looks like KAC is working correctly for you with admissionControl disabled.

The errors indicate that some pod specs were about 90 KB in size (see the "len" values in the log), which was too large to send, so they were dropped for exceeding the maximum message size that KAC enforces to protect memory and network usage.

pflanagan-cs avatar May 15 '25 17:05 pflanagan-cs