
[FR] Optimize the configuration method for specifying the Watcher

Open • Hyzhou opened this issue 2 years ago • 1 comment

Search before asking

  • [X] I have searched the issues and found no similar feature request.

Description

We want to use a set of Agents deployed by a Deployment as Kubernetes Watchers. Currently, an Agent is designated as the Kubernetes Watcher by configuring its IP address and MAC address. After a rolling update, the replacement Pods typically come up with new IP and MAC addresses, so this configuration becomes invalid. We need another way to point the Kubernetes Watcher at a set of workloads.

Use case

name: k8s-d-rrcmIdbEph
type: kubernetes
config:
  controller_ip: 192.168.4.40
  node_port_name_regex: ^(cni|flannel|vxlan.calico|tunl|en[ospx])
  pod_net_ipv4_cidr_max_mask: 16
  pod_net_ipv6_cidr_max_mask: 64
  region_uuid: ffffffff-ffff-ffff-ffff-ffffffffffff
  vtap_id: 192.168.4.46-52:54:00:d5:d0:8e

Currently, vtap_id is set from the Agent's IP and MAC address. Is there a better way to configure this, one that stably points to a set of workloads?
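To illustrate why the current setting breaks, here is a minimal Python sketch that treats vtap_id as `<ip>-<mac>` (a format inferred from the example above, not taken from DeepFlow's source) and looks up the matching agent. `Agent`, `parse_vtap_id` and `find_watcher` are hypothetical names, and the post-update addresses are made up.

```python
# Hypothetical sketch: the "<ip>-<mac>" vtap_id format is inferred from the
# example in this issue, not from DeepFlow's source code.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    ip: str
    mac: str


def parse_vtap_id(vtap_id: str) -> Agent:
    """Split '<ip>-<mac>' at the first '-' (IP addresses contain no '-')."""
    ip, sep, mac = vtap_id.partition("-")
    if not sep or not ip or not mac:
        raise ValueError(f"malformed vtap_id: {vtap_id!r}")
    return Agent(ip=ip, mac=mac.lower())


def find_watcher(vtap_id: str, agents: list[Agent]) -> Agent | None:
    """Return the running agent that matches the configured vtap_id, if any."""
    wanted = parse_vtap_id(vtap_id)
    return next((a for a in agents if a == wanted), None)


configured = "192.168.4.46-52:54:00:d5:d0:8e"

# Before the rolling update, the Deployment's Pod has the configured address.
before = [Agent("192.168.4.46", "52:54:00:d5:d0:8e")]
# After the update, the replacement Pod has a new IP and MAC (made-up values),
# so the pinned vtap_id no longer matches any running agent.
after = [Agent("192.168.4.71", "52:54:00:aa:bb:cc")]

print(find_watcher(configured, before))  # Agent(ip='192.168.4.46', mac='52:54:00:d5:d0:8e')
print(find_watcher(configured, after))   # None
```

Keying the selection on something every Pod of the Deployment shares, rather than on per-Pod addresses, avoids this problem.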

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

Hyzhou • Jan 26 '24 03:01

K8s Watcher Election Supports Exclusion of Specified Agents

1. Requirements

  • In a serverless cluster, an additional deepflow-agent Deployment is expected to be deployed to watch K8s resources, because the agent that collects eBPF data runs as a serverless sidecar and can be allocated only very limited resources.

2. Design

In 6.2, we added support for injecting the ONLY_WATCH_K8S_RESOURCE environment variable into the agent, which makes an agent only watch K8s resources without enabling any other functions.

In 6.5, the K8s watcher logic above is reorganized as follows:

Removed:

  • The agent no longer supports the ONLY_WATCH_K8S_RESOURCE environment variable
  • If this environment variable is found, the agent prints an ERROR log, sleeps for 60 s, and then exits abnormally
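The removal behaviour can be sketched as a startup check. This is a hedged Python illustration, not DeepFlow's implementation: the function name, log message and exit code 1 are assumptions; only the ERROR log, 60-second sleep and abnormal exit come from the design. The sleep function is injectable so the wait can be skipped in tests.

```python
# Hypothetical sketch, not DeepFlow's code: the function name, log text and
# exit code are assumptions; the ERROR log / sleep 60 s / abnormal exit
# sequence is what the 6.5 design describes.
import logging
import os
import sys
import time
from typing import Callable, Mapping, Optional

DEPRECATED_ENV = "ONLY_WATCH_K8S_RESOURCE"
log = logging.getLogger("deepflow-agent")


def deprecated_env_exit_code(
    env: Mapping[str, str],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Return a non-zero exit code if the removed variable is present."""
    if DEPRECATED_ENV not in env:
        return None
    log.error(
        "%s is no longer supported; use K8S_WATCH_POLICY=only-watch instead",
        DEPRECATED_ENV,
    )
    sleep(60)  # wait 60 s before exiting, as the design specifies
    return 1  # non-zero status: the agent exits abnormally


if __name__ == "__main__":
    code = deprecated_env_exit_code(os.environ)
    if code is not None:
        sys.exit(code)
```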

Added:

  • The agent supports the K8S_WATCH_POLICY=only-watch environment variable, which has the same effect as ONLY_WATCH_K8S_RESOURCE
  • With this variable set, if the agent runs in a K8s Pod (indicated by the IN_CONTAINER environment variable), it only watches K8s resources and does not enable any other functions
  • The agent should also report this setting to the Server so that it is given priority when the Watcher is elected. Note the following special case:
    • A K8s Node runs two deepflow-agents: one (a Pod) reports K8S_WATCH_POLICY=only-watch, and the other (a normal process or another Pod) does not report this ENV
    • In this case, the Server must take care not to send the watch-K8s command to the other deepflow-agent
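A minimal sketch of how an agent might derive its mode and the value it reports from K8S_WATCH_POLICY and IN_CONTAINER. The names (`AgentStartup`, `resolve_startup`, the agent IDs) are hypothetical. The design only defines the in-Pod case, so treating an only-watch agent outside a Pod as a normal agent that still reports the policy is an assumption. Because the report carries a per-agent ID rather than a Node name, the Server can tell the two agents on the same Node apart.

```python
# Hypothetical sketch: names and the report shape are illustrative, not
# DeepFlow's actual agent code.
from dataclasses import dataclass
from typing import Mapping

ONLY_WATCH = "only-watch"


@dataclass(frozen=True)
class AgentStartup:
    agent_id: str          # per-agent identity, not per-Node
    only_watch_k8s: bool   # True: watch K8s resources and nothing else
    reported_policy: str   # K8S_WATCH_POLICY value reported to the Server


def resolve_startup(agent_id: str, env: Mapping[str, str]) -> AgentStartup:
    policy = env.get("K8S_WATCH_POLICY", "")
    in_pod = "IN_CONTAINER" in env
    return AgentStartup(
        agent_id=agent_id,
        # Only-watch mode applies when the agent runs inside a K8s Pod;
        # outside a Pod the agent keeps its other functions (assumption).
        only_watch_k8s=(policy == ONLY_WATCH and in_pod),
        # Report the raw policy (empty if unset) so the Server can rank agents.
        reported_policy=policy,
    )


# Two agents on the same Node: an only-watch Pod and a plain process.
pod = resolve_startup("agent-pod", {"K8S_WATCH_POLICY": "only-watch", "IN_CONTAINER": "1"})
proc = resolve_startup("agent-proc", {})
print(pod.only_watch_k8s, pod.reported_policy)          # True only-watch
print(proc.only_watch_k8s, repr(proc.reported_policy))  # False ''
```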

Added:

  • The agent can be configured with the K8S_WATCH_POLICY=watch-disabled environment variable
  • In this case, the agent must refuse to watch K8s resources
  • The agent should also report this setting to the Server so that it is never elected as the Watcher
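On the agent side, watch-disabled could be enforced as a local guard in addition to being reported. The sketch below is hypothetical (the function name and log text are assumptions); refusing locally is a defensive design choice, so that a misbehaving or outdated Server still cannot make such an agent watch K8s resources.

```python
# Hypothetical sketch, not DeepFlow's code: the agent refuses to start the
# K8s watcher when K8S_WATCH_POLICY=watch-disabled, even if told to.
import logging
from typing import Mapping

WATCH_DISABLED = "watch-disabled"
log = logging.getLogger("deepflow-agent")


def accept_watch_command(env: Mapping[str, str], server_wants_watch: bool) -> bool:
    """Return True only if the agent should start watching K8s resources."""
    if not server_wants_watch:
        return False
    if env.get("K8S_WATCH_POLICY") == WATCH_DISABLED:
        # The Server should never elect this agent; refuse defensively anyway.
        log.warning("ignoring K8s watch command: K8S_WATCH_POLICY=%s", WATCH_DISABLED)
        return False
    return True


print(accept_watch_command({"K8S_WATCH_POLICY": "watch-disabled"}, True))  # False
print(accept_watch_command({}, True))                                      # True
```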

How the k8s-watcher agent is determined in a K8s cluster:

  1. Agents with K8S_WATCH_POLICY=watch-disabled are not considered as watcher candidates
  2. Among the remaining agents, those with K8S_WATCH_POLICY=only-watch are preferred as watcher candidates
  3. If there are none, the other agents are used as fallback candidates
  4. Finally, if no suitable agent can be elected as the Watcher, a mechanism is needed to alert the user (such as an alarm, an abnormal-status display, or a WARN log)
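The four election rules map onto a short filter-then-prefer routine. This Python sketch is illustrative only: `AgentInfo`, `elect_watcher` and the lowest-ID tie-break are assumptions, and a WARN log stands in for whatever alerting mechanism rule 4 ends up using. Electing exactly one agent by ID, rather than "the agent on Node X", also covers the special case of two agents on one Node.

```python
# Hypothetical sketch of the election rules; the data model and the
# deterministic tie-break (lowest agent ID) are assumptions.
import logging
from dataclasses import dataclass
from typing import Iterable, Optional

log = logging.getLogger("deepflow-server")


@dataclass(frozen=True)
class AgentInfo:
    agent_id: str
    node: str
    policy: str = ""  # reported K8S_WATCH_POLICY, empty if unset


def elect_watcher(agents: Iterable[AgentInfo]) -> Optional[AgentInfo]:
    # Rule 1: agents with watch-disabled are never candidates.
    candidates = [a for a in agents if a.policy != "watch-disabled"]
    # Rule 2: prefer agents that reported only-watch.
    preferred = [a for a in candidates if a.policy == "only-watch"]
    # Rule 3: otherwise fall back to the remaining candidates.
    pool = preferred or candidates
    if not pool:
        # Rule 4: surface the problem instead of silently having no watcher.
        log.warning("no agent is eligible to be the K8s watcher")
        return None
    # Pick a single agent by ID, so other agents on the same Node stay idle.
    return min(pool, key=lambda a: a.agent_id)


cluster = [
    AgentInfo("a1", "node-1"),                           # eBPF DaemonSet agent
    AgentInfo("a2", "node-1", policy="only-watch"),      # watcher Deployment Pod
    AgentInfo("a3", "node-2", policy="watch-disabled"),  # serverless sidecar
]
print(elect_watcher(cluster).agent_id)  # a2
```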

Documentation:

  • Note the deprecation of ONLY_WATCH_K8S_RESOURCE in the 6.5 upgrade guide of the Enterprise Edition documentation and in the release notes of the Community Edition documentation.

3. Test Cases

Scenario 1: A K8s cluster with only ordinary Nodes, where the deepflow-agents running eBPF collection should not synchronize K8s resources (synchronizing may make an agent consume significantly more memory than other deepflow-agents)

  • Deploy one deepflow-agent DaemonSet for eBPF collection, without injecting any environment variables
  • Deploy one deepflow-agent Deployment with K8S_WATCH_POLICY=only-watch injected, so that it only synchronizes K8s resources

Scenario 2: A K8s cluster with serverless Nodes, where the sidecar deepflow-agents must not synchronize K8s resources (they lack the permission)

  • Deploy one deepflow-agent DaemonSet (sidecar) for collection, with K8S_WATCH_POLICY=watch-disabled injected so that it does not synchronize K8s resources
  • How K8s resources are synchronized:
    • If the cluster also has ordinary K8s Nodes, the deepflow-agent DaemonSet running on them needs no environment variables, and a Watcher is elected from them automatically
    • If there are no ordinary K8s Nodes, deploy one deepflow-agent Deployment with K8S_WATCH_POLICY=only-watch injected, so that it only synchronizes K8s resources

sharang • Mar 04 '24 12:03