
[BUG] Server error: pod xxx type () ownerReferences not found or sci cluster type not support

Open Terrynech opened this issue 1 year ago • 2 comments

Search before asking

  • [X] I have searched the issues and found no similar feature request.

DeepFlow Component

Server

What you expected to happen

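Grafana shows data for only a small subset of namespaces; pod data from the other namespaces is not being collected. After enabling debug logging on the server, I found the following log entries for the pods that were not collected: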

$ kubectl -n deepflow logs -f deepflow-server-78488cc955-p5bnq | grep '\[cloud.kuber' | grep 'pod.go' | grep 'stargate.1000003300.s55178.g1.3'
2024-03-19 13:22:24.614 [DEBU] [cloud.kubernetes_gather] pod.go:79 pod (stargate.1000003300.s55178.g1.3) type () ownerReferences not found or sci cluster type not support

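The pod is missing its ownerReferences information. This is because our Kubernetes cluster does not use Deployments or DaemonSets: we built our own controller and object-management platform that creates pods directly, so the pods lack the ownerReferences field (which records the relationships between resource objects).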
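To make the failure mode concrete, here is a simplified Go model of the filtering behavior described in this thread. It is not DeepFlow's actual code in pod.go: the `Pod` and `OwnerReference` types are hypothetical stand-ins for the client-go types, and the rule "owned pods, or pods carrying both virtual-kubelet labels, are kept" is inferred from the later comments in this issue, not from the server source.

```go
package main

import "fmt"

// OwnerReference is a hypothetical, trimmed-down stand-in for the Kubernetes
// metav1.OwnerReference type; only the fields used here are kept.
type OwnerReference struct {
	Kind string // e.g. "ReplicaSet", "StatefulSet", "DaemonSet"
	Name string
}

// Pod is a hypothetical, minimal pod model (not the client-go type).
type Pod struct {
	Name            string
	Labels          map[string]string
	OwnerReferences []OwnerReference
}

// Label keys mentioned later in this issue thread.
const (
	vkClusterTypeLabel  = "virtual-kubelet.io/provider-cluster-type"
	vkResourceNameLabel = "virtual-kubelet.io/provider-resource-name"
)

// shouldLearn approximates the check discussed here: a pod is kept if it has
// at least one ownerReference, or if it carries both virtual-kubelet labels.
// Reading a nil Labels map is safe in Go and yields "".
func shouldLearn(p Pod) bool {
	if len(p.OwnerReferences) > 0 {
		return true
	}
	return p.Labels[vkClusterTypeLabel] != "" && p.Labels[vkResourceNameLabel] != ""
}

func main() {
	managed := Pod{
		Name:            "web-7d9f5c-abcde",
		OwnerReferences: []OwnerReference{{Kind: "ReplicaSet", Name: "web-7d9f5c"}},
	}
	bare := Pod{Name: "stargate.1000003300.s55178.g1.3"} // created directly, no owner
	for _, p := range []Pod{managed, bare} {
		if !shouldLearn(p) {
			fmt.Printf("pod (%s) skipped: no ownerReferences\n", p.Name)
			continue
		}
		fmt.Printf("pod (%s) learned\n", p.Name)
	}
}
```

A pod created through a Deployment inherits an ownerReference to its ReplicaSet, so it passes; a pod created directly by an in-house platform has neither an owner nor the labels, so it is dropped, which matches the symptom of whole namespaces missing from Grafana.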

How to reproduce

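Create a pod directly (not through a Deployment, DaemonSet, or other controller). The pod will lack the ownerReferences field (which records the relationships between resource objects).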

DeepFlow version

2024/03/19 13:39:39 ENV K8S_NODE_NAME_FOR_DEEPFLOW=archredis016049.ppdgdslfat.com; K8S_NODE_IP_FOR_DEEPFLOW=10.114.16.49; K8S_POD_NAME_FOR_DEEPFLOW=deepflow-server-78488cc955-p5bnq; K8S_POD_IP_FOR_DEEPFLOW=10.0.0.46; K8S_NAMESPACE_FOR_DEEPFLOW=deepflow
Name: deepflow-server community edition
Branch: v6.4
CommitID: ce65d37a1de3efd7699ba5caa33ae3dcdba29abf
RevCount: 9726
Compiler: go version go1.20.14 linux/amd64
CompileTime: 2024-03-14 13:38:05

Defaulted container "deepflow-agent" out of: deepflow-agent, configure-sysctl (init)
9724-02b23645b2c92d27ee74ca383e05b5ab67b56ac8
Name: deepflow-agent community edition
Branch: v6.4
CommitId: 02b23645b2c92d27ee74ca383e05b5ab67b56ac8
RevCount: 9724
Compiler: rustc 1.75.0 (82e1608df 2023-12-21)
CompileTime: 2024-03-13 08:03:09

DeepFlow agent list

(screenshot)

Kubernetes CNI

No response

Operation-System/Kernel version

$ uname -r
3.10.0-693.el7.x86_64
CentOS Linux 7 (Core)

Anything else

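Why does the server rely on ownerReferences, and where in the project is it used? If it is not essential, could we fill in the ownerReferences field with mock values? Or could a future release support pods that have no ownerReferences?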

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

Terrynech · Mar 19 '24 05:03

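Looking at the code, I found that adding the labels virtual-kubelet.io/provider-cluster-type and virtual-kubelet.io/provider-resource-name to a pod's metadata labels bypasses the server's ownerReferences check. I tried this on one pod and it works: its data is now reported normally. Does doing this have any side effects?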
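Rather than labeling pods by hand, the in-house controller could attach the two labels when it builds each pod spec. Below is a minimal Go sketch, assuming the controller assembles pod labels as a `map[string]string`. The function name `withVirtualKubeletLabels` and the placeholder values are hypothetical; which label values the DeepFlow server actually accepts (the log mentions an unsupported "cluster type") should be checked against the server source before relying on this.

```go
package main

import "fmt"

// Label keys named in the comment above.
const (
	vkClusterTypeLabel  = "virtual-kubelet.io/provider-cluster-type"
	vkResourceNameLabel = "virtual-kubelet.io/provider-resource-name"
)

// withVirtualKubeletLabels returns a copy of labels with the two
// virtual-kubelet labels added. Existing values are left untouched and the
// input map is never mutated. clusterType and resourceName are placeholders
// (hypothetical); valid values depend on the DeepFlow server version.
func withVirtualKubeletLabels(labels map[string]string, clusterType, resourceName string) map[string]string {
	out := make(map[string]string, len(labels)+2)
	for k, v := range labels {
		out[k] = v
	}
	if _, ok := out[vkClusterTypeLabel]; !ok {
		out[vkClusterTypeLabel] = clusterType
	}
	if _, ok := out[vkResourceNameLabel]; !ok {
		out[vkResourceNameLabel] = resourceName
	}
	return out
}

func main() {
	original := map[string]string{"app": "stargate"}
	labeled := withVirtualKubeletLabels(original, "example-cluster-type", "stargate")
	fmt.Println(len(original), len(labeled)) // 1 3
	fmt.Println(labeled[vkResourceNameLabel]) // stargate
}
```

Copying instead of mutating keeps the controller's own label map (often shared with selectors) unchanged, and not overwriting existing keys avoids clobbering pods that really are virtual-kubelet pods.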

Terrynech · Mar 20 '24 09:03

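  • Why ownerReferences: in the Kubernetes clusters we have encountered, this field is almost always what marks the Deployment/StatefulSet/... that a pod belongs to, so our implementation uses it to establish that association. Currently, pods that are not associated with a Deployment/StatefulSet/... are not learned.
  • Impact of adding the extra labels: as long as your own code does not use the virtual-kubelet.io/provider-cluster-type and virtual-kubelet.io/provider-resource-name labels, adding them will not affect functionality.
  • Future releases:
    • A later version will consider letting users configure how the Deployment/StatefulSet/... associated with a pod is computed. To avoid recompiling, this may take the form of a Lua snippet.
    • This will probably land in version 6.6.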
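The planned configurability (user-defined rules for deriving a pod's workload, possibly written in Lua) can be pictured as an ordered chain of resolver functions: try ownerReferences first, then fall back to a site-specific rule. The Go sketch below uses function values in place of Lua scripts purely for illustration; every name in it, including the `example.com/workload` label and the `CustomWorkload` kind, is hypothetical and does not describe the eventual 6.6 implementation.

```go
package main

import "fmt"

// Hypothetical minimal types, not the client-go ones.
type OwnerReference struct {
	Kind string
	Name string
}

type Pod struct {
	Name            string
	Labels          map[string]string
	OwnerReferences []OwnerReference
}

// Workload identifies the controller a pod is attributed to.
type Workload struct {
	Kind, Name string
}

// OwnerResolver maps a pod to its workload; ok=false means "no match".
// A Go function type stands in for a user-supplied Lua rule.
type OwnerResolver func(Pod) (Workload, bool)

// fromOwnerReferences is the default rule: use the first ownerReference.
func fromOwnerReferences(p Pod) (Workload, bool) {
	if len(p.OwnerReferences) == 0 {
		return Workload{}, false
	}
	ref := p.OwnerReferences[0]
	return Workload{Kind: ref.Kind, Name: ref.Name}, true
}

// fromLabel builds a fallback rule that reads the workload name from a label
// set by an in-house controller. Both kind and key are illustrative.
func fromLabel(kind, key string) OwnerResolver {
	return func(p Pod) (Workload, bool) {
		name := p.Labels[key]
		if name == "" {
			return Workload{}, false
		}
		return Workload{Kind: kind, Name: name}, true
	}
}

// resolve tries each rule in order and returns the first match.
func resolve(p Pod, rules ...OwnerResolver) (Workload, bool) {
	for _, r := range rules {
		if w, ok := r(p); ok {
			return w, true
		}
	}
	return Workload{}, false
}

func main() {
	rules := []OwnerResolver{fromOwnerReferences, fromLabel("CustomWorkload", "example.com/workload")}
	p := Pod{
		Name:   "stargate.1000003300.s55178.g1.3",
		Labels: map[string]string{"example.com/workload": "stargate"},
	}
	if w, ok := resolve(p, rules...); ok {
		fmt.Printf("%s -> %s/%s\n", p.Name, w.Kind, w.Name)
	}
}
```

Keeping the ownerReferences rule first preserves today's behavior for standard clusters, while the fallback only applies to pods that would otherwise be skipped.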

SongZhen0704 · Apr 03 '24 08:04