WangZzzhe
> 1. what's the difference between this eviction plugin and cpu-load && cpu-suppression eviction plugins? > 2. why don't we implement this eviction in qrm but in eviction-manager? will it...
@flpanbin Could you provide some information about the node? 1. The node's total requests and load before the test pod was created; 2. The test pod's requests and load.
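@flpanbin That is reasonable for memory, because the requested amount of memory increased while the load did not change. In theory, for CPU the value may rise while the pod has been created successfully but the stress load has not started yet; once things stabilize, it should be lower than before. You can set the log level to 6 and check whether the collected data is accurate. https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L154 https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L158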
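@flpanbin With the load unchanged, the amount of requested resources increases and the node's allocatable resources decrease, so the node needs to overcommit more resources to reach the target load value. For the specific rules, see https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L286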
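To make the reasoning above concrete, here is a minimal sketch of a simplified model. It is not the code at realtime.go#L286: the function `overcommitRatio`, its parameters, and the fixed `expectedLoadPerRequest` are all assumptions made for illustration. It only shows the direction of the effect: with usage held constant, a larger request total yields a larger ratio, and higher usage yields a smaller one.

```go
package main

import "fmt"

// overcommitRatio is a hypothetical, simplified model (NOT the katalyst-core
// implementation). It estimates how many more requests the node could admit
// before its usage reaches targetLoad*allocatable, assuming every newly
// requested unit generates expectedLoadPerRequest units of usage.
func overcommitRatio(allocatable, requested, usage, targetLoad, expectedLoadPerRequest float64) float64 {
	// Usage headroom left before the node hits its target load.
	headroom := targetLoad*allocatable - usage
	if headroom <= 0 || expectedLoadPerRequest <= 0 {
		// Already at or above the target: do not overcommit.
		return 1.0
	}
	// Requests that could still be admitted under the assumed per-request load.
	extraRequests := headroom / expectedLoadPerRequest
	ratio := (requested + extraRequests) / allocatable
	if ratio < 1.0 {
		return 1.0
	}
	return ratio
}

func main() {
	// Same usage (10 cores), but the request total grows from 20 to 30.
	before := overcommitRatio(48, 20, 10, 0.6, 0.5)
	after := overcommitRatio(48, 30, 10, 0.6, 0.5)
	// Once the new pod's load actually shows up, the ratio drops again.
	loaded := overcommitRatio(48, 30, 20, 0.6, 0.5)
	fmt.Printf("before=%.3f after=%.3f loaded=%.3f\n", before, after, loaded)
}
```

In this model the ratio rises right after the pod is created and falls again once its load is reflected in the collected usage, which matches the behaviour described in the comments above.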
@flpanbin Which mode is it running in (QRM or ORM)? Also, could you provide the node's NUMA information?
```
"katalyst.kubewharf.io/memory_enhancement": '{ "numa_binding": "true", "numa_exclusive": "true" }'
```
With `numa_exclusive = true`, the pod occupies an entire NUMA node exclusively.
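For reference, the annotation value above is a JSON object whose values are strings. A minimal sketch of how one might check it when inspecting a pod follows; `isNUMAExclusive` is a hypothetical helper written for this example and is not a katalyst-core function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key taken from the comment above.
const memoryEnhancementKey = "katalyst.kubewharf.io/memory_enhancement"

// isNUMAExclusive is a hypothetical helper: it reports whether the pod
// annotations request numa_exclusive = "true" in the memory enhancement JSON.
func isNUMAExclusive(annotations map[string]string) (bool, error) {
	raw, ok := annotations[memoryEnhancementKey]
	if !ok {
		return false, nil // annotation not set
	}
	// The enhancement value is a JSON object with string values.
	var enhancement map[string]string
	if err := json.Unmarshal([]byte(raw), &enhancement); err != nil {
		return false, fmt.Errorf("parse %s: %w", memoryEnhancementKey, err)
	}
	return enhancement["numa_exclusive"] == "true", nil
}

func main() {
	annotations := map[string]string{
		memoryEnhancementKey: `{ "numa_binding": "true", "numa_exclusive": "true" }`,
	}
	exclusive, err := isNUMAExclusive(annotations)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("numa_exclusive:", exclusive)
}
```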
@flpanbin Try changing `--topology-policy-name=none` to `--topology-policy-name=best-effort`. Under the none policy, no resource allocation is performed.
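@flpanbin Judging from the katalyst-agent startup arguments, it should be running in ORM mode, so the qosResourceManager in kubelet is not in effect. From the KCNR information, this pod has already been allocated successfully and exclusively occupies the 24 cores of one NUMA node.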
@flpanbin
1. Check whether the /var/lib/katalyst/qrm_advisor/cpu_plugin_state file contains allocation information for the dedicated pod.
2. If it does, look in the logs for error logs from the flow around https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/orm/manager.go#L517
3. If there is no allocation information, look in the logs for allocation logs or error logs from the flow around https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/qrm-plugins/cpu/dynamicpolicy/policy_allocation_handlers.go#L275
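Step 1 can be done by hand with a text search, or with a small sketch like the one below. It assumes the state file is text in which the pod UID appears literally, and it deliberately does a plain substring match rather than assuming any schema; `stateMentionsPod` is a hypothetical helper, not part of katalyst-core.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// Path taken from the comment above.
const cpuPluginStatePath = "/var/lib/katalyst/qrm_advisor/cpu_plugin_state"

// stateMentionsPod is a hypothetical helper: it reports whether the raw state
// file content contains the pod UID as a literal substring. It makes no
// assumption about the file's schema.
func stateMentionsPod(state []byte, podUID string) bool {
	return podUID != "" && bytes.Contains(state, []byte(podUID))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: checkstate <pod-uid>")
		os.Exit(2)
	}
	data, err := os.ReadFile(cpuPluginStatePath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read state file:", err)
		os.Exit(1)
	}
	if stateMentionsPod(data, os.Args[1]) {
		fmt.Println("pod found in state file: continue with step 2")
	} else {
		fmt.Println("pod not found in state file: continue with step 3")
	}
}
```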
regionDedicatedNUMAExclusive -> regionDedicatedNUMA
The QPS of the webhook grows with the number of nodes in the cluster. To address this issue, we have simplified the design of the webhook....