[Question] Question about Knative Consumergroup Replica Logic - xpost
Hi team,
I’d like to ask for some clarification about the replica distribution logic in Knative Eventing (Kafka Broker / Channel).
Context:
- I’ve configured pod_capacity = 20, so with 3 dispatcher pods the total capacity is 60. My current total replica count is 54.
- I have 3 dispatcher pods, which are enough to handle my ConsumerGroup replicas.
- When I set the ConsumerGroup replicas of a Trigger to 3, I’ve observed several distribution patterns:
  - 1 pod: controls all 3 replicas
  - 2 pods: one pod controls 2 replicas, the other controls 1
  - 3 pods: each pod controls 1 replica
I’d like to understand the difference between these scenarios:
1. Parallelism and Bottlenecks
- From our observation, each pod — even when handling multiple replicas — is seen by the Kafka broker as a single consumer in the consumer group. We suspect that all assigned partitions within that pod are processed on a single thread, potentially becoming a bottleneck.
- Is this understanding correct?
- Would multiple consumer replicas on a single pod reduce throughput compared to distributing them across multiple pods?
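To make the suspicion concrete, here is a toy model (not Knative code; the function names are ours) of why the member count the Kafka coordinator reports would equal the pod count rather than the replica count, if each pod runs a single consumer client regardless of how many vreplicas it hosts:

```go
package main

import "fmt"

// membersPerPodClient models "one client per pod": the coordinator sees one
// group member per pod, however many vreplicas that pod hosts.
func membersPerPodClient(replicasPerPod map[string]int) int {
	return len(replicasPerPod)
}

// membersPerReplicaClient models "one client per vreplica": the coordinator
// sees one group member per replica.
func membersPerReplicaClient(replicasPerPod map[string]int) int {
	n := 0
	for _, r := range replicasPerPod {
		n += r
	}
	return n
}

func main() {
	// 3 vreplicas of one Trigger, all placed on a single dispatcher pod.
	placement := map[string]int{"dispatcher-0": 3}
	fmt.Println(membersPerPodClient(placement))     // 1 member
	fmt.Println(membersPerReplicaClient(placement)) // 3 members
}
```

Under the first model, all partitions of the consumer group are assigned to that one member, which matches what we observe from the coordinator.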
2. Consumer Instance Behavior
- As a user, we expect that each consumer replica corresponds to a distinct consumer instance (or at least its own processing thread).
- Ideally, if 3 consumer replicas are placed on 1 pod, we’d expect to see 3 distinct consumers from the Kafka coordinator’s perspective — each capable of consuming in parallel.
- Is there any way to configure Knative to allow this kind of multi-threaded consumer behavior per pod?
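As a sketch of the behavior we would expect (again hypothetical, not the dispatcher's actual code), each vreplica on a pod would get its own consumer loop, here represented by one goroutine per replica draining its partitions concurrently:

```go
package main

import (
	"fmt"
	"sync"
)

// consumeInParallel stands in for "one consumer instance per vreplica":
// each replica runs on its own goroutine, so partitions assigned to the same
// pod can be drained concurrently instead of sequentially on one thread.
func consumeInParallel(partitionsPerReplica [][]int) int {
	var wg sync.WaitGroup
	counts := make([]int, len(partitionsPerReplica))
	for i, ps := range partitionsPerReplica {
		wg.Add(1)
		go func(i int, ps []int) {
			defer wg.Done()
			counts[i] = len(ps) // stand-in for polling records from each partition
		}(i, ps)
	}
	wg.Wait()
	total := 0
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	// 3 replicas on one pod, 2 partitions each: all 6 partitions drain in parallel.
	fmt.Println(consumeInParallel([][]int{{0, 1}, {2, 3}, {4, 5}}))
}
```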
3. Replica Distribution Logic
- From reviewing the code, it appears that the consumerClient is created per placement, meaning one placement maps to one pod.
- What is the logic behind this design choice?
- Is there a mechanism to ensure replicas are evenly distributed across dispatcher pods?
- Can we configure the system so that a Trigger with 3 replicas always ensures each replica runs on a different pod?
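For illustration, the spreading behavior we are asking about could look like the following round-robin helper (a hypothetical function, not the scheduler's actual API): a Trigger with 3 vreplicas lands on 3 different pods whenever enough pods have free capacity.

```go
package main

import "fmt"

// spread places vreplicas round-robin across dispatcher pods, so replicas of
// the same Trigger end up on distinct pods until every pod holds one.
func spread(replicas int, pods []string) map[string]int {
	placement := map[string]int{}
	for i := 0; i < replicas; i++ {
		placement[pods[i%len(pods)]]++
	}
	return placement
}

func main() {
	p := spread(3, []string{"dispatcher-0", "dispatcher-1", "dispatcher-2"})
	fmt.Println(len(p)) // 3 pods used, 1 replica each
}
```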
4. Configuration Options
- I understand that there is a default placement limit per pod (pod_capacity), which can be configured. I’m wondering whether there’s any way to dedicate or scale resources for replicas that end up consolidated on a single pod, to avoid contention.
- For example:
- How does performance differ between 30 topics with 1 replica each (all placed into one pod) versus 1 topic with 30 replicas (also consolidated into one pod)?
- Does Knative handle these scenarios differently in terms of resource allocation, threading, or consumption throughput?
I first asked this question in knative-extensions/eventing-kafka-broker#4551, but since the main scheduling logic appears to live in knative/eventing/pkg/scheduler, I’m re-posting it here.
Any clarification or guidance on these aspects — especially regarding how to achieve better parallelism or enforce replica spreading — would be greatly appreciated.
Thanks in advance for your help!
@pierDipi @Cali0707 Maybe you have some insights on this. Compare the following two scenarios, each with the exact same total number of partitions (whether in a single topic or across all participating topics), equal to the per-pod placement limit:
- One pod, maxed out on placements from a vreplica request targeting a single consumergroup (i.e. 1 topic), thus creating only a single Kafka client consumer instance that gets assigned a total number of partitions matching the placement limit; vs.
- A similarly sized pod, maxed out on placements from N vreplica requests targeting N consumergroups (i.e. N topics), assigned the exact same total number of partitions.
Assuming the workload in the partitions is the same across both setups, should we expect performance to differ due to the different number of Vert.x client consumer instances created? If so, which setup should be more performant?
What we've observed is that in the former case our throughput is lower, with the Kafka broker reporting only one consumer instance assigned all partitions, while in the latter it reports a total of N consumer instances, each with one partition.
If so, this pattern does not seem to scale well, because placements for the same consumergroup are consolidated into a single consumer instance when placed on the same pod. In a world where there is only a single pod with 30 slots, a consumergroup will never be consumed any faster whether vreplicas is 1 or 30.
If this observation is on the right track, is there a simple patch we can apply today to disable this consolidation when the vreplicas are placed into the same pod?