kafka-python kafka.errors.CommitFailedError: CommitFailedError: Commit cannot be completed since the group has already rebalanced

Hey, looking at my logs I can see a lot of entries with the following errors / warnings.

I have the latest kafka-python==2.0.2. I'm also wondering whether this is related to this fixed defect: https://github.com/dpkp/kafka-python/pull/2064 . I'm also using the default auto-commit settings.

WARNING:kafka.coordinator:Heartbeat failed for group ml-continue-shopping-live-etl because it is rebalancing
WARNING:kafka.coordinator:Heartbeat failed for group ml-continue-shopping-live-etl because it is rebalancing
WARNING:kafka.coordinator:Heartbeat failed for group ml-continue-shopping-live-etl because it is rebalancing
WARNING:kafka.coordinator:Heartbeat failed for group ml-continue-shopping-live-etl because it is rebalancing
WARNING:kafka.coordinator:Heartbeat session expired, marking coordinator dead
WARNING:kafka.coordinator:Marking the coordinator dead (node coordinator-1965) for group ml-continue-shopping-live-etl: Heartbeat session expired.
WARNING:kafka.coordinator:Heartbeat: local member_id was not recognized; this consumer needs to re-join
WARNING:kafka.coordinator:Heartbeat: local member_id was not recognized; this consumer needs to re-join
WARNING:kafka.coordinator.consumer:Auto offset commit failed for group ml-continue-shopping-live-etl: CommitFailedError: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max_poll_interval_ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the rebalance timeout with max_poll_interval_ms, or by reducing the maximum size of batches returned in poll() with max_poll_records.

ERROR:kafka.coordinator.consumer:Offset commit failed: This is likely to cause duplicate message delivery
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/kafka/coordinator/consumer.py", line 528, in _maybe_auto_commit_offsets_sync
    self.commit_offsets_sync(self._subscription.all_consumed_offsets())
  File "/opt/conda/lib/python3.7/site-packages/kafka/coordinator/consumer.py", line 521, in commit_offsets_sync
    raise future.exception # pylint: disable-msg=raising-bad-type
kafka.errors.CommitFailedError: CommitFailedError: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max_poll_interval_ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the rebalance timeout with max_poll_interval_ms, or by reducing the maximum size of batches returned in poll() with max_poll_records.
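
For reference, the two remedies named in the error message (max_poll_interval_ms and max_poll_records) are both KafkaConsumer constructor arguments. The sketch below is only an illustration, not a confirmed fix for this issue: the helper names poll_settings and build_consumer, the per-record timing estimate, and the 0.5 safety factor are all assumptions made up for the example. It sizes max_poll_records so that one batch should finish well inside the poll interval.

```python
def poll_settings(per_record_seconds, max_poll_interval_ms=300_000, safety_factor=0.5):
    """Return KafkaConsumer keyword arguments sized to the processing speed.

    per_record_seconds is a rough (assumed) estimate of how long the
    application spends on one message. 300000 ms and 500 records are the
    kafka-python defaults; safety_factor is an arbitrary illustrative margin.
    """
    # Time we allow ourselves to spend on one batch between poll() calls.
    budget_seconds = max_poll_interval_ms / 1000 * safety_factor
    # Largest batch that fits in that budget, never less than one record.
    records = max(1, int(budget_seconds / per_record_seconds))
    return {
        "max_poll_interval_ms": max_poll_interval_ms,
        # Do not exceed the library default of 500 records per poll().
        "max_poll_records": min(records, 500),
    }


def build_consumer(topic, group_id, bootstrap_servers, per_record_seconds):
    """Hypothetical wrapper that creates a consumer with the settings above."""
    # Imported here so poll_settings() can be used without kafka-python installed.
    from kafka import KafkaConsumer

    return KafkaConsumer(
        topic,
        group_id=group_id,
        bootstrap_servers=bootstrap_servers,
        **poll_settings(per_record_seconds),
    )
```

For example, if each message takes about 2 seconds to process, poll_settings(2.0) keeps the default interval and limits each poll() to 75 records. Whether this helps depends on the real cause of the rebalance; a consumer that is starved of CPU (for instance in a constrained container) can miss heartbeats regardless of batch size.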

Dec 04 '20 16:12 robertmujica

I'm also having this problem. It doesn't seem to happen in a Windows 10 WSL environment, but it does happen once I put everything in Docker. It seems to work for a few minutes after the worker comes online, then this error shows up.

Dec 30 '20 17:12 dyerrington

Hello, is there any update on this?

Feb 21 '23 13:02 marcianorama