Trouble producing if too many brokers are down

Open Ben-M opened this issue 10 years ago • 1 comments

I was experimenting with different failure conditions and found that if I shut down two of five Kafka brokers, Poseidon would often fail to produce and instead fail with "Failed to send all messages". More details:

  • I ran zookeeper and 5 Kafka brokers locally (more or less as described here).
  • I created a topic with 5 partitions and a replication factor of 3: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic 3
  • I killed two of the brokers using an interrupt signal, and allowed time for the brokers to rebalance.
  • I attempted to produce:
    producer = Poseidon::Producer.new(["127.0.0.1:9091", "127.0.0.1:9092", "127.0.0.1:9093", "127.0.0.1:9094", "127.0.0.1:9095"], "my_test_producer", { required_acks: 0 })
    messages = [Poseidon::MessageToSend.new('3', "value1")]
    producer.send_messages(messages)
  • I saw a RuntimeError with the message 'Failed to send all messages'

With some combinations of brokers down I was able to produce; with other combinations I was not.

Ben-M, Jan 05 '16 13:01

I investigated this a little further by comparing the broker metadata when Poseidon can produce and when it can't. It cannot produce when every partition in the topic has at least one replica offline. It can produce (to any partition) as long as at least one partition has all of its replicas online. This also means that if you create a topic with a replication factor of 5 on a 5-broker cluster, Poseidon cannot produce whenever any single broker is down.
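
The condition I observed can be sketched in plain Ruby. This is only a model of the behaviour described above, not Poseidon's actual code: the `PARTITION_REPLICAS` layout and the `producible?` helper are hypothetical names made up for illustration, and the replica assignment is an assumed example for a 3-partition topic with a replication factor of 3 on brokers 1 to 5.

```ruby
require "set"

# Hypothetical replica assignment (assumption, not read from a real cluster):
# partition id => ids of the brokers hosting its replicas.
PARTITION_REPLICAS = {
  0 => [1, 2, 3],
  1 => [2, 3, 4],
  2 => [3, 4, 5],
}.freeze

# Models the observed behaviour: producing works only if at least one
# partition has every one of its replicas on an online broker.
def producible?(partition_replicas, online_brokers)
  online = online_brokers.to_set
  partition_replicas.any? do |_partition, replicas|
    replicas.all? { |broker| online.include?(broker) }
  end
end

# Brokers 4 and 5 down: partition 0 still has all replicas online.
puts producible?(PARTITION_REPLICAS, [1, 2, 3])     # prints "true"

# Only broker 3 down, but every partition has a replica on it.
puts producible?(PARTITION_REPLICAS, [1, 2, 4, 5])  # prints "false"
```

Under this model, which brokers are down matters more than how many, which would explain why some two-broker failures were fine and others were not.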

Ben-M, Jan 07 '16 19:01