ansible/openwhisk.yml fails while waiting for the kafka server to start up

Open celery1124 opened this issue 5 years ago • 5 comments

Environment details:

  • local deployment
  • Ubuntu 16.04
  • Docker version 19.03.13, build 4484c46d9d

Steps to reproduce the issue:

  1. cd tools/ubuntu-setup && ./all.sh
  2. ansible-playbook setup.yml ; ansible-playbook prereq.yml (with the environment variables set up for CouchDB)
  3. ./gradlew distDocker
  4. ansible-playbook initdb.yml ; ansible-playbook wipe.yml
  5. ansible-playbook openwhisk.yml

Provide the actual results and outputs:

TASK [kafka : wait until the kafka server started up] ***********************************************************************************************************
Tuesday 01 December 2020  14:03:16 -0600 (0:00:27.886)       0:00:49.298 ******
FAILED - RETRYING: wait until the kafka server started up (10 retries left).
FAILED - RETRYING: wait until the kafka server started up (9 retries left).
FAILED - RETRYING: wait until the kafka server started up (8 retries left).
FAILED - RETRYING: wait until the kafka server started up (7 retries left).
FAILED - RETRYING: wait until the kafka server started up (6 retries left).
FAILED - RETRYING: wait until the kafka server started up (5 retries left).
FAILED - RETRYING: wait until the kafka server started up (4 retries left).
FAILED - RETRYING: wait until the kafka server started up (3 retries left).
FAILED - RETRYING: wait until the kafka server started up (2 retries left).
FAILED - RETRYING: wait until the kafka server started up (1 retries left).
fatal: [kafka0]: FAILED! => {"attempts": 10, "changed": true, "cmd": "(echo dump; sleep 1) | nc 172.17.0.1 2181 | grep /brokers/ids/0", "delta": "0:00:01.005511", "end": "2020-12-01 14:04:20.335370", "msg": "non-zero return code", "rc": 1, "start": "2020-12-01 14:04:19.329859", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

[FAILED]
> (echo dump; sleep 1) | nc 172.17.0.1 2181 | grep /brokers/ids/0
non-zero return code

PLAY RECAP ******************************************************************************************************************************************************
kafka0                     : ok=9    changed=3    unreachable=0    failed=1
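
For context on the failing task: the probe sends ZooKeeper's four-letter `dump` command and greps the reply for the broker's ephemeral node, `/brokers/ids/0`. An empty stdout, as in the output above, means the broker never registered with ZooKeeper. Below is a minimal sketch of the same check, split out so the matching logic can be tried on captured text. `broker_registered` is a hypothetical helper name, not something taken from the OpenWhisk playbooks.

```shell
#!/usr/bin/env bash
# Sketch only: broker_registered is an illustrative helper, not part of OpenWhisk.
# It reads ZooKeeper "dump" output on stdin and succeeds when the ephemeral
# node for the given broker id appears on a line of its own.
broker_registered() {
  local broker_id="$1"
  # Anchoring the pattern keeps id 1 from matching /brokers/ids/10.
  grep -q -E "^[[:space:]]*/brokers/ids/${broker_id}[[:space:]]*$"
}

# Against a live ZooKeeper, using the same probe as the Ansible task:
#   (echo dump; sleep 1) | nc 172.17.0.1 2181 | broker_registered 0 && echo registered
```

A non-zero exit here only says the node is missing; it does not say why, so the kafka container logs are still the next place to look.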

Additional information you deem important:

  • docker ps (not sure why kafka keeps restarting??)
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS                                  PORTS                                                                    NAMES
3980b79c4ad0        wurstmeister/kafka:2.12-2.3.1   "start-kafka.sh"         9 minutes ago       Restarting (1) Less than a second ago                                                                            kafka0
471187e2ba20        zookeeper:3.4                   "/docker-entrypoint.…"   10 minutes ago      Up 10 minutes                           0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp   zookeeper0
  • Tried this on a fresh Ubuntu 18.04 with the same setup steps; no problem found.

celery1124 avatar Dec 01 '20 20:12 celery1124

Any chance you're out of disk space? You can check the kafka logs. Another possible reason is that kafka isn't able to reach zookeeper, which would mean a networking issue. Try sudo ifconfig lo0 alias 172.17.0.1/24.
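
The checks suggested here could be scripted roughly as follows. This is a sketch under assumptions: `disk_used_pct` and `port_open` are hypothetical helper names, the 3 second timeout is arbitrary, and the port check relies on bash's `/dev/tcp` pseudo-device plus the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the checks suggested above; names are illustrative.

# Print the used-space percentage of the filesystem holding a path (default /).
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if a TCP connection to host:port opens within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>/dev/tcp/$0/$1' "$1" "$2" 2>/dev/null
}

echo "root filesystem used: $(disk_used_pct /)%"

# On the affected host one might then run (container name taken from docker ps):
#   docker logs --tail 50 kafka0
#   port_open 172.17.0.1 2181 && echo "zookeeper reachable" || echo "zookeeper unreachable"
```

Note that `port_open` run on the host only shows that the host can reach the port; the kafka container may still be blocked, for example by a firewall rule on the Docker bridge.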

rabbah avatar Dec 29 '20 20:12 rabbah

I am getting this same error, and it seems to be a problem of kafka not being able to keep a stable connection to zookeeper. Using Ubuntu 16.01

Relevant kafka log section:

[2021-04-01 17:54:53,847] INFO Initiating client connection, connectString=172.17.0.1:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@7e0b85f9 (org.apache.zookeeper.ZooKeeper)
[2021-04-01 17:54:53,892] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2021-04-01 17:54:53,898] INFO Opening socket connection to server 172.17.0.1/172.17.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-04-01 17:54:59,896] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-04-01 17:54:59,902] WARN Client session timed out, have not heard from server in 6012ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
[2021-04-01 17:55:00,009] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2021-04-01 17:55:00,012] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2021-04-01 17:55:00,014] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-04-01 17:55:00,019] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
	at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:258)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
	at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:254)
	at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:112)
	at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1826)
	at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:364)
	at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:387)
	at kafka.server.KafkaServer.startup(KafkaServer.scala:207)
	at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
	at kafka.Kafka$.main(Kafka.scala:84)
	at kafka.Kafka.main(Kafka.scala)
[2021-04-01 17:55:00,022] INFO shutting down (kafka.server.KafkaServer)
[2021-04-01 17:55:00,032] INFO shut down completed (kafka.server.KafkaServer)
[2021-04-01 17:55:00,034] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2021-04-01 17:55:00,039] INFO shutting down (kafka.server.KafkaServer)

aFuerst avatar Apr 01 '21 19:04 aFuerst

I didn't dig much into this case since I found no issues on Ubuntu 18.04 (with the same scripts). Maybe you can try a more up-to-date OS.

Mian

celery1124 avatar Apr 01 '21 19:04 celery1124

@rabbah I have the same issue.

I hit this error when trying to alias lo. Do you know how to fix it?

:/$ sudo ifconfig lo alias 172.17.0.1/24
alias: Host name lookup failure
ifconfig: `--help' gives usage information.

OS: Ubuntu 22.04.1 LTS

ifconfig

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 18592  bytes 2881713 (2.8 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18592  bytes 2881713 (2.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
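
A note on the error above: `ifconfig lo0 alias ...` is BSD/macOS syntax. The Linux net-tools `ifconfig` has no `alias` keyword, so it treats the word as a hostname, hence "Host name lookup failure". Before adding any address it may be worth checking whether some interface already holds 172.17.0.1 (on a default Docker install that is usually the `docker0` bridge, though that is an assumption about this machine). A minimal sketch, with `iface_for_addr` as a hypothetical helper that parses `ip -o -4 addr show` output:

```shell
#!/usr/bin/env bash
# Sketch only: iface_for_addr is an illustrative helper name.
# Reads `ip -o -4 addr show` output on stdin and prints the name of every
# interface that carries the given IPv4 address.
iface_for_addr() {
  awk -v want="$1" '{ split($4, a, "/"); if (a[1] == want) print $2 }'
}

# Usage on the affected host:
#   ip -o -4 addr show | iface_for_addr 172.17.0.1
# If nothing is printed, the iproute2 counterpart of the macOS alias command is:
#   sudo ip addr add 172.17.0.1/24 dev lo
```

If the helper prints `docker0`, the address already exists and adding it to `lo` as well is unlikely to help.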

jemmy512 avatar Oct 19 '22 16:10 jemmy512

According to the logs, you guys need to check the sanity of zookeeper first. Is your zookeeper accessible from other containers?
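
One way to run that sanity check, as a rough sketch: ZooKeeper answers the four-letter command `ruok` with `imok` when it is serving requests. `zk_replied_ok` is a hypothetical helper name, the `busybox` image is only an example of a throwaway container that ships `nc`, and newer ZooKeeper releases may need the command whitelisted before they answer.

```shell
#!/usr/bin/env bash
# Sketch only: zk_replied_ok is an illustrative helper name.
# Reads ZooKeeper's reply to "ruok" on stdin and succeeds only if it is "imok".
zk_replied_ok() {
  local reply
  reply="$(cat)"
  [ "$reply" = "imok" ]
}

# From the host:
#   (echo ruok; sleep 1) | nc 172.17.0.1 2181 | zk_replied_ok && echo healthy
# From inside another container (assumes the busybox image can be pulled):
#   docker run --rm busybox sh -c '(echo ruok; sleep 1) | nc 172.17.0.1 2181'
```

If the host check succeeds but the container check hangs or prints nothing, the problem is more likely container-to-host networking than ZooKeeper itself.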

style95 avatar Oct 20 '22 07:10 style95