apisix bug: get upstream from etcd: has no healthy etcd endpoint available

Current Behavior

etcd configs

etcd:
  host:                          # it's possible to define multiple etcd hosts addresses of the same etcd cluster
    - "http://apisix-etcd:2379"
  prefix: "/apisix"    # apisix configurations prefix
  timeout: 30    # 30 seconds

When I call the Admin API, it returns this:

curl http://127.0.0.1:9180/apisix/admin/routes -H 'X-API-KEY:d8xxxxxxxxxxxxxxxxx'
{"error_msg":"has no healthy etcd endpoint available"}

I checked etcd and did not find any errors:

[root@apisix-f46b6f84d-ssbwt apisix]# curl http://apisix-etcd:2379/version
{"etcdserver":"3.5.4","etcdcluster":"3.5.0"}

etcdctl -w table member list
+------------------+---------+---------------+---------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+------------+
|        ID        | STATUS  |     NAME      |                                   PEER ADDRS                                    |                                                               CLIENT ADDRS                                                               | IS LEARNER |
+------------------+---------+---------------+---------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+------------+
| 39e3f227d2d33a6d | started | apisix-etcd-2 | http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380 | http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379 |      false |
| 47324e080c98137d | started | apisix-etcd-1 | http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380 | http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379 |      false |
| e2571e968b89c849 | started | apisix-etcd-0 | http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380 | http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379 |      false |
+------------------+---------+---------------+---------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+------------+

:/opt/bitnami/etcd$ etcdctl -w table endpoint health
+----------------+--------+------------+-------+
|    ENDPOINT    | HEALTH |    TOOK    | ERROR |
+----------------+--------+------------+-------+
| 127.0.0.1:2379 |   true | 2.631563ms |       |
+----------------+--------+------------+-------+

etcdctl get --prefix /apisix/ | head
/apisix/consumers/
init_dir
/apisix/consumers/hmac_auth_appid_123456
{"username":"hmac_auth_appid_123456","plugins":{"hmac-auth":{"access_key":"xxxxx","clock_skew":300,"disable":false,"secret_key":"xxxxxx","signed_headers":["appid","token"],"validate_request_body":true}},"create_time":1642588052,"update_time":1647503172}
/apisix/consumers/hmac_auth_appid_992099001
{"username":"hmac_auth_appid_992099001","plugins":{"hmac-auth":{"access_key":"xxxx","clock_skew":300,"disable":false,"secret_key":"xxxxxx","signed_headers":["appid","token"],"validate_request_body":true}},"create_time":1647502495,"update_time":1647502495}
/apisix/consumers/key_auth_appid_123456
{"username":"key_auth_appid_123456","plugins":{"key-auth":{"disable":false,"key":"xxxx"},"limit-req":{"burst":0,"disable":false,"key":"http_x_forwarded_for","rate":1,"rejected_code":503}},"create_time":1648545605,"update_time":1648545984}
/apisix/consumers/key_auth_appid_67890
{"username":"key_auth_appid_67890","plugins":{"key-auth":{"disable":false,"key":"xxxx"}},"create_time":1655461361,"update_time":1655461361}

Any help would be appreciated!

Expected Behavior

No response

Error Logs

594248 [lua] utils.lua:54: inject_conf_with_prev_conf(): failed to get upstream[/upstreams/c9e16880] from etcd: has no healthy etcd endpoint available, client: 10.17.5.99, server: , request: "PUT /apisix/admin/upstreams/c9e16880 HTTP/1.1", host: "apisix-admin.ingress-apisix.svc.cluster.local:9180"

Steps to Reproduce

I upgraded from version 2.10 to 2.15.2 using Helm.

Environment

APISIX version (run apisix version): 2.15.2
Operating system (run uname -a): docker 18.09.9
OpenResty / Nginx version (run openresty -V or nginx -V): nginx version: openresty/1.21.4.1
etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): {"etcdserver":"3.5.4","etcdcluster":"3.5.0"}
APISIX Dashboard version, if relevant:
Plugin runner version, for issues related to plugin runners:
LuaRocks version, for installation issues (run luarocks --version):

Mar 07 '23 10:03 Aaron199

Maybe you should check etcd's error log?

Mar 08 '23 03:03 monkeyDluffy6017

I did not find any error messages, and I even reinstalled etcd but the issue still persists.

Mar 09 '23 02:03 Aaron199

I have found a similar issue, though I'm not sure whether it is related to this one: https://github.com/apache/apisix/issues/7162

Mar 09 '23 02:03 Aaron199

I'm seeing this as well in my logs, with no etcd errors during that time. I'm on the Helm chart that gives APISIX 3.1.0 (Debian).

Mar 09 '23 16:03 coffeebe4code

Using the debug error log, I traced the error to the corresponding function and noticed that etcd's "new" function chooses the etcd connection address based on deployment.role. The default role in APISIX is traditional. The code is as follows:

local function new()
    local local_conf, err = fetch_local_conf()
    if not local_conf then
        return nil, nil, err
    end

    local etcd_conf = clone_tab(local_conf.etcd)
    local proxy_by_conf_server = false

    if local_conf.deployment then
        if local_conf.deployment.role == "traditional"
            -- we proxy the etcd requests in traditional mode so we can test the CP's behavior in
            -- daily development. However, a stream proxy can't be the CP.
            -- Hence, generate a HTTP conf server to proxy etcd requests in stream proxy is
            -- unnecessary and inefficient.
            and is_http
        then
            local sock_prefix = ngx_config_prefix
            etcd_conf.unix_socket_proxy =
                "unix:" .. sock_prefix .. "/conf/config_listen.sock"
            etcd_conf.host = {"http://127.0.0.1:2379"}
            proxy_by_conf_server = true

        elseif local_conf.deployment.role == "control_plane" then
            local addr = local_conf.deployment.role_control_plane.conf_server.listen
            etcd_conf.host = {"https://" .. addr}
            etcd_conf.tls = {
                verify = false,
            }

            if has_mtls_support() and local_conf.deployment.certs.cert then
                local cert = local_conf.deployment.certs.cert
                local cert_key = local_conf.deployment.certs.cert_key
                etcd_conf.tls.cert = cert
                etcd_conf.tls.key = cert_key
            end

However, in the apisix-0.13.1 Helm chart, the template at https://github.com/apache/apisix-helm-chart/blob/apisix-0.13.1/charts/apisix/templates/configmap.yaml sets the etcd configuration at the outermost level of the config, which is the old configuration layout. As a result, the etcd settings did not take effect even though they were present. Adding deployment.role resolved the issue.
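To tell which layout a rendered config.yaml actually uses, a small shell check along these lines may help. It is only a sketch: the function name etcd_layout is made up for illustration, it assumes plain block-style YAML with two-space indentation, and it assumes the newer layout nests the etcd block directly under a top-level deployment key.

```shell
#!/bin/sh
# Classify where the etcd block lives in an APISIX config.yaml.
# Prints one of: "nested", "top-level", "missing".
# Assumption: block-style YAML with two-space indentation, as in this thread.
etcd_layout() {
  layout_conf="$1"
  # Look for an "etcd:" key indented directly under a top-level "deployment:" key.
  if awk '
      /^deployment:/       { in_dep = 1; next }   # entering the deployment block
      /^[^ #]/             { in_dep = 0 }         # any other top-level key ends it
      in_dep && /^  etcd:/ { found = 1 }
      END { exit found ? 0 : 1 }
    ' "$layout_conf"; then
    echo "nested"
  elif grep -q '^etcd:' "$layout_conf"; then
    # etcd configured at the outermost level (the old layout described above).
    echo "top-level"
  else
    echo "missing"
  fi
}
```

It could be run against the rendered file inside the APISIX pod, for example etcd_layout /usr/local/apisix/conf/config.yaml (the path is an assumption; adjust it to wherever the chart mounts the config). A result of "top-level" would match the situation described above.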

Mar 10 '23 03:03 Aaron199

Try adding more etcd resources or instances.

Mar 27 '23 02:03 wood-zhang

I see "has no healthy etcd endpoint available" at startup, but it goes away after a few seconds.
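If the error is only transient at startup, a client that talks to the Admin API could simply retry for a short while before giving up. The following is a minimal sketch of that idea, not something taken from APISIX or the ingress controller: retry_until_ok and admin_api_ready are hypothetical helper names, and ADMIN_URL and ADMIN_KEY are placeholder variables you would set yourself.

```shell
#!/bin/sh
# Retry a probe command until it succeeds or the attempts run out.
# Usage: retry_until_ok <max_attempts> <delay_seconds> <command> [args...]
retry_until_ok() {
  retry_max="$1"
  retry_delay="$2"
  shift 2
  retry_attempt=1
  while [ "$retry_attempt" -le "$retry_max" ]; do
    if "$@"; then
      return 0
    fi
    retry_attempt=$((retry_attempt + 1))
    # Sleep only if another attempt will follow.
    if [ "$retry_attempt" -le "$retry_max" ]; then
      sleep "$retry_delay"
    fi
  done
  return 1
}

# Probe: succeeds only when the Admin API answers with HTTP 200.
# The 503 responses seen in this thread make it fail, so the caller retries.
# ADMIN_URL and ADMIN_KEY are placeholders, not values from this thread.
admin_api_ready() {
  probe_code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "X-API-KEY: ${ADMIN_KEY}" "${ADMIN_URL}/apisix/admin/routes")
  [ "$probe_code" = "200" ]
}
```

For example, with ADMIN_URL=http://127.0.0.1:9180 and ADMIN_KEY set, calling retry_until_ok 10 2 admin_api_ready would probe up to ten times, two seconds apart. This only papers over a short startup window; it does not help when the etcd configuration itself is not being picked up.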

Dec 20 '23 04:12 kayx23

@Aaron199 have you resolved your problem?

Jan 04 '24 01:01 monkeyDluffy6017

I'm using the APISIX Helm chart 2.7.0 (currently the latest) and sometimes I get this same error.

After simply deleting the apisix-ingress-controller pod, it starts working again.

The apisix-ingress-controller pod output:

I0519 16:17:41.437043       1 leaderelection.go:250] attempting to acquire leader lease apisix-gateway/ingress-apisix-leader...
I0519 16:17:41.451338       1 leaderelection.go:260] successfully acquired lease apisix-gateway/ingress-apisix-leader
2024-05-19T16:17:41+08:00	warn	providers/controller.go:220	found a new leader apisix-ingress-controller-7b6f897698-rv4wq
2024-05-19T16:17:41+08:00	warn	apisix/cluster.go:423	waiting cluster default to ready, it may takes a while
2024-05-19T16:17:41+08:00	error	apisix/route.go:90	failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:41+08:00	error	apisix/cluster.go:298	failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:43+08:00	error	apisix/route.go:90	failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:43+08:00	error	apisix/cluster.go:298	failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:45+08:00	error	apisix/route.go:90	failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:45+08:00	error	apisix/cluster.go:298	failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:47+08:00	error	apisix/route.go:90	failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:47+08:00	error	apisix/cluster.go:298	failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:49+08:00	error	apisix/route.go:90	failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:49+08:00	error	apisix/cluster.go:298	failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:49+08:00	error	apisix/cluster.go:258	failed to sync cache	{"cost_time": "8.019457909s", "cluster": "default"}
2024-05-19T16:17:49+08:00	error	providers/controller.go:419	failed to wait the default cluster to be ready: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:49+08:00	error	apisix/route.go:90	failed to list routes: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/routes": context canceled
2024-05-19T16:17:49+08:00	error	apisix/plugin.go:46	failed to list plugins' names: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/plugins?all=true": context canceled
2024-05-19T16:17:49+08:00	error	apisix/cluster.go:483	failed to list plugin names in APISIX: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/plugins?all=true": context canceled
2024-05-19T16:17:49+08:00	error	apisix/cluster.go:298	failed to list routes in APISIX: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/routes": context canceled
2024-05-19T16:17:49+08:00	error	apisix/cluster.go:446	failed to sync schema: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/plugins?all=true": context canceled
2024-05-19T16:17:49+08:00	error	apisix/cluster.go:258	failed to sync cache	{"cost_time": "84.421µs", "cluster": "default"}

No errors on etcd.

Some "errors" on the apisix pod:

2024/05/20 07:35:17 [error] 50#50: *1482951 upstream timed out (110: Connection timed out) while connecting to upstream, client: 10.0.0.87, server: _, request: "POST /api/email/xxxxxx HTTP/1.1", upstream: "http://10.16.0.4:8080/api/email/xxxxxxx", host: "xxxxxx"

My current config:

replicaCount: 2
resources:
  requests:
    cpu: 20m
    memory: 200Mi
  limits:
    cpu: 1
    memory: 512Mi
ingress-controller:
  enabled: true
  gateway:
    tls:
      enabled: true
  config:
    logLevel: "warn"
    apisix:
      serviceNamespace: apisix-gateway
  resources:
    requests:
      memory: 50Mi
etcd:
  logLevel: warn
  replicaCount: 1
  pdb:
    create: false
service:
  loadBalancerIP: xxxxx
  type: LoadBalancer
dashboard:
  enabled: false
apisix:
  enableServerTokens: false
  ssl:
    enabled: true
  pluginAttrs:
    redirect:
      https_port: 443
  nginx:
    logs:
      enableAccessLog: true
      accessLogFormatEscape: json
      accessLogFormat: '{"datetime":"$time_iso8601","http_referer":"$http_referer","host":"$host","remote_addr":"$remote_addr","request_method":"$request_method","request_time":"$request_time","request_uri":"$request_uri","http_status":"$status"}'
metrics:
  serviceMonitor:
    enabled: true
    interval: 30s
    labels:
      release: 'kube-prometheus-stack'

May 20 '24 08:05 JoniJnm