bug: get upstream from etcd: has no healthy etcd endpoint available
Current Behavior
etcd config:
etcd:
  host: # it's possible to define multiple etcd hosts addresses of the same etcd cluster
    - "http://apisix-etcd:2379"
  prefix: "/apisix" # apisix configurations prefix
  timeout: 30 # 30 seconds
When I call the Admin API, it returns this:
curl http://127.0.0.1:9180/apisix/admin/routes -H 'X-API-KEY:d8xxxxxxxxxxxxxxxxx'
{"error_msg":"has no healthy etcd endpoint available"}
I checked etcd and did not find any errors:
[root@apisix-f46b6f84d-ssbwt apisix]# curl http://apisix-etcd:2379/version
{"etcdserver":"3.5.4","etcdcluster":"3.5.0"}
etcdctl -w table member list
+------------------+---------+---------------+---------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+---------------+---------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+------------+
| 39e3f227d2d33a6d | started | apisix-etcd-2 | http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380 | http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379 | false |
| 47324e080c98137d | started | apisix-etcd-1 | http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380 | http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379 | false |
| e2571e968b89c849 | started | apisix-etcd-0 | http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380 | http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379 | false |
+------------------+---------+---------------+---------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+------------+
:/opt/bitnami/etcd$ etcdctl -w table endpoint health
+----------------+--------+------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+----------------+--------+------------+-------+
| 127.0.0.1:2379 | true | 2.631563ms | |
+----------------+--------+------------+-------+
etcdctl get --prefix /apisix/ | head
/apisix/consumers/
init_dir
/apisix/consumers/hmac_auth_appid_123456
{"username":"hmac_auth_appid_123456","plugins":{"hmac-auth":{"access_key":"xxxxx","clock_skew":300,"disable":false,"secret_key":"xxxxxx","signed_headers":["appid","token"],"validate_request_body":true}},"create_time":1642588052,"update_time":1647503172}
/apisix/consumers/hmac_auth_appid_992099001
{"username":"hmac_auth_appid_992099001","plugins":{"hmac-auth":{"access_key":"xxxx","clock_skew":300,"disable":false,"secret_key":"xxxxxx","signed_headers":["appid","token"],"validate_request_body":true}},"create_time":1647502495,"update_time":1647502495}
/apisix/consumers/key_auth_appid_123456
{"username":"key_auth_appid_123456","plugins":{"key-auth":{"disable":false,"key":"xxxx"},"limit-req":{"burst":0,"disable":false,"key":"http_x_forwarded_for","rate":1,"rejected_code":503}},"create_time":1648545605,"update_time":1648545984}
/apisix/consumers/key_auth_appid_67890
{"username":"key_auth_appid_67890","plugins":{"key-auth":{"disable":false,"key":"xxxx"}},"create_time":1655461361,"update_time":1655461361}
Please help!
Expected Behavior
No response
Error Logs
594248 [lua] utils.lua:54: inject_conf_with_prev_conf(): failed to get upstream[/upstreams/c9e16880] from etcd: has no healthy etcd endpoint available, client: 10.17.5.99, server: , request: "PUT /apisix/admin/upstreams/c9e16880 HTTP/1.1", host: "apisix-admin.ingress-apisix.svc.cluster.local:9180"
Steps to Reproduce
I upgraded APISIX from version 2.10 to 2.15.2 using Helm.
Environment
- APISIX version (run apisix version): 2.15.2
- Operating system (run uname -a): docker 18.09.9
- OpenResty / Nginx version (run openresty -V or nginx -V): nginx version: openresty/1.21.4.1
- etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): {"etcdserver":"3.5.4","etcdcluster":"3.5.0"}
- APISIX Dashboard version, if relevant:
- Plugin runner version, for issues related to plugin runners:
- LuaRocks version, for installation issues (run luarocks --version):
Maybe you should check etcd's error log?
I did not find any error messages, and I even reinstalled etcd but the issue still persists.
I found a similar issue, but I'm not sure whether it's related to this one: https://github.com/apache/apisix/issues/7162
I'm seeing this in my logs as well, with no etcd errors during that time. I'm on the Helm chart that ships apisix 3.1.0 debian.
Following the debug error log, I found the corresponding function and noticed that etcd's "new" function sets the etcd connection address according to deployment.role. The default role in apisix is traditional. The code is as follows:
local function new()
    local local_conf, err = fetch_local_conf()
    if not local_conf then
        return nil, nil, err
    end
    local etcd_conf = clone_tab(local_conf.etcd)
    local proxy_by_conf_server = false
    if local_conf.deployment then
        if local_conf.deployment.role == "traditional"
            -- we proxy the etcd requests in traditional mode so we can test the CP's behavior in
            -- daily development. However, a stream proxy can't be the CP.
            -- Hence, generate a HTTP conf server to proxy etcd requests in stream proxy is
            -- unnecessary and inefficient.
            and is_http
        then
            local sock_prefix = ngx_config_prefix
            etcd_conf.unix_socket_proxy =
                "unix:" .. sock_prefix .. "/conf/config_listen.sock"
            etcd_conf.host = {"http://127.0.0.1:2379"}
            proxy_by_conf_server = true
        elseif local_conf.deployment.role == "control_plane" then
            local addr = local_conf.deployment.role_control_plane.conf_server.listen
            etcd_conf.host = {"https://" .. addr}
            etcd_conf.tls = {
                verify = false,
            }
            if has_mtls_support() and local_conf.deployment.certs.cert then
                local cert = local_conf.deployment.certs.cert
                local cert_key = local_conf.deployment.certs.cert_key
                etcd_conf.tls.cert = cert
                etcd_conf.tls.key = cert_key
            end
            -- (the excerpt stops here; the rest of the function is not shown)
However, in the apisix-0.13.1 Helm chart, the etcd configuration in https://github.com/apache/apisix-helm-chart/blob/apisix-0.13.1/charts/apisix/templates/configmap.yaml is set at the top level of the config, which is the old configuration style. As a result, the etcd settings did not take effect even though they were present. Adding deployment.role resolved the issue.
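To make the branch logic in the excerpt above easier to follow, here is a minimal Python sketch of how the effective etcd host is chosen. It restates only the branches visible in the Lua excerpt and is not APISIX code: `resolve_etcd_conf`, the `prefix` default (standing in for `ngx_config_prefix`), and the dictionary layout of the parsed config are assumptions made for illustration.

```python
import copy


def resolve_etcd_conf(local_conf, is_http=True, prefix="/usr/local/apisix"):
    """Return (etcd_conf, proxy_by_conf_server) for a parsed config dict.

    Hypothetical helper mirroring the visible branches of the Lua excerpt.
    `prefix` stands in for ngx_config_prefix and is an assumed value.
    """
    etcd_conf = copy.deepcopy(local_conf.get("etcd", {}))
    proxy_by_conf_server = False

    deployment = local_conf.get("deployment")
    if deployment:
        role = deployment.get("role")
        if role == "traditional" and is_http:
            # Requests go through a local conf server over a unix socket,
            # so the host copied from the top-level `etcd` block is overwritten.
            etcd_conf["unix_socket_proxy"] = (
                "unix:" + prefix + "/conf/config_listen.sock"
            )
            etcd_conf["host"] = ["http://127.0.0.1:2379"]
            proxy_by_conf_server = True
        elif role == "control_plane":
            addr = deployment["role_control_plane"]["conf_server"]["listen"]
            etcd_conf["host"] = ["https://" + addr]
            etcd_conf["tls"] = {"verify": False}

    return etcd_conf, proxy_by_conf_server


if __name__ == "__main__":
    old_style = {"etcd": {"host": ["http://apisix-etcd:2379"], "prefix": "/apisix"}}
    with_role = dict(old_style, deployment={"role": "traditional"})
    print(resolve_etcd_conf(old_style))
    print(resolve_etcd_conf(with_role))
```

In this sketch the top-level host is dialed directly only when no deployment block exists. Once a role is set, the host is replaced, so the place where the real etcd address is configured matters.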
Try adding etcd resources or instances.
I see "has no healthy etcd endpoint available" at startup, but it goes away after a few seconds.
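If the error is only transient at startup, a client can poll the Admin API until the 503 clears before doing any real work. The following is a minimal sketch under that assumption: `admin_probe` and `wait_until_ready` are hypothetical helpers, the header name comes from the curl command in the original report, and the attempt count and delay are arbitrary.

```python
import time
import urllib.error
import urllib.request

ETCD_ERROR = "has no healthy etcd endpoint available"


def admin_probe(url, api_key, timeout=5.0):
    """Return (status_code, body) for one GET against the Admin API."""
    req = urllib.request.Request(url, headers={"X-API-KEY": api_key})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as exceptions; keep their status and body.
        return exc.code, exc.read().decode("utf-8", "replace")


def wait_until_ready(probe, attempts=15, delay=2.0, sleep=time.sleep):
    """Call probe() until it stops returning the etcd 503.

    Returns the number of attempts used, or raises TimeoutError
    if every attempt still reports the etcd error.
    """
    for attempt in range(1, attempts + 1):
        status, body = probe()
        if not (status == 503 and ETCD_ERROR in body):
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError("Admin API still reports no healthy etcd endpoint")
```

A call might look like `wait_until_ready(lambda: admin_probe("http://127.0.0.1:9180/apisix/admin/routes", "<admin key>"))`. Passing the probe as a callable keeps the retry loop testable without a running APISIX.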
@Aaron199 have you resolved your problem?
I'm using the apisix Helm chart 2.7.0 (currently the latest), and I sometimes get this same error.
Deleting the apisix-ingress-controller pod makes it work again.
The apisix-ingress-controller pod output:
I0519 16:17:41.437043 1 leaderelection.go:250] attempting to acquire leader lease apisix-gateway/ingress-apisix-leader...
I0519 16:17:41.451338 1 leaderelection.go:260] successfully acquired lease apisix-gateway/ingress-apisix-leader
2024-05-19T16:17:41+08:00 warn providers/controller.go:220 found a new leader apisix-ingress-controller-7b6f897698-rv4wq
2024-05-19T16:17:41+08:00 warn apisix/cluster.go:423 waiting cluster default to ready, it may takes a while
2024-05-19T16:17:41+08:00 error apisix/route.go:90 failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:41+08:00 error apisix/cluster.go:298 failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:43+08:00 error apisix/route.go:90 failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:43+08:00 error apisix/cluster.go:298 failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:45+08:00 error apisix/route.go:90 failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:45+08:00 error apisix/cluster.go:298 failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:47+08:00 error apisix/route.go:90 failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:47+08:00 error apisix/cluster.go:298 failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:49+08:00 error apisix/route.go:90 failed to list routes: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:49+08:00 error apisix/cluster.go:298 failed to list routes in APISIX: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:49+08:00 error apisix/cluster.go:258 failed to sync cache {"cost_time": "8.019457909s", "cluster": "default"}
2024-05-19T16:17:49+08:00 error providers/controller.go:419 failed to wait the default cluster to be ready: unexpected status code 503; error message: {"error_msg":"has no healthy etcd endpoint available"}
2024-05-19T16:17:49+08:00 error apisix/route.go:90 failed to list routes: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/routes": context canceled
2024-05-19T16:17:49+08:00 error apisix/plugin.go:46 failed to list plugins' names: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/plugins?all=true": context canceled
2024-05-19T16:17:49+08:00 error apisix/cluster.go:483 failed to list plugin names in APISIX: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/plugins?all=true": context canceled
2024-05-19T16:17:49+08:00 error apisix/cluster.go:298 failed to list routes in APISIX: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/routes": context canceled
2024-05-19T16:17:49+08:00 error apisix/cluster.go:446 failed to sync schema: Get "http://apisix-admin.apisix-gateway.svc.cluster.local:9180/apisix/admin/plugins?all=true": context canceled
2024-05-19T16:17:49+08:00 error apisix/cluster.go:258 failed to sync cache {"cost_time": "84.421µs", "cluster": "default"}
No errors on etcd.
Some "errors" on the apisix pod:
2024/05/20 07:35:17 [error] 50#50: *1482951 upstream timed out (110: Connection timed out) while connecting to upstream, client: 10.0.0.87, server: _, request: "POST /api/email/xxxxxx HTTP/1.1", upstream: "http://10.16.0.4:8080/api/email/xxxxxxx", host: "xxxxxx"
My current config:
replicaCount: 2
resources:
  requests:
    cpu: 20m
    memory: 200Mi
  limits:
    cpu: 1
    memory: 512Mi
ingress-controller:
  enabled: true
  gateway:
    tls:
      enabled: true
  config:
    logLevel: "warn"
    apisix:
      serviceNamespace: apisix-gateway
  resources:
    requests:
      memory: 50Mi
etcd:
  logLevel: warn
  replicaCount: 1
  pdb:
    create: false
service:
  loadBalancerIP: xxxxx
  type: LoadBalancer
dashboard:
  enabled: false
apisix:
  enableServerTokens: false
  ssl:
    enabled: true
  pluginAttrs:
    redirect:
      https_port: 443
  nginx:
    logs:
      enableAccessLog: true
      accessLogFormatEscape: json
      accessLogFormat: '{"datetime":"$time_iso8601","http_referer":"$http_referer","host":"$host","remote_addr":"$remote_addr","request_method":"$request_method","request_time":"$request_time","request_uri":"$request_uri","http_status":"$status"}'
metrics:
  serviceMonitor:
    enabled: true
    interval: 30s
    labels:
      release: 'kube-prometheus-stack'