bug: /status/ready endpoint always returns 503 in file-driven standalone mode
Current Behavior
When running APISIX 3.13.0 in file-driven standalone mode (deployment.role=data_plane, config_provider=yaml), the /status/ready health check endpoint always returns HTTP 503 with error "worker id: X has not received configuration", despite:
- Routes working correctly
- Configuration being successfully loaded from apisix.yaml
- All workers functioning normally
Example error response:
{"error":"worker id: 0 has not received configuration","status":"error"}
Expected Behavior
The /status/ready endpoint should return HTTP 200 with {"status":"ok"} when all workers have successfully loaded the configuration from the YAML file.
Error Logs
2025/01/10 00:41:47 [warn] 33#33: *3 [lua] init.lua:1003: status_ready(): worker id: 0 has not received configuration, context: ngx.timer
Steps to Reproduce
- Configure APISIX in file-driven standalone mode:
# config.yaml
deployment:
  role: data_plane
  role_data_plane:
    config_provider: yaml
apisix:
  enable_admin: false
- Create a valid apisix.yaml with routes
- Start APISIX
- Test the health check endpoint:
curl http://127.0.0.1:7085/status/ready
- Observe HTTP 503 error despite routes working correctly
Environment
- APISIX version: 3.13.0
- Operating System: Docker (apache/apisix:3.13.0-debian)
- OpenResty / Nginx version: From official image
- Deployment mode: data_plane with yaml config_provider
Root Cause Analysis (UPDATED)
After extensive debugging with added logging, I've identified the actual root cause. The issue occurs when the configuration file is rendered before APISIX starts (common in container environments):
Timing Issue:
- The configuration file (apisix.yaml) is created by an entrypoint script before APISIX starts.
- The master process reads the file during startup, setting the apisix_yaml_mtime global variable.
- Workers initialize and call sync_status_to_shdict(false), marking themselves as unhealthy.
- Workers create timers that call read_apisix_config() every second.
- Critical bug: read_apisix_config() checks whether the file mtime has changed (if apisix_yaml_mtime == last_modification_time then return end) and returns early when it has not.
- Because the file was rendered before startup, the mtime never changes.
- update_config() is never called by the workers.
- Workers remain marked as unhealthy forever.
- The /status/ready endpoint fails perpetually.
Debug Evidence:
Adding logging to config_yaml.lua confirmed:
- update_config() is only called once, by the master process (PID 1), during startup.
- The master's call to sync_status_to_shdict(true) does nothing, because that function checks if process.type() ~= "worker" then return end.
- All 12 workers successfully create timers.
- The timers fire every second but return early because the mtime is unchanged.
- Workers never call update_config(), so they never call sync_status_to_shdict(true), which is what the readiness check relies on (see the sketch below).
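For context, the readiness handler (init.lua's status_ready(), line ~1003 in the logs above) evidently reads the same status-report shared dict that sync_status_to_shdict() writes to, and reports 503 for any worker id whose entry was never set to true. The following is a minimal sketch of that behavior, my own illustration inferred from the log messages and the shdict writes shown below, not the actual init.lua code:

local function status_ready_sketch()
    local shdict = ngx.shared["status-report"]
    -- every worker id must have reported true at least once;
    -- a single worker that never flips its flag keeps the endpoint at 503
    for id = 0, ngx.worker.count() - 1 do
        if not shdict:get(id) then
            return 503, {
                status = "error",
                error = "worker id: " .. id .. " has not received configuration",
            }
        end
    end
    return 200, { status = "ok" }
end

So one worker that never reports success is enough to keep /status/ready failing indefinitely.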
Relevant Code
apisix/core/config_yaml.lua - Lines ~565-585:
function _M.init_worker()
    sync_status_to_shdict(false)  -- Mark worker as unhealthy

    if is_use_admin_api() then
        apisix_yaml = {}
        apisix_yaml_mtime = 0
        return true
    end

    -- sync data in each non-master process
    ngx.timer.every(1, read_apisix_config)  -- Timer created but never calls update_config()

    return true
end
apisix/core/config_yaml.lua - Lines ~150-165:
local function read_apisix_config(premature, pre_mtime)
    if premature then
        return
    end

    local attributes, err = lfs.attributes(config_file.path)
    if not attributes then
        log.error("failed to fetch ", config_file.path, " attributes: ", err)
        return
    end

    local last_modification_time = attributes.modification
    if apisix_yaml_mtime == last_modification_time then
        return  -- BUG: returns early, never calls update_config()
    end

    -- This code is never reached if the file hasn't changed since startup
    local config_new, err = config_file:parse()
    if err then
        log.error("failed to parse the content of file ", config_file.path, ": ", err)
        return
    end

    update_config(config_new, last_modification_time)

    log.warn("config file ", config_file.path, " reloaded.")
end
apisix/core/config_yaml.lua - Lines ~136-148:
local function sync_status_to_shdict(status)
    if process.type() ~= "worker" then
        return  -- Calls from the master process are ignored
    end

    local dict_name = "status-report"
    local key = worker_id()
    local shdict = ngx.shared[dict_name]

    local _, err = shdict:set(key, status)
    if err then
        log.error("failed to ", status and "set" or "clear",
                  " shdict " .. dict_name .. ", key=" .. key, ", err: ", err)
    end
end
Proposed Solution
In init_worker(), immediately call update_config() after creating the timer to mark the worker as healthy:
function _M.init_worker()
    sync_status_to_shdict(false)

    if is_use_admin_api() then
        apisix_yaml = {}
        apisix_yaml_mtime = 0
        return true
    end

    -- sync data in each non-master process
    ngx.timer.every(1, read_apisix_config)

    -- FIX: mark the worker as healthy immediately if the config is already loaded
    if apisix_yaml then
        update_config(apisix_yaml, apisix_yaml_mtime)
    end

    return true
end
This ensures workers are marked healthy on initialization, before the timer even fires. The timer will still update configuration when the file changes.
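A lighter-weight variant may also be worth considering: since the report above shows that routes already work even though the workers never re-run update_config(), it should be enough for each worker to flip only its own readiness flag when the master has already loaded the config. This is my own sketch, untested against the APISIX code base, reusing the same apisix_yaml guard as the proposed fix:

function _M.init_worker()
    sync_status_to_shdict(false)

    if is_use_admin_api() then
        apisix_yaml = {}
        apisix_yaml_mtime = 0
        return true
    end

    ngx.timer.every(1, read_apisix_config)

    -- hypothetical alternative: the master parsed apisix.yaml before forking,
    -- so the worker only needs to report itself ready instead of re-running
    -- the full update path
    if apisix_yaml then
        sync_status_to_shdict(true)
    end

    return true
end

Either way, the key point is that each worker must report success to the status-report shared dict at least once even when the file never changes after startup.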
Verified Fix
I patched the code in a running container and confirmed:
- All 12 workers call update_config() in the init_worker_by_lua* context.
- /status/ready returns {"status":"ok"} with HTTP 200.
- The Docker health check passes (the container shows "healthy" status).
- Routes continue working correctly.
Impact
This bug affects production deployments using:
- Kubernetes readiness probes with file-driven standalone mode
- Docker health checks
- Load balancers that depend on the /status/ready endpoint
- Any container orchestration that renders config files before starting APISIX
The health check always fails, preventing proper deployment orchestration, even though APISIX is functioning correctly and serving traffic.
Additional Context
The bug is specific to the timing of when the configuration file is created relative to APISIX startup. If the file is created and never modified, workers never get marked as healthy. This is a common pattern in containerized deployments where entrypoint scripts render configuration from environment variables before starting the main process.
Hi @Falven, thanks for your report. I'm still trying to reproduce this issue.
Hi @Baoyuantop, I confirmed that workers stay unhealthy if the YAML config is present before they start. The master loads the file first, records its mtime, then each worker's timer runs read_apisix_config(), sees the same mtime, exits before update_config() runs, and never calls sync_status_to_shdict(true), so /status/ready keeps returning 503. Repro: place config.yaml and apisix.yaml (any route) in conf/, start APISIX, curl 127.0.0.1:7085/status/ready → 503; touch apisix.yaml; curl again → 200. Fix would be to let workers call update_config() once during init when the master already loaded the config.
Same here in "standalone"-mode in k8s
@Baoyuantop for reproduction maybe:
My values.yaml:
# gateway-values.yaml
apisix:
  # from the website -> https://apisix.apache.org/docs/apisix/deployment-modes/
  deployment:
    mode: standalone
    role: traditional
    role_traditional:
      config_provider: yaml
  # NOT DEFAULT: https://github.com/apache/apisix-helm-chart/blob/apisix-2.12.1/charts/apisix/values.yaml
  ssl:
    enabled: true
  nginx:
    logs:
      errorLogLevel: "debug"  # just to check
  # image:
  #   tag: 3.14.1-ubuntu

etcd:
  enabled: false

ingress-controller:
  enabled: true  # false - Disable the built-in controller as we install it separately
  config:
    provider:
      type: apisix-standalone
  gatewayProxy:
    createDefault: true
  # needed to patch because of missing arm64 support -> https://github.com/api7/adc/issues/332
  deployment:
    adcContainer:
      image:
        tag: "0.21.2"
  config:
    logLevel: "debug"  # for now, to get it up and running

service:
  type: LoadBalancer
The APISIX pod error logs:
2025/10/23 10:53:36 [warn] 55#55: *27360 [lua] init.lua:1003: status_ready(): worker id: 0 has not received configuration, client: 10.244.0.129, server: , request: "GET /status/ready HTTP/1.1", host: "10.244.0.135:7085"
2025/10/23 10:53:46 [warn] 55#55: *27670 [lua] init.lua:1003: status_ready(): worker id: 1 has not received configuration, client: 10.244.0.129, server: , request: "GET /status/ready HTTP/1.1", host: "10.244.0.135:7085"
2025/10/23 10:53:56 [warn] 55#55: *27982 [lua] init.lua:1003: status_ready(): worker id: 0 has not received configuration, client: 10.244.0.129, server: , request: "GET /status/ready HTTP/1.1", host: "10.244.0.135:7085"
2025/10/23 10:54:06 [warn] 55#55: *28292 [lua] init.lua:1003: status_ready(): worker id: 1 has not received configuration, client: 10.244.0.129, server: , request: "GET /status/ready HTTP/1.1", host: "10.244.0.135:7085"
2025/10/23 10:54:16 [warn] 55#55: *28602 [lua] init.lua:1003: status_ready(): worker id: 0 has not received configuration, client: 10.244.0.129, server: , request: "GET /status/ready HTTP/1.1", host: "10.244.0.135:7085"
The APISIX ingress controller pod error logs:
2025-10-23T10:54:52Z debug client/client.go:168 syncing all resources
2025-10-23T10:54:52Z debug client/client.go:177 syncing resources with multiple configs {"configs": {"GatewayProxy/apisix-ingress/apisix-config":{"name":"GatewayProxy/apisix-ingress/apisix-config","serverAddrs":["http://apisix-admin.apisix-ingress.svc:9180"],"tlsVerify":false}}}
2025-10-23T10:54:52Z debug cache/store.go:232 get resources global rule items {"globalRuleItems": []}
2025-10-23T10:54:52Z debug client/client.go:219 syncing resources {"task": {"Key":"//","Name":"GatewayProxy/apisix-ingress/apisix-config-sync","Labels":null,"Configs":{"//":{"name":"GatewayProxy/apisix-ingress/apisix-config","serverAddrs":["http://apisix-admin.apisix-ingress.svc:9180"],"tlsVerify":false}},"ResourceTypes":null,"Resources":{}}}
2025-10-23T10:54:52Z debug client/client.go:295 generated adc file {"filename": "/tmp/adc-task-3401241526.json", "json": "{}"}
2025-10-23T10:54:52Z debug client/executor.go:257 running http sync {"serverAddrs": ["http://apisix-admin.apisix-ingress.svc:9180"], "mode": "apisix"}
2025-10-23T10:54:52Z debug client/executor.go:393 request body {"body": "{\"task\":{\"opts\":{\"backend\":\"apisix\",\"server\":[\"http://apisix-admin.apisix-ingress.svc:9180\"],\"token\":\"edd1c9f034335f136f87ad84b625c8f1\",\"tlsSkipVerify\":true,\"cacheKey\":\"GatewayProxy/apisix-ingress/apisix-config\"},\"config\":{}}}"}
2025-10-23T10:54:52Z debug client/executor.go:395 sending HTTP request to ADC Server {"url": "http://127.0.0.1:3000/sync", "server": "http://apisix-admin.apisix-ingress.svc:9180", "mode": "apisix", "cacheKey": "GatewayProxy/apisix-ingress/apisix-config", "labelSelector": {}, "includeResourceType": [], "tlsSkipVerify": true}
2025-10-23T10:54:52Z debug client/executor.go:422 received HTTP response from ADC Server {"server": "http://apisix-admin.apisix-ingress.svc:9180", "status": 500, "response": "{\"message\":\"Error: connect ECONNREFUSED 10.96.43.108:9180\"}"}
2025-10-23T10:54:52Z error client/executor.go:261 failed to run http sync for server {"server": "http://apisix-admin.apisix-ingress.svc:9180", "error": "ServerAddr: http://apisix-admin.apisix-ingress.svc:9180, Err: HTTP 500: {\"message\":\"Error: connect ECONNREFUSED 10.96.43.108:9180\"}"}
2025-10-23T10:54:52Z error client/client.go:255 failed to execute adc command {"error": "ADC execution error for GatewayProxy/apisix-ingress/apisix-config: [ServerAddr: http://apisix-admin.apisix-ingress.svc:9180, Err: HTTP 500: {\"message\":\"Error: connect ECONNREFUSED 10.96.43.108:9180\"}]", "config": {"name":"GatewayProxy/apisix-ingress/apisix-config","serverAddrs":["http://apisix-admin.apisix-ingress.svc:9180"],"tlsVerify":false}}
2025-10-23T10:54:52Z error client/client.go:200 failed to sync resources {"name": "GatewayProxy/apisix-ingress/apisix-config", "error": "ADC execution errors: [ADC execution error for GatewayProxy/apisix-ingress/apisix-config: [ServerAddr: http://apisix-admin.apisix-ingress.svc:9180, Err: HTTP 500: {\"message\":\"Error: connect ECONNREFUSED 10.96.43.108:9180\"}]]"}
2025-10-23T10:54:52Z debug cache/store.go:232 get resources global rule items {"globalRuleItems": []}
2025-10-23T10:54:52Z debug apisix/provider.go:273 handled ADC execution errors {"status_record": {"GatewayProxy/apisix-ingress/apisix-config":{"Errors":[{"Name":"GatewayProxy/apisix-ingress/apisix-config","FailedErrors":[{"Err":"HTTP 500: {\"message\":\"Error: connect ECONNREFUSED 10.96.43.108:9180\"}","ServerAddr":"http://apisix-admin.apisix-ingress.svc:9180","FailedStatuses":null}]}]}}, "status_update": {}}
2025-10-23T10:54:52Z error apisix/provider.go:251 failed to sync 1 configs: GatewayProxy/apisix-ingress/apisix-config
Inside the APISIX pod, the apisix.yaml:
routes:
  -
    uri: /hi
    upstream:
      nodes:
        "127.0.0.1:1980": 1
      type: roundrobin
#END
The config.yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
apisix:                           # universal configurations
  node_listen:                    # APISIX listening port
    - 9080
  enable_heartbeat: true
  enable_admin: true
  enable_admin_cors: true
  enable_debug: false

  enable_control: true
  control:
    ip: 127.0.0.1
    port: 9090

  enable_dev_mode: false          # Sets nginx worker_processes to 1 if set to true
  enable_reuseport: true          # Enable nginx SO_REUSEPORT switch if set to true.
  enable_ipv6: true               # Enable nginx IPv6 resolver
  enable_http2: true
  enable_server_tokens: true      # Whether the APISIX version number should be shown in Server header

  # proxy_protocol:               # Proxy Protocol configuration
  #   listen_http_port: 9181      # The port with proxy protocol for http, it differs from node_listen and admin_listen.
  #                               # This port can only receive http request with proxy protocol, but node_listen & admin_listen
  #                               # can only receive http request. If you enable proxy protocol, you must use this port to
  #                               # receive http request with proxy protocol
  #   listen_https_port: 9182     # The port with proxy protocol for https
  #   enable_tcp_pp: true         # Enable the proxy protocol for tcp proxy, it works for stream_proxy.tcp option
  #   enable_tcp_pp_to_upstream: true # Enables the proxy protocol to the upstream server

  proxy_cache:                    # Proxy Caching configuration
    cache_ttl: 10s                # The default caching time if the upstream does not specify the cache time
    zones:                        # The parameters of a cache
      - name: disk_cache_one      # The name of the cache, administrator can be specify
                                  # which cache to use by name in the admin api
        memory_size: 50m          # The size of shared memory, it's used to store the cache index
        disk_size: 1G             # The size of disk, it's used to store the cache data
        disk_path: "/tmp/disk_cache_one"  # The path to store the cache data
        cache_levels: "1:2"       # The hierarchy levels of a cache
      # - name: disk_cache_two
      #   memory_size: 50m
      #   disk_size: 1G
      #   disk_path: "/tmp/disk_cache_two"
      #   cache_levels: "1:2"

  router:
    http: radixtree_host_uri      # radixtree_uri: match route by uri(base on radixtree)
                                  # radixtree_host_uri: match route by host + uri(base on radixtree)
                                  # radixtree_uri_with_parameter: match route by uri with parameters
    ssl: 'radixtree_sni'          # radixtree_sni: match route by SNI(base on radixtree)

  proxy_mode: http
  stream_proxy:                   # TCP/UDP proxy
    tcp:                          # TCP proxy port list
      - 9100
    udp:                          # UDP proxy port list
      - 9200
  # dns_resolver:
  #   - 127.0.0.1
  #   - 172.20.0.10
  #   - 114.114.114.114
  #   - 223.5.5.5
  #   - 1.1.1.1
  #   - 8.8.8.8
  dns_resolver_valid: 30
  resolver_timeout: 5
  ssl:
    enable: true
    listen:
      - port: 9443
        enable_http3: false
    ssl_protocols: "TLSv1.2 TLSv1.3"
    ssl_ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA"
  status:
    ip: 0.0.0.0
    port: 7085

nginx_config:                     # config for render the template to genarate nginx.conf
  error_log: "/dev/stderr"
  error_log_level: "debug"        # warn,error
  worker_processes: "auto"
  enable_cpu_affinity: true
  worker_rlimit_nofile: 20480     # the number of files a worker process can open, should be larger than worker_connections
  event:
    worker_connections: 10620
  http:
    enable_access_log: true
    access_log: "/dev/stdout"
    access_log_format: '$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\"'
    access_log_format_escape: default
    keepalive_timeout: "60s"
    client_header_timeout: 60s    # timeout for reading client request header, then 408 (Request Time-out) error is returned to the client
    client_body_timeout: 60s      # timeout for reading client request body, then 408 (Request Time-out) error is returned to the client
    send_timeout: 10s             # timeout for transmitting a response to the client.then the connection is closed
    underscores_in_headers: "on"  # default enables the use of underscores in client request header fields
    real_ip_header: "X-Real-IP"   # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
    real_ip_from:                 # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
      - 127.0.0.1
      - 'unix:'

deployment:
  role: traditional
  role_traditional:
    config_provider: yaml
  admin:
    allow_admin:                  # http://nginx.org/en/docs/http/ngx_http_access_module.html#allow
      - 127.0.0.1/24
      - 0.0.0.0/0
      # - "::/64"
    admin_listen:
      ip: 0.0.0.0
      port: 9180
    # Default token when use API to call for Admin API.
    # *NOTE*: Highly recommended to modify this value to protect APISIX's Admin API.
    # Disabling this configuration item means that the Admin API does not
    # require any authentication.
    admin_key:
      # admin: can everything for configuration data
      - name: "admin"
        key: edd1c9f034335f136f87ad84b625c8f1
        role: admin
      # viewer: only can view configuration data
      - name: "viewer"
        key: 4054f7cf07e344346cd3f287985e76a2
        role: viewer
Seeing the same issue as @michael-riha with pretty much the same setup. I'm following the Getting Started guide for the APISIX ingress controller; the guide does not set apisix.deployment.mode to "standalone", but I tried that as well with the same result.
apisix:
  deployment:
    # mode: standalone
    role: traditional
    role_traditional:
      config_provider: yaml
etcd:
  enabled: false
ingress-controller:
  enabled: true
  config:
    provider:
      type: apisix-standalone
  apisix:
    adminService:
      namespace: apisix
  gatewayPorxy:
    createDefault: true
  deployment:
    image:
      repository: docker.io/apache/apisix-ingress-controller
After verification, we confirmed that this problem does exist. We will arrange a fix, and community PRs are also welcome.
I tested @Falven's proposed solution, but I could not get APISIX running with it in standalone API-driven mode. I got it working by force-loading the config instead. I have not been able to continue testing in my OpenShift environment because I have run into a separate issue, described here: https://github.com/apache/apisix-ingress-controller/issues/2656
_M.init_worker = function()
    sync_status_to_shdict(false)

    if is_use_admin_api() then
        apisix_yaml = {}
        apisix_yaml_mtime = 0
        update_config(apisix_yaml, apisix_yaml_mtime) -- mark ready for admin API
        return true
    end

    -- force load config if not already loaded
    if not apisix_yaml then
        read_apisix_config()
    end

    ngx.timer.every(1, read_apisix_config)

    -- mark worker as healthy immediately
    if apisix_yaml then
        update_config(apisix_yaml, apisix_yaml_mtime)
    end

    return true
end
Any update on this? Should be an easy fix - I provided it in my bug report.
An easy way around it is to just touch the config post startup.
I was struggling with this too and revisited the install guide. The problem has gone away after adding
ingress-controller:
  gatewayProxy:
    createDefault: true
to my values.yaml. With this present, new APISIX instances get a PUT /apisix/admin/configs HTTP/1.1 call immediately after startup and start passing the health checks. When I delete the gatewayProxy resource and restart an APISIX pod, it goes back to the "worker id: 0 has not received configuration" errors. I will use this as a workaround for now.
@sebgott you have a typo in gatewayPorxy, maybe that's why it is not working?
I did fix this typo at some point, but I'm not sure when. I tried redeploying it locally without the custom image I made and get the same issue where the configuration is not loading, so I do not think it's because of the typo.
Can confirm. I tried a fresh new installation and got the same behavior:
[warn] 67#67: *36705 [lua] init.lua:1003: status_ready(): worker id: 13 has not received configuration, client: 10.244.2.1, server: , request: "GET /status/ready HTTP/1.1", host: "10.244.2.2:7085