bug: dynamic upstream, one Eureka node is unavailable, half of the requests are lost after reloading
Current Behavior
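To implement dynamic upstreams, I configured two Eureka IP addresses.
With access to one Eureka node blocked, continuous requests work without problems.
After APISIX is reloaded, however, half of the requests fail. This reproduces reliably.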
discovery:
  eureka:
    host:
      - "http://10.250.200.99:8761"
      - "http://10.250.200.98:8761"
    prefix: "/eureka/"
    fetch_interval: 30 # 30s
    weight: 100 # default weight for node
    timeout:
      connect: 2000 # 2000ms
      send: 2000 # 2000ms
      read: 5000 # 5000ms
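The behavior appears to depend on the parameter below. With a very short fetch interval, requests recover fairly quickly after APISIX is reloaded:
    fetch_interval: 30
Even if requests are lost for only a few seconds after a reload, that is unacceptable for a critical traffic entry point like nginx, so I hope this can be improved:
On reload or restart, if one node is reachable and the other is not, the full dynamic upstream service list should still be obtainable, because both Eureka nodes hold the same data. A single unreachable Eureka node must not cause half of the requests to fail.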
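The behavior requested above amounts to failover across the configured hosts: try each Eureka host in turn and use the first registry that loads. The following is a minimal sketch of that idea in Python, not APISIX's Lua implementation and not the eventual fix. The names `fetch_registry`, `http_fetch`, and `fetch_one`, and the assumption that the registry is served at `<host><prefix>apps`, are illustrative only.

```python
import json
import urllib.request
from typing import Callable, Iterable


def http_fetch(url: str, timeout: float = 2.0) -> dict:
    """Fetch one registry URL and decode its JSON body (hypothetical helper)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def fetch_registry(
    hosts: Iterable[str],
    prefix: str = "/eureka/",
    fetch_one: Callable[[str], dict] = http_fetch,
) -> dict:
    """Try every host in order and return the first registry that loads.

    Only when all hosts fail is an error raised, so a single unreachable
    node does not leave the caller without a service list.
    """
    errors = []
    for host in hosts:
        # Assumption: the application list lives at <host><prefix>apps.
        url = host.rstrip("/") + prefix + "apps"
        try:
            return fetch_one(url)
        except Exception as exc:  # timeout, refused connection, bad JSON, ...
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all Eureka hosts failed: " + "; ".join(errors))
```

With the two hosts from the configuration above, a timeout on 10.250.200.98 would cost one connect timeout and then fall through to 10.250.200.99, instead of leaving the service list empty until the next fetch interval.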
Expected Behavior
No response
Error Logs
No response
Steps to Reproduce
1. Configure APISIX with Eureka as the registry, using two nodes:
   discovery:
     eureka:
       host:
         - "http://10.250.200.99:8761"
         - "http://10.250.200.98:8761"
       prefix: "/eureka/"
       fetch_interval: 3 # 3s
       weight: 100 # default weight for node
       timeout:
         connect: 2000 # 2000ms
         send: 2000 # 2000ms
         read: 2000 # 2000ms
2. On the host of one Eureka node, drop all requests from the APISIX IP:
   iptables -A INPUT -s 10.250.200.202 -j DROP
3. Send continuous curl requests: they succeed.
4. systemctl reload apisix
5. Send continuous curl requests: half of them fail.
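To put a number on "half of them fail" in steps 3 and 5, a small probe loop can stand in for the manual curl requests. This is only a sketch: `probe`, `failure_rate`, and the `check` parameter are hypothetical names, and the route URL in the usage note is a placeholder that must be replaced with a route actually served by the APISIX instance under test.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP 4xx/5xx (HTTPError), refused connections, and timeouts.
        return False


def failure_rate(
    url: str,
    attempts: int = 100,
    delay: float = 0.1,
    check: Callable[[str], bool] = probe,
) -> float:
    """Send `attempts` requests and return the fraction that failed."""
    failures = 0
    for _ in range(attempts):
        if not check(url):
            failures += 1
        time.sleep(delay)
    return failures / attempts
```

Calling, for example, `failure_rate("http://127.0.0.1:9080/your-route")` once before and once after `systemctl reload apisix` would show whether the rate moves from about 0 to about 0.5.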
Environment
- APISIX version (run apisix version):
- Operating system (run uname -a):
- OpenResty / Nginx version (run openresty -V or nginx -V):
- etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
- APISIX Dashboard version, if relevant:
- Plugin runner version, for issues related to plugin runners:
- LuaRocks version, for installation issues (run luarocks --version):
Hi @liquanzhou, thanks for your report. We have verified that this issue does exist and will schedule a fix.
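I fixed this with AI-generated code. In my testing it solves the problem: after one Eureka node goes down, a reload still loads the service list correctly. It can serve as a reference.
The log keeps showing warnings for the failed Eureka node:
2025/10/22 19:55:26 [warn] 7225#7225: *40628 [lua] init.lua:200: failed to fetch registry from http://10.250.200.98:8761/eureka/: timeout, context: ngx.timer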
Hi @liquanzhou, welcome to submit a PR.