name server addresses used by rocketmq-dashboard failed to be updated at runtime
BUG REPORT
- Please describe the issue you observed:
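When hostNetwork is set to false for the name server, the console's startup parameter JAVA_OPTS=-Drocketmq.namesrv.addr=10.244.23.205:9876 points to a pod IP that does not exist in the cluster instead of the actual pod IP of the name server. As a result, the console cannot reach the cluster and reports errors.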
- Please tell us about your environment:
k8s 1.24.6 rocketmq 5.0.0 console: apacherocketmq/rocketmq-dashboard:1.0.0
- cluster pod info
[root@dev-k8s-master1 rocketmq-operator]# k -n rocketmq get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
broker-0-master-0 1/1 Running 0 15m 10.244.23.216 dev-k8s-node1 <none> <none>
broker-0-replica-1-0 1/1 Running 0 15m 10.244.23.168 dev-k8s-node2 <none> <none>
console-6db8c44df4-vckqn 1/1 Running 0 15m 10.244.23.218 dev-k8s-node1 <none> <none>
name-service-0 1/1 Running 0 15m 10.244.23.219 dev-k8s-node1 <none> <none>
[root@dev-k8s-master1 rocketmq-operator]# k -n rocketmq exec console-6db8c44df4-vckqn -it -- bash
root@console-6db8c44df4-vckqn:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 4324 648 ? Ss 10:41 0:00 sh -c java $JAVA_OPTS -jar /rocketmq-console-ng.jar
root 7 8.8 3.2 9895168 527124 ? Sl 10:41 1:20 java -Drocketmq.namesrv.addr=10.244.23.205:9876; -Dcom.rocketmq.sendMessageWithVIPChannel=false -jar /rocketmq
root 450 0.6 0.0 21944 2132 pts/0 Ss 10:56 0:00 bash
root 456 0.0 0.0 19176 1300 pts/0 R+ 10:56 0:00 ps aux
root@console-6db8c44df4-vckqn:/# ps -ef | grep 7
root 7 1 8 10:41 ? 00:01:21 java -Drocketmq.namesrv.addr=10.244.23.205:9876; -Dcom.rocketmq.sendMessageWithVIPChannel=false -jar /rocketmq-console-ng.jar
root 457 450 0 10:56 pts/0 00:00:00 ps -ef
root 458 450 0 10:56 pts/0 00:00:00 grep 7
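The mismatch in the output above can be checked mechanically. The following is a minimal sketch, not part of the operator or the dashboard: the helper names `namesrv_addrs` and `stale_addrs` are hypothetical, and the command line and pod IP are copied by hand from the report. It extracts the addresses passed through -Drocketmq.namesrv.addr and lists those that match no known name server pod.

```python
import re
from typing import List, Set


def namesrv_addrs(cmdline: str) -> List[str]:
    """Return the host:port entries passed via -Drocketmq.namesrv.addr."""
    match = re.search(r"-Drocketmq\.namesrv\.addr=(\S+)", cmdline)
    if match is None:
        return []
    # The value is a ';'-separated list and may end with a trailing ';'.
    return [addr for addr in match.group(1).split(";") if addr]


def stale_addrs(cmdline: str, pod_ips: Set[str]) -> List[str]:
    """Return configured addresses whose IP is not among the given pod IPs."""
    return [a for a in namesrv_addrs(cmdline) if a.rsplit(":", 1)[0] not in pod_ips]


# Values copied from the report above.
CMDLINE = (
    "java -Drocketmq.namesrv.addr=10.244.23.205:9876; "
    "-Dcom.rocketmq.sendMessageWithVIPChannel=false -jar /rocketmq-console-ng.jar"
)
NAME_SERVER_POD_IPS = {"10.244.23.219"}  # IP of name-service-0

print(stale_addrs(CMDLINE, NAME_SERVER_POD_IPS))
```

With the values from the report, the only configured address (10.244.23.205:9876) is reported as stale, because the running name-service-0 pod has IP 10.244.23.219.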
Has the name server been redeployed (with its IP changed)? It seems the value of rocketmq.namesrv.addr may not have been updated.
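Can your RocketMQ client produce messages successfully? If the client is deployed outside the nodes, there is no network connectivity between the client and the brokers, so the client cannot connect to a broker and cannot produce messages. The client can still connect to the name service (the name service IP is the node IP, and the client can reach the node), but the broker IPs it fetches from the name service are pod IPs. Because the client cannot reach the pod network, it cannot work properly.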
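The RocketMQ client can consume messages successfully. I switched name-service to non-host network mode because I do not want MQ clients to access it through a node IP (which also means pinning name-service to a particular node). I want them to access it through a k8s Service or Ingress instead. Is there a way to do that?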
This looks like a bug. I redeployed and watched the rocketmq-operator pod logs. The first time controller_nameservice ran "Check the NameServers" it logged "NameServersStr": "10.244.23.248:9876;". The second time it logged share.NameServersStr:10.244.23.172:9876. However, when the console pod started, the value from the first output was passed to the environment variable as share.NameServersStr, while the actual IP of the name-service pod is the one from the second output.
2022-11-16T10:10:11.220Z INFO controller_broker Reconciling Broker. {"Request.Namespace": "rocketmq", "Request.Name": "broker"}
2022-11-16T10:10:11.220Z INFO controller_broker brokerGroupNum=1, replicaPerGroup=1 {"Request.Namespace": "rocketmq", "Request.Name": "broker"}
2022-11-16T10:10:11.220Z INFO controller_broker Check Broker cluster 1/1 {"Request.Namespace": "rocketmq", "Request.Name": "broker"}
2022-11-16T10:10:11.220Z INFO controller_broker Creating a new Master Broker StatefulSet. {"Request.Namespace": "rocketmq", "Request.Name": "broker", "StatefulSet.Namespace": "rocketmq", "StatefulSet.Name": "broker-0-master"}
2022-11-16T10:10:11.265Z INFO controller_broker Check Replica Broker of cluster-0 1/1 {"Request.Namespace": "rocketmq", "Request.Name": "broker"}
2022-11-16T10:10:11.265Z INFO controller_broker Creating a new Replica Broker StatefulSet. {"Request.Namespace": "rocketmq", "Request.Name": "broker", "StatefulSet.Namespace": "rocketmq", "StatefulSet.Name": "broker-0-replica-1"}
2022-11-16T10:10:11.267Z INFO controller_nameservice Reconciling NameService {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:11.289Z INFO controller_broker broker.Status.Nodes length = 0
2022-11-16T10:10:11.289Z INFO controller_broker podNames length = 0
2022-11-16T10:10:11.289Z INFO controller_broker broker.Status.Size = 0
2022-11-16T10:10:11.289Z INFO controller_broker broker.Spec.Size = 1
2022-11-16T10:10:11.297Z INFO controller_nameservice Reconciling NameService {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:11.298Z INFO controller_nameservice Check the NameServers status {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:11.298Z INFO controller_nameservice Share variables {"Request.Namespace": "rocketmq", "Request.Name": "name-service", "GroupNum": 1, "NameServersStr": "10.244.23.248:9876;", "IsNameServersStrUpdated": false, "IsNameServersStrInitialized": true, "BrokerClusterName": "broker"}
2022-11-16T10:10:11.327Z ERROR controller_broker Failed to update Broker Size status. {"Request.Namespace": "rocketmq", "Request.Name": "broker", "error": "Broker.rocketmq.apache.org \"broker\" is invalid: status.nodes: Required value"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
2022-11-16T10:10:11.467Z INFO controller_console Reconciling Console {"Request.Namespace": "rocketmq", "Request.Name": "console"}
2022-11-16T10:10:11.467Z INFO controller_console Creating RocketMQ Console Deployment {"Request.Namespace": "rocketmq", "Request.Name": "console", "Namespace": {"namespace": "rocketmq", "name": "console"}, "Name": "console"}
2022-11-16T10:10:17.299Z INFO controller_nameservice Reconciling NameService {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:17.299Z INFO controller_nameservice Check the NameServers status {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:17.299Z INFO controller_nameservice share.NameServersStr:10.244.23.172:9876 {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:17.299Z INFO controller_nameservice oldNameServerListStr:10.244.23.172:9876 {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:17.328Z INFO controller_broker Reconciling Broker. {"Request.Namespace": "rocketmq", "Request.Name": "broker"}
2022-11-16T10:10:17.328Z INFO controller_broker brokerGroupNum=1, replicaPerGroup=1 {"Request.Namespace": "rocketmq", "Request.Name": "broker"}
2022-11-16T10:10:17.328Z INFO controller_broker Check Broker cluster 1/1 {"Request.Namespace": "rocketmq", "Request.Name": "broker"}
2022-11-16T10:10:17.328Z INFO controller_broker Check Replica Broker of cluster-0 1/1 {"Request.Namespace": "rocketmq", "Request.Name": "broker"}
2022-11-16T10:10:17.329Z INFO controller_broker broker.Status.Nodes length = 0
2022-11-16T10:10:17.329Z INFO controller_broker podNames length = 2
2022-11-16T10:10:17.329Z INFO controller_broker broker.Status.Size = 0
2022-11-16T10:10:17.329Z INFO controller_broker broker.Spec.Size = 1
2022-11-16T10:10:17.342Z INFO controller_nameservice Updated the NameServers status with the host IP {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:17.342Z INFO controller_nameservice NameServers IP 0: 10.244.23.172 {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:17.342Z INFO controller_nameservice Share variables {"Request.Namespace": "rocketmq", "Request.Name": "name-service", "GroupNum": 1, "NameServersStr": "10.244.23.172:9876;", "IsNameServersStrUpdated": false, "IsNameServersStrInitialized": true, "BrokerClusterName": "broker"}
2022-11-16T10:10:17.343Z INFO controller_nameservice Reconciling NameService {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:17.343Z INFO controller_nameservice Check the NameServers status {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:17.343Z INFO controller_nameservice NameServers IP 0: 10.244.23.172 {"Request.Namespace": "rocketmq", "Request.Name": "name-service"}
2022-11-16T10:10:17.343Z INFO controller_nameservice Share variables {"Request.Namespace": "rocketmq", "Request.Name": "name-service", "GroupNum": 1, "NameServersStr": "10.244.23.172:9876;", "IsNameServersStrUpdated": false, "IsNameServersStrInitialized": true, "BrokerClusterName": "broker"}
2022-11-16T10:10:17.356Z ERROR controller_broker Failed to update Broker Size status. {"Request.Namespace": "rocketmq", "Request.Name": "broker", "error": "Broker.rocketmq.apache.org \"broker\" is invalid: status.nodes: Required value"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227
Screenshot of part of the console_controller.go source code
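To make the ordering in the log easier to see, here is a small sketch that pulls the successive NameServersStr values out of the "Share variables" lines. It assumes only what the log itself shows (a timestamp first, a JSON object at the end of the line); the function name `name_server_history` is hypothetical and the sample lines are copied from the log above.

```python
import json
from typing import List, Tuple


def name_server_history(log_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, NameServersStr) each time the logged value changes."""
    history: List[Tuple[str, str]] = []
    for line in log_lines:
        if "Share variables" not in line:
            continue
        timestamp = line.split()[0]
        # The structured fields are a JSON object at the end of the line.
        fields = json.loads(line[line.index("{"):])
        value = fields.get("NameServersStr")
        if value and (not history or history[-1][1] != value):
            history.append((timestamp, value))
    return history


LOG = [
    '2022-11-16T10:10:11.298Z INFO controller_nameservice Share variables {"Request.Namespace": "rocketmq", "Request.Name": "name-service", "GroupNum": 1, "NameServersStr": "10.244.23.248:9876;", "IsNameServersStrUpdated": false, "IsNameServersStrInitialized": true, "BrokerClusterName": "broker"}',
    '2022-11-16T10:10:17.342Z INFO controller_nameservice Share variables {"Request.Namespace": "rocketmq", "Request.Name": "name-service", "GroupNum": 1, "NameServersStr": "10.244.23.172:9876;", "IsNameServersStrUpdated": false, "IsNameServersStrInitialized": true, "BrokerClusterName": "broker"}',
    '2022-11-16T10:10:17.343Z INFO controller_nameservice Share variables {"Request.Namespace": "rocketmq", "Request.Name": "name-service", "GroupNum": 1, "NameServersStr": "10.244.23.172:9876;", "IsNameServersStrUpdated": false, "IsNameServersStrInitialized": true, "BrokerClusterName": "broker"}',
]

for timestamp, value in name_server_history(LOG):
    print(timestamp, value)
```

In the log above, the Console Deployment is created at 10:10:11.467, after the first value (10.244.23.248) is logged and before the second one (10.244.23.172) appears at 10:10:17.342, which is consistent with the console receiving the older address.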

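1. I also found that apply does not take effect when the cluster configuration is changed; it only takes effect after a delete followed by a create. 2. You said your client can consume messages successfully, so your client is probably deployed on a node. Otherwise the client could not reach the k8s network, unless the client's network has been connected to the k8s network. My client runs on my local machine, and the broker IPs it fetches from the name service are pod IPs, so sending a message fails with a network error when the pod network is unreachable.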
Re "Have name server redeployed (with IP changed)?": rocketmq-operator has not been redeployed, but the RocketMQ cluster has been redeployed many times. Please take a look at the rocketmq-operator pod logs.
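After I redeployed rocketmq-operator and then deployed the RocketMQ cluster, it worked. It seems the operator only remembers the name server IP from the first deployment and does not update it when the cluster is redeployed later.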
Re the comment that apply does not take effect and that the client must be able to reach the pod network: yes, that is the case.
@chaoyoung -Drocketmq.namesrv.addr is a read-only config set before Console starts. If the IPs of the name servers change while Console is running, Console needs to be restarted so that the new config takes effect.
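@caigy Neither restarting the console nor redeploying the RocketMQ cluster helps; the value of -Drocketmq.namesrv.addr in the console pod's process arguments never changes. It only works after deleting the operator and deploying it again.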
Hello, may I ask where you found the 5.0.0 image? Or did you build it yourself?
Solved. Building the image manually from the project's images directory works.
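This usually happens when the name server address has changed but the console has not been reconciled again. Restarting the console does not help. The only way is to delete the console Deployment, then delete the operator, and let the console Deployment be regenerated.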
This is a leftover TODO item: https://github.com/apache/rocketmq-operator/blob/03741c8521a8d5063a1d5c1d1d704f1072e9d767/pkg/controller/console/console_controller.go#L179
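As the comments above describe it, the missing piece is that the console is not reconciled again when the name server address changes. The sketch below models only that decision step in Python. It is not the operator's Go code, and the function name `reconcile_java_opts` is hypothetical: given the JAVA_OPTS currently set on the console and the latest name server string, it returns the JAVA_OPTS that should be set and whether an update is needed.

```python
import re
from typing import Tuple

ADDR_FLAG = "-Drocketmq.namesrv.addr="


def reconcile_java_opts(current_opts: str, name_servers_str: str) -> Tuple[str, bool]:
    """Return (desired JAVA_OPTS, whether it differs from the current value)."""
    pattern = re.compile(re.escape(ADDR_FLAG) + r"\S*")
    desired_flag = ADDR_FLAG + name_servers_str
    if pattern.search(current_opts):
        # Replace the existing address flag and keep all other options.
        new_opts = pattern.sub(lambda _match: desired_flag, current_opts, count=1)
    else:
        # No address flag yet: prepend one.
        new_opts = (desired_flag + " " + current_opts).strip()
    return new_opts, new_opts != current_opts


# Addresses taken from the operator log earlier in this thread.
current = "-Drocketmq.namesrv.addr=10.244.23.248:9876; -Dcom.rocketmq.sendMessageWithVIPChannel=false"
new_opts, changed = reconcile_java_opts(current, "10.244.23.172:9876;")
print(changed, new_opts)
```

In an actual fix, the operator would presumably write the new value back to the console Deployment's pod template when `changed` is true. A pod template change makes Kubernetes roll out new pods, which would also cover the restart that Console needs to pick up the new address.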