open-im-server

[BUG] last resolver error: produced zero addresses

Open happy2wh666 opened this issue 1 year ago • 21 comments

OpenIM Server Version

release-v3.8

Operating System and CPU Architecture

Linux (AMD)

Deployment Method

Source Code Deployment

Bug Description and Steps to Reproduce

The API /auth/user_token is called once per minute. After the server starts, the calls succeed. After some time, they start failing.

2024-08-18 17:07:19.831 | 2024-08-18 09:07:19.768	ERROR	[PID:3693]     	openim-api               	[version:3.8.0]  	[mw/rpc_client_interceptor.go:50]                 	RPC Client Response Error - userToken             	{"operationID": "8b2f17f7792ecd1e", "funcName": "/openim.auth.Auth/userToken", "error": "rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 74.48.52.186:10160: connect: connection refused\"; last resolver error: produced zero addresses"}

Screenshots Link

No response
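
A minimal sketch of the once-per-minute polling described above, for anyone trying to reproduce the failure. Port 10002 is the one "mage check" reports for openim-api further down this thread, and the request fields mirror the "req" shown in the openim-rpc-auth log. The host, the operationID header, and the function names build_token_body and poll_user_token are assumptions for illustration, not taken from the project.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for /auth/user_token.
# Arguments: secret, platformID, userID.
build_token_body() {
  printf '{"secret":"%s","platformID":%s,"userID":"%s"}' "$1" "$2" "$3"
}

# Hypothetical helper: call the endpoint once per minute and log each response.
# API_URL is an assumption; adjust host and port to your deployment.
poll_user_token() {
  api_url="${API_URL:-http://127.0.0.1:10002}"
  while true; do
    # The operationID header is assumed to be required by the API.
    resp=$(curl -s --max-time 10 -X POST "$api_url/auth/user_token" \
      -H 'Content-Type: application/json' \
      -H "operationID: repro-$(date +%s)" \
      -d "$(build_token_body "$1" "$2" "$3")")
    # Log the response body with a UTC timestamp so failures can be
    # matched against the server logs.
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') $resp"
    sleep 60
  done
}
```

Running poll_user_token "$OPENIM_SECRET" 1 imAdmin and leaving it running should show when the responses change from a token to an error.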

happy2wh666 avatar Aug 18 '24 09:08 happy2wh666

Bot detected the issue body's language is not English, translate it automatically.


OpenIM Server Version

release-v3.8

Operating System and CPU Architecture

Linux (AMD)

Deployment Method

Source Code Deployment

Bug Description and Steps to Reproduce

The API /auth/user_token is called once per minute. After the server starts, the calls succeed. After some time, they start failing.

2024-08-18 17:07:19.831 | 2024-08-18 09:07:19.768 ERROR [PID:3693] openim-api [version:3.8.0] [mw/rpc_client_interceptor.go:50] RPC Client Response Error - userToken {"operationID": "8b2f17f7792ecd1e", "funcName": "/openim.auth.Auth/userToken", "error": "rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 74.48.52.186:10160: connect: connection refused\"; last resolver error: produced zero addresses"}

Screenshots Link

No response

OpenIM-Robot avatar Aug 18 '24 09:08 OpenIM-Robot

Hello! Thank you for filing an issue.

If this is a bug report, please include relevant logs to help us debug the problem.

Join slack 🤖 to connect and communicate with our developers.

OpenIM-Robot avatar Aug 18 '24 09:08 OpenIM-Robot

First, run the command "mage check" and review its output.

skiffer-git avatar Aug 18 '24 11:08 skiffer-git

root@host1:/openim# mage check
[2024-08-18 14:10:01 UTC] All services are running normally.
[2024-08-18 14:10:01 UTC] Display details of the ports listened to by the service:
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-push -i 0 -c /openim/config/, PID: 4069 is listening on ports: 20107, 10170
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-rpc-conversation -i 0 -c /openim/config/, PID: 4090 is listening on ports: 20105, 10180
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-rpc-friend -i 0 -c /openim/config/, PID: 4126 is listening on ports: 20104, 10120
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-rpc-third -i 0 -c /openim/config/, PID: 4131 is listening on ports: 20101, 10190
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-rpc-user -i 0 -c /openim/config/, PID: 4068 is listening on ports: 10110, 20100
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-rpc-group -i 0 -c /openim/config/, PID: 4121 is listening on ports: 20103, 10150
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-rpc-msg -i 0 -c /openim/config/, PID: 4070 is listening on ports: 20102, 10130
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-msgtransfer -i 0 -c /openim/config/, PID: 4100 is listening on ports: 20108
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-msgtransfer -i 1 -c /openim/config/, PID: 4105 is listening on ports: 20109
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-msgtransfer -i 2 -c /openim/config/, PID: 4110 is listening on ports: 20110
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-msgtransfer -i 3 -c /openim/config/, PID: 4116 is listening on ports: 20111
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-rpc-auth -i 0 -c /openim/config/, PID: 4096 is listening on ports: 20106, 10160
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-api -i 0 -c /openim/config/, PID: 4080 is listening on ports: 20113, 10002
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-crontask -i 0 -c /openim/config/, PID: 4074 is not listening on any ports.
[2024-08-18 14:10:02 UTC] Cmdline: /openim/_output/bin/platforms/linux/amd64/openim-msggateway -i 0 -c /openim/config/, PID: 4084 is listening on ports: 20112, 10001, 10140

happy2wh666 avatar Aug 18 '24 14:08 happy2wh666

2024-08-18 14:11:17.342 INFO    [PID:4096]      openim-rpc-auth                 [version:3.8.0]         [mw/rpc_server_interceptor.go:48]                       RPC Server Request - UserToken                          {"operationID": "8b2f17f7792ecd1e", "funcName": "/openim.auth.Auth/UserToken", "req": "secret:\"666\"  platformID:1  userID:\"imAdmin\""}
2024-08-18 14:11:17.343 DEBUG   [PID:4096]      openim-rpc-auth                 [version:3.8.0]         [mw/rpc_client_interceptor.go:44]                       RPC Client Request - getDesignateUsers                  {"operationID": "8b2f17f7792ecd1e", "funcName": "/openim.user.user/getDesignateUsers", "req": "userIDs:\"imAdmin\"", "conn target": "etcd:///openim/user"}
2024-08-18 14:11:17.344 ERROR   [PID:4096]      openim-rpc-auth                 [version:3.8.0]         [mw/rpc_client_interceptor.go:50]                       RPC Client Response Error - getDesignateUsers           {"operationID": "8b2f17f7792ecd1e", "funcName": "/openim.user.user/getDesignateUsers", "error": "rpc error: code = Unavailable desc = last resolver error: produced zero addresses"}
2024-08-18 14:11:17.344 WARN    [PID:4096]      openim-rpc-auth                 [version:3.8.0]         [mw/rpc_server_interceptor.go:97]                       rpc server resp WithDetails error                       {"operationID": "8b2f17f7792ecd1e", "funcName": "/openim.auth.Auth/UserToken", "error": "Error: 14 last resolver error: produced zero addresses | Error trace: 1 (/go/pkg/mod/google.golang.org/[email protected]/server.go:1027) -> handleStream (/go/pkg/mod/google.golang.org/[email protected]/server.go:1797) -> processUnaryRPC (/go/pkg/mod/google.golang.org/[email protected]/server.go:1386) -> _Auth_UserToken_Handler (/go/pkg/mod/github.com/openimsdk/[email protected]/auth/auth.pb.go:973) -> func1 (/go/pkg/mod/google.golang.org/[email protected]/server.go:1194) -> RpcServerInterceptor (/go/pkg/mod/github.com/openimsdk/[email protected]/mw/rpc_server_interceptor.go:53) -> func1 (/go/pkg/mod/google.golang.org/[email protected]/server.go:1203) -> func7 (/openim/pkg/common/startrpc/start.go:199) -> func1 (/go/pkg/mod/github.com/openimsdk/[email protected]/auth/auth.pb.go:971) -> UserToken (/openim/internal/rpc/auth/auth.go:78) -> GetUserInfo (/openim/pkg/rpcclient/user.go:90) -> GetUsersInfo (/openim/pkg/rpcclient/user.go:74) -> GetDesignateUsers (/go/pkg/mod/github.com/openimsdk/[email protected]/user/user.pb.go:5304) -> Invoke (/go/pkg/mod/google.golang.org/[email protected]/call.go:35) -> RpcClientInterceptor (/go/pkg/mod/github.com/openimsdk/[email protected]/mw/rpc_client_interceptor.go:66) -> Wrap (/go/pkg/mod/github.com/openimsdk/[email protected]/errs/coderr.go:74) -> Wrap (/go/pkg/mod/github.com/openimsdk/[email protected]/errs/coderr.go:126)"}
2024-08-18 14:11:17.344 WARN    [PID:4096]      openim-rpc-auth                 [version:3.8.0]         [mw/rpc_server_interceptor.go:116]                      RPC Server Response Error - UserToken                   {"operationID": "8b2f17f7792ecd1e", "funcName": "/openim.auth.Auth/UserToken", "req": "secret:\"666\"  platformID:1  userID:\"imAdmin\"", "err": "<nil>", "error": "rpc error: code = Unavailable desc = 14 last resolver error: produced zero addresses"}
2024-08-18 14:11:17.345 ERROR   [PID:4080]      openim-api                      [version:3.8.0]         [mw/rpc_client_interceptor.go:50]                       RPC Client Response Error - userToken                   {"operationID": "8b2f17f7792ecd1e", "funcName": "/openim.auth.Auth/userToken", "error": "rpc error: code = Unavailable desc = 14 last resolver error: produced zero addresses"}
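
To see which downstream call is failing, log lines like the ones above can be filtered for the resolver error and grouped by funcName. The following is a minimal sketch that reads log text on standard input. The function name zero_address_calls and the log path in the usage note are assumptions.

```shell
# Hypothetical helper: read OpenIM log lines on stdin and count, per funcName,
# the ERROR lines that contain "produced zero addresses".
# Output format: "<count> <funcName>", highest count first.
zero_address_calls() {
  grep 'ERROR' \
    | grep 'produced zero addresses' \
    | sed -n 's/.*"funcName": "\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'
}
```

For example, zero_address_calls < /path/to/openim.log (path hypothetical). In the excerpt above, the first failing call is /openim.user.user/getDesignateUsers, whose connection target is etcd:///openim/user.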

happy2wh666 avatar Aug 18 '24 14:08 happy2wh666

When the "produced zero addresses" message appears, run "mage check" at that moment.

skiffer-git avatar Aug 18 '24 14:08 skiffer-git

How did you set it up? What changes did you make to the settings? Can you explain it step by step?

skiffer-git avatar Aug 18 '24 14:08 skiffer-git

The "mage check" output above was captured after the error had appeared in the logs.


  • Use the code from tag release-v3.8.
  • No modifications were made to the code, only the configuration files in /config were changed.
  • MongoDB and Redis are services provided by a SaaS platform.
  • etcd, MinIO, and Kafka run in self-managed Docker containers (not the compose file provided by the OpenIM project).

Use a Docker-based Golang environment to compile and run the server.

services:
  openim:
    image: golang
    container_name: openim
    user: root
    privileged: true
    volumes:
      - "/openim:/openim"
    restart: always
    network_mode: "host"
    command: /openim/happy.sh

Container startup script:

usm@/openim$ cat ./happy.sh
#!/bin/sh
cd /openim
bash bootstrap.sh
if [ ! -e skip_build ]
then
  mage
fi
touch skip_build
mage start
mage check
tail -f /dev/stdout

happy2wh666 avatar Aug 19 '24 01:08 happy2wh666

I suggest you use the source code for the deployment.

skiffer-git avatar Aug 19 '24 08:08 skiffer-git

Can you tell me if the IP address 74.48.52.186 is a public one or just an internal network address?

skiffer-git avatar Aug 19 '24 08:08 skiffer-git

> I suggest you use the source code for the deployment.

The server is compiled from source code. Both compilation and execution happen inside a Docker container using the "golang" image.

> Can you tell me if the IP address 74.48.52.186 is a public one or just an internal network address?

It is a public address, behind a Linux firewall.

happy2wh666 avatar Aug 19 '24 08:08 happy2wh666

There's no need to use a public IP address here. An internal IP address is sufficient.

skiffer-git avatar Aug 19 '24 10:08 skiffer-git

Is this issue related to the internal IP or public IP? The point is that the server was running normally at the beginning, but the error occurred after a period of time. Restarting the server made it normal again, but then the error occurred again after some time.

happy2wh666 avatar Aug 19 '24 11:08 happy2wh666

When you run into an issue, take a look at the data on etcd.

skiffer-git avatar Aug 19 '24 12:08 skiffer-git

I kept the following command running for a long time. During that period it only output PUT and DELETE events.

I have no name!@host:/opt/bitnami/etcd$ etcdctl watch / --prefix

while read -r line; do
    echo "New key: $line"
done


PUT
/check_openim_component

DELETE
/check_openim_component

PUT
/check_openim_component

DELETE
/check_openim_component

...
...
...

happy2wh666 avatar Aug 20 '24 01:08 happy2wh666

Try watching /openim.

skiffer-git avatar Aug 20 '24 01:08 skiffer-git

Watching / with --prefix already includes /openim.
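
To correlate registry changes with the time of the error, each watch event can be prefixed with a timestamp. This is a minimal sketch, assuming etcdctl is on the PATH and can reach the cluster. The function name stamp_lines is hypothetical.

```shell
# Hypothetical helper: prefix every non-empty stdin line with a UTC timestamp.
stamp_lines() {
  while IFS= read -r line; do
    # Skip blank lines (etcdctl prints one for an empty value).
    [ -n "$line" ] || continue
    printf '%s %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S')" "$line"
  done
}
```

A possible invocation is etcdctl watch "" --prefix | stamp_lines. The empty prefix is used here because the key listing at the end of this thread shows keys such as openim/user/... without a leading slash; whether that also holds for this deployment is an assumption worth checking.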

happy2wh666 avatar Aug 20 '24 01:08 happy2wh666

I deployed the server and all other components (Redis, etcd, Kafka) on the same machine, and the issue has not occurred again.

The issue may be caused by a communication problem between the server and other components that has not been resolved. For example, the server may not have properly reconnected after losing connection with Redis.

happy2wh666 avatar Aug 28 '24 03:08 happy2wh666

We'll be testing it soon. Thanks for bringing this up.

skiffer-git avatar Nov 21 '24 11:11 skiffer-git

We deployed Mongo and Redis separately from the server and tested them for a while, but did not encounter this issue. You can try updating the code and testing again. If the issue persists, please provide more specific steps to reproduce the problem.

icey-yu avatar Nov 26 '24 03:11 icey-yu

> I kept the following command running for a long time. During that period it only output PUT and DELETE events.
>
> I have no name!@host:/opt/bitnami/etcd$ etcdctl watch / --prefix
>
> while read -r line; do
>     echo "New key: $line"
> done
>
>
> PUT
> /check_openim_component
>
> DELETE
> /check_openim_component
>
> PUT
> /check_openim_component
>
> DELETE
> /check_openim_component
>
> ...
> ...
> ...

etcdctl get "" --prefix --keys-only

openim/admin/10.3.0.11:30200

openim/auth/10.3.0.11:10200

openim/chat/10.3.0.11:30300

openim/conversation/10.3.0.11:10220

openim/encryption/10.3.0.11:10500

openim/friend/10.3.0.11:10240

openim/group/10.3.0.11:10260

openim/meeting/10.3.0.11:10112

openim/messageGateway/10.3.0.11:10140

openim/msg/10.3.0.11:10280

openim/office/10.3.0.11:30400

openim/organization/10.3.0.11:30500

openim/push/10.3.0.11:10170

openim/push/10.3.0.11:10171

openim/push/10.3.0.11:10172

openim/push/10.3.0.11:10173

openim/push/10.3.0.11:10174

openim/push/10.3.0.11:10175

openim/push/10.3.0.11:10176

openim/push/10.3.0.11:10177

openim/signal/10.3.0.11:10212

openim/third/10.3.0.11:10300

openim/user/10.3.0.11:10320

skiffer-git avatar Nov 26 '24 08:11 skiffer-git
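
A key listing like the one above can be checked for services that have no registered address, which is the condition the "produced zero addresses" error reports. This is a minimal sketch: the function name missing_services is hypothetical, and the "openim" root directory is assumed from the keys and the etcd:///openim/user target shown in this thread.

```shell
# Hypothetical helper: read `etcdctl get "" --prefix --keys-only` output on stdin
# and print each expected service name that has no registered address.
# Arguments: the service names to look for (for example: user auth msg).
missing_services() {
  keys=$(cat)
  for svc in "$@"; do
    # Accept keys with or without a leading slash before "openim/".
    printf '%s\n' "$keys" | grep -q "^/\{0,1\}openim/$svc/" || echo "$svc"
  done
}
```

For example, etcdctl get "" --prefix --keys-only | missing_services user auth msg prints nothing when all three services are registered. If "user" is printed while the error is occurring, the user service's registration has disappeared from etcd even though its process is still listening.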