netopeer2 icon indicating copy to clipboard operation
netopeer2 copied to clipboard

After deconnection, variable delays to reconnect via Call Home observed on v2

Open robbynet opened this issue 4 years ago • 1 comments

I switched recently to the following software version from the devel branch(v2) : Netopeer2 v2.0.35

Scenario : In CLI, alternative listen/disconnect cmd leads to the following logs at Server level. Observation : listen cmd is stuck for an randomize time linked to the time xyz displayed in following Server logs : ==> [INF]: LN: Call Home client "default-client" reconnecting in xyz seconds. This behavior was not observed in Netopeer2 v1

Full logs below with several trials of sequence listen/disconnect at CLI level along with the Call Hom configuration in place in the datastore. Time to time, we can see that xyz is different.

[INF]: SR: Successful processing of "rpc" event with ID 2 priority 0 (remaining 0 subscribers). [INF]: SR: Event "rpc" with ID 2 priority 0 succeeded.

===> Disconnect Call Home from CLI

[INF]: NP: Session 3: thread 0 event new RPC. [INF]: NP: Session 2: thread 3 event new RPC. [INF]: NP: Session 2: thread 3 event session terminated. [INF]: SR: Published event "notif" "ietf-netconf-notifications" with ID 10 priority 0 for 2 subscribers. [INF]: SR: Processing "notif" "ietf-netconf-notifications" event with ID 10. [INF]: SR: Successful processing of "notif" event with ID 10 priority 0 (remaining 0 subscribers). [INF]: NP: Generated new event (netconf-session-end). [INF]: LN: Call Home client "default-client" session terminated. [INF]: LN: Call Home client "default-client" reconnecting in 117 seconds. [ERR]: LN: Session 2: Waiting for Call Home thread failed (Connection timed out). [2021/11/28 07:25:58.583001, 1] ssh_socket_exception_callback: Socket exception callback: 1 (0) [2021/11/28 07:25:58.583032, 1] ssh_socket_exception_callback: Socket error: disconnected

===> reconnection with Call Home frozen during the 117 seconds timeout ....

[INF]: SR: Processing "notif" "ietf-netconf-notifications" event with ID 11. [INF]: SR: Successful processing of "notif" event with ID 11 priority 0 (remaining 1 subscribers). [INF]: SR: Processing "notif" "ietf-netconf-notifications" event with ID 12. [INF]: SR: Successful processing of "notif" event with ID 12 priority 0 (remaining 1 subscribers). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4335. [INF]: LN: getsockopt() error (Connection refused). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4335. [INF]: LN: getsockopt() error (Connection refused). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4335. [INF]: LN: getsockopt() error (Connection refused). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4334. [INF]: LN: Successfully connected to 172.1.1.5:4334 over IPv4. [INF]: SR: Session 177 (user "root", CID 3) created.

===> Disconnect Call Home from CLI

[INF]: NP: Session 4: thread 3 event new RPC. [INF]: NP: Session 4: thread 3 event session terminated. [INF]: SR: Published event "notif" "ietf-netconf-notifications" with ID 16 priority 0 for 2 subscribers. [INF]: SR: Processing "notif" "ietf-netconf-notifications" event with ID 16. [INF]: SR: Successful processing of "notif" event with ID 16 priority 0 (remaining 0 subscribers). [INF]: NP: Generated new event (netconf-session-end). [INF]: LN: Call Home client "default-client" session terminated. [INF]: LN: Call Home client "default-client" reconnecting in 26 seconds. [ERR]: LN: Session 4: Waiting for Call Home thread failed (Connection timed out). [2021/11/28 07:32:27.152690, 1] ssh_socket_exception_callback: Socket exception callback: 1 (0) [2021/11/28 07:32:27.152716, 1] ssh_socket_exception_callback: Socket error: disconnected

====> reconnection with Call Home frozen during the 26 seconds timeout ....

[ERR]: SR: Internal error (/opt/dev/sysrepo/src/common.c:3886). [ERR]: SR: Internal error (/opt/dev/sysrepo/src/common.c:3886). [INF]: SR: Processing "notif" "ietf-netconf-notifications" event with ID 17. [INF]: SR: Successful processing of "notif" event with ID 17 priority 0 (remaining 0 subscribers). [INF]: SR: Processing "notif" "ietf-netconf-notifications" event with ID 18. [INF]: SR: Successful processing of "notif" event with ID 18 priority 0 (remaining 1 subscribers). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4335. [INF]: LN: getsockopt() error (Connection refused). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4335. [INF]: LN: getsockopt() error (Connection refused). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4335. [INF]: LN: getsockopt() error (Connection refused). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4334. [INF]: LN: Successfully connected to 172.1.1.5:4334 over IPv4. [INF]: SR: Session 210 (user "root", CID 3) created.

====> Disconnect Call Home from CLI

[INF]: NP: Session 5: thread 1 event new RPC. [INF]: NP: Session 5: thread 1 event session terminated. [INF]: SR: Published event "notif" "ietf-netconf-notifications" with ID 20 priority 0 for 2 subscribers. [INF]: SR: Processing "notif" "ietf-netconf-notifications" event with ID 20. [INF]: SR: Successful processing of "notif" event with ID 20 priority 0 (remaining 1 subscribers). [INF]: NP: Generated new event (netconf-session-end). [INF]: LN: Call Home client "default-client" session terminated. [INF]: LN: Call Home client "default-client" reconnecting in 3 seconds. [ERR]: LN: Session 5: Waiting for Call Home thread failed (Connection timed out). [2021/11/28 07:34:04.743739, 1] ssh_socket_exception_callback: Socket exception callback: 1 (0) [2021/11/28 07:34:04.743766, 1] ssh_socket_exception_callback: Socket error: disconnected

====> reconnection with Call Home frozen during the 3 seconds timeout ....

[INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4335. [INF]: LN: getsockopt() error (Connection refused). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4335. [INF]: LN: getsockopt() error (Connection refused). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4335. [INF]: LN: getsockopt() error (Connection refused). [INF]: LN: Trying to connect via IPv4 to 172.1.1.5:4334. [INF]: LN: getsockopt() error (Connection refused).

=====> Disconnect Call Home from CLI

[INF]: NP: Session 6: thread 1 event new RPC. [INF]: NP: Session 6: thread 1 event session terminated. [INF]: SR: Published event "notif" "ietf-netconf-notifications" with ID 24 priority 0 for 2 subscribers. [INF]: SR: Processing "notif" "ietf-netconf-notifications" event with ID 24. [INF]: SR: Successful processing of "notif" event with ID 24 priority 0 (remaining 0 subscribers). [INF]: NP: Generated new event (netconf-session-end). [INF]: LN: Call Home client "default-client" session terminated. [INF]: LN: Call Home client "default-client" reconnecting in 115 seconds. [ERR]: LN: Session 6: Waiting for Call Home thread failed (Connection timed out). [2021/11/28 07:35:56.927858, 1] ssh_socket_exception_callback: Socket exception callback: 1 (0) [2021/11/28 07:35:56.927885, 1] ssh_socket_exception_callback: Socket error: disconnected

======> reconnection with Call Home frozen during the 115 seconds timeout ....

> get --defaults report-all --filter-xpath  /netconf-server/call-home
DATA
<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <netconf-server xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-server">
    <call-home>
      <netconf-client>
        <name>default-client</name>
        <endpoints>
          <endpoint>
            <name>default-tls</name>
            <tls>
              <tcp-client-parameters>
                <remote-address>172.1.1.5</remote-address>
                <remote-port>4335</remote-port>
                <keepalives>
                  <idle-time>1</idle-time>
                  <max-probes>10</max-probes>
                  <probe-interval>5</probe-interval>
                </keepalives>
              </tcp-client-parameters>
              <tls-server-parameters>
                <server-identity>
                  <keystore-reference>
                    <asymmetric-key>serverkey</asymmetric-key>
                    <certificate>servercert</certificate>
                  </keystore-reference>
                </server-identity>
                <client-authentication>
                  <cert-maps>
                    <cert-to-name>
                      <id>1</id>
                      <fingerprint>02:E9:38:1F:F6:8B:62:DE:0A:0B:C5:03:81:A8:03:49:A0:00:7F:8B:F3</fingerprint>
                      <map-type xmlns:x509c2n="urn:ietf:params:xml:ns:yang:ietf-x509-cert-to-name">x509c2n:specified</map-type>
                      <name>tls-test</name>
                    </cert-to-name>
                  </cert-maps>
                  <required/>
                  <ca-certs>cacerts</ca-certs>
                  <client-certs>clientcerts</client-certs>
                </client-authentication>
              </tls-server-parameters>
            </tls>
          </endpoint>
          <endpoint>
            <name>default-ssh</name>
            <ssh>
              <tcp-client-parameters>
                <remote-address>172.1.1.5</remote-address>
                <remote-port>4334</remote-port>
                <keepalives>
                  <idle-time>1</idle-time>
                  <max-probes>10</max-probes>
                  <probe-interval>5</probe-interval>
                </keepalives>
              </tcp-client-parameters>
              <ssh-server-parameters>
                <server-identity>
                  <host-key>
                    <name>default-key</name>
                    <public-key>
                      <keystore-reference>genkey</keystore-reference>
                    </public-key>
                  </host-key>
                </server-identity>
                <client-authentication>
                  <supported-authentication-methods>
                    <publickey/>
                    <passsword/>
                    <other>interactive</other>
                  </supported-authentication-methods>
                </client-authentication>
              </ssh-server-parameters>
            </ssh>
          </endpoint>
        </endpoints>
        <connection-type>
          <periodic>
            <period>2</period>
            <idle-timeout>0</idle-timeout>
          </periodic>
        </connection-type>
        <reconnect-strategy>
          <start-with>first-listed</start-with>
          <max-attempts>3</max-attempts>
        </reconnect-strategy>
      </netconf-client>
    </call-home>
  </netconf-server>
</data>

robbynet avatar Nov 29 '21 06:11 robbynet

Okay, the timeout was being anchor when it should not have been, fixed in the latest libnetconf2. But I am more interested in the errors so please include the exact steps you took when connecting to/disconnecting from the server because from logs it seems you have also subscribed to notifications by some client, which I think may be the root of the problem.

michalvasko avatar Nov 29 '21 08:11 michalvasko