netopeer2 Delay in receiving rpc reply

Hi Michal,

We have netopeer2-server running in one container and our application (a NETCONF client) running in another container on the same VM. The client connects to the server over SSH using libnetconf2 and performs get operations using nc_rpc_get and RPC send. In some instances the get RPC (recv_reply) times out and we do not see any reply for about 8-10 seconds. We want to triage this and identify where the delay is introduced.

In the NETCONF server logs we did not observe any ERR for the /ietf-netconf:get entries. Are there any pointers in the log that would show where the delay is introduced? (It may also be in the network, we do not know, so we just want to identify where the delay comes from.)

Sample log:

[2024-12-18 13:14:36.846143] [INFO] Get day1 data for path /ManagedElement/GNBCUCPFunction/EP_XnC_Local
[2024-12-18 13:14:37.847223] [INFO] Couldn't receive a reply from the server ret:1
[2024-12-18 13:14:37.847249] [INFO] output data is null. envp
[2024-12-18 13:14:37.847260] [INFO] day1 path,data /ManagedElement/GNBCUCPFunction/EP_XnC_Local
[2024-12-18 13:14:37.847264] [INFO] Get day1 data for path /ManagedElement/GNBCUCPFunction/EP_X2C_Local

[2024-12-18 13:14:53.107708] [INFO] Session is valid and active.
ParseFromASession 2 [ERR]: Received a <rpc-reply> with an unexpected message-id 111 (expected 119).
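
One way to locate the stall in a client log like the one above is to scan for large gaps between consecutive timestamps. The following is a minimal Python sketch and not part of libnetconf2 or netopeer2; the function name find_gaps and the 0.5 s threshold are assumptions chosen for illustration, and the sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS.ffffff]" stamp used in the client log.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def find_gaps(lines, threshold_s=0.5):
    """Return (gap_seconds, previous_line, current_line) for each pair of
    consecutive timestamped lines that are more than threshold_s apart."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # lines without a leading timestamp are skipped
        now = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev_time is not None:
            delta = (now - prev_time).total_seconds()
            if delta > threshold_s:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = now, line
    return gaps


SAMPLE = [
    "[2024-12-18 13:14:36.846143] [INFO] Get day1 data for path /ManagedElement/GNBCUCPFunction/EP_XnC_Local",
    "[2024-12-18 13:14:37.847223] [INFO] Couldn't receive a reply from the server ret:1",
    "[2024-12-18 13:14:37.847249] [INFO] output data is null. envp",
    "[2024-12-18 13:14:37.847260] [INFO] day1 path,data /ManagedElement/GNBCUCPFunction/EP_XnC_Local",
    "[2024-12-18 13:14:37.847264] [INFO] Get day1 data for path /ManagedElement/GNBCUCPFunction/EP_X2C_Local",
    "[2024-12-18 13:14:53.107708] [INFO] Session is valid and active.",
]

for gap, before, after in find_gaps(SAMPLE):
    print(f"{gap:.3f}s between:\n  {before}\n  {after}")
```

On the sample lines this flags two gaps: roughly 1.0 s between the get request and the "Couldn't receive a reply" line, and roughly 15.3 s before the "Session is valid and active" line.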

netopeer2-server logs:

[INF]: SR: EV LISTEN: "/ietf-netconf:get" "rpc" ID 260 priority 0 processing (remaining 1 subscribers).
[INF]: SR: EV LISTEN: "/ietf-netconf:get" "rpc" ID 260 priority 0 success (remaining 0 subscribers).
[INF]: SR: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 260 priority 0 succeeded.
[INF]: SR: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 261 priority 0 for 1 subscribers published.
[INF]: SR: EV LISTEN: "/ietf-netconf:get" "rpc" ID 261 priority 0 processing (remaining 1 subscribers).
[INF]: SR: EV LISTEN: "/ietf-netconf:get" "rpc" ID 261 priority 0 success (remaining 0 subscribers).
[INF]: SR: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 261 priority 0 succeeded.
[INF]: SR: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 262 priority 0 for 1 subscribers published.
[INF]: SR: EV LISTEN: "/ietf-netconf:get" "rpc" ID 262 priority 0 processing (remaining 1 subscribers).
[INF]: SR: EV LISTEN: "/ietf-netconf:get" "rpc" ID 262 priority 0 success (remaining 0 subscribers).
[INF]: SR: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 262 priority 0 succeeded.

Thanks, Srikanth

Dec 18 '24 09:12 srikanthsubbaramu

Not sure I can help you. The logs in netopeer2-server do not have timestamps but you should be able to see when they are generated and based on that learn whether the delay is before receiving the RPC, during its processing, or after the reply is sent.
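
If the server output carries no timestamps of its own, one option is to stamp each line as it is read from a pipe. This is only a sketch under assumptions: it presumes the server runs in the foreground with its log output redirected into the script, the name stamp_lines is hypothetical, and the recorded time is when the line is read, which may lag slightly behind when it was written.

```python
import io
import sys
from datetime import datetime


def stamp_lines(source, sink, clock=datetime.now):
    """Copy each line from source to sink, prefixed with the time it was read.

    The format mirrors the client log ("[YYYY-MM-DD HH:MM:SS.ffffff]") so that
    both logs can be compared side by side.
    """
    for line in source:
        stamp = clock().strftime("%Y-%m-%d %H:%M:%S.%f")
        sink.write(f"[{stamp}] {line.rstrip()}\n")
        sink.flush()  # flush per line so the output stays current when piped


# Small demonstration on two server log lines from this thread.
demo = io.StringIO(
    '[INF]: SR: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 261 priority 0 for 1 subscribers published.\n'
    '[INF]: SR: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 261 priority 0 succeeded.\n'
)
stamp_lines(demo, sys.stdout)
```

To process live output, call stamp_lines(sys.stdin, sys.stdout) instead of the demonstration and pipe the server log into the script. Matching the stamped "published" and "succeeded" lines for a given event ID against the client log then indicates whether the time is spent before the RPC arrives, during processing, or after the reply is sent.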

Dec 18 '24 11:12 michalvasko

Hi Michal,

Let me explain the design and the problem in more detail.

In our scenario there are multiple clients. Each one does get RPC calls on certain paths for day1 on the running datastore and, once the get is completed, immediately issues a user RPC on the candidate datastore for day2 changes. The following activities of client1, client2 and client3 happen simultaneously at the server:

Client1 is doing a user RPC call
Client2 is doing a user RPC call
Client3 is doing multiple get RPC calls on the running datastore

Here, client3 faces a timeout on the get RPC, and the stall lasts about 5-6 seconds.

We want to understand whether the user RPC callback running in the sysrepo context is holding any resource or otherwise preventing netopeer2-server from responding to get NETCONF RPC calls from other clients. One observation: when we removed the establish push RPC call (subscribe-xpath) made from the user RPC callback, we no longer observed the timeout issue.

Please provide your inputs.

Thanks, Srikanth

Dec 19 '24 06:12 srikanthsubbaramu

I am sorry but this is way too complex for me to be able to analyze it without actually running the use-case, so I cannot help you.

Dec 19 '24 08:12 michalvasko

Hi Michal,

We have increased the number of NC threads and also changed the clients from get to get-config, and we did not encounter any issues.

libnetconf2:
set(MAX_PSPOLL_THREAD_COUNT 15 CACHE STRING "Maximum number of threads that could simultaneously access a ps_poll structure")

netopeer2:
set(THREAD_COUNT 12 CACHE STRING "Number of threads accepting new sessions and handling requests")

We did not observe any delay on get/get-config calls with the increased thread counts.

A couple of questions:

1. Should we pursue using the sysrepo API for the YANG push subscription, instead of issuing a NETCONF RPC call within the sysrepo RPC callback?
2. Can we make the number of threads configurable at run time (we would need to change the arrays to dynamic allocation, though)?

Thanks, Srikanth

Dec 24 '24 09:12 srikanthsubbaramu

1. This is up to you. Using the sysrepo API would simplify these calls and make them faster. On the other hand, there is some effort involved, especially for someone not yet familiar with the API. So I guess you can leave it as it is for now and just keep this possibility in mind if you encounter any more issues.
2. Yes, I believe it should not be too difficult to add support in netopeer2-server to stop some worker threads or create new ones.

Jan 06 '25 10:01 michalvasko