netopeer2-server edit-config and notification concurrently
Versions used: sysrepo-2.2.73 netopeer2-2.1.62
Hi,
I see that on that https://github.com/CESNET/netopeer2/issues/1266 , it was mentioned pipeline is implemented, but notification handling is done in a separate thread.
I'm facing an issue because for one of our scenario, on an edit config we want to simulate a rpc timeout, so we block the processing of the sysrepo's callback using a sleep (and in the case of this log, I tried to use the SHELVE system of sysrepo, expecting the callback to be called in a loop but it doesn't seem to be the case)
On another thread, we have a "supervision mechanism" that is implemented (see Oran WG4 spec, chapter 6.7) and it tries to send a notification at the same time the edit-config rpc is "hanged" due to the callback being frozen.
It seems to be netopeer2-server only send the notification once the edit-config has timed out, despite the notification being sent earlier that the timeout, it was received later on the cli. Or is it with an issue with the cli or sysrepo ?
netopeer2-server
2023-08-24 15:18:22/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, EV LISTEN: "o-ran-uplane-conf" "change" ID 2 priority 0 processing (remaining 6 subscribers).
2023-08-24 15:18:22,294 DEBG 'orusw' stdout output:
2023-08-24 15:18:22/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-uplane-conf.c:655 ========== EDIT CONFIG "/o-ran-uplane-conf:user-plane-configuration/static-prach-configurations" (SUB ID = 68 ID = 2) RECEIVED ===========
2023-08-24 15:18:22/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-uplane-conf.c:656 ========== EVENT "change" RECEIVED ===========
2023-08-24 15:18:22/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-uplane-conf.c:667 Timeout on <edit-config> of static-prach-configuration, STE EDIT going to sleep for 1 s
2023-08-24 15:18:23,294 DEBG 'orusw' stdout output:
2023-08-24 15:18:23/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, EV LISTEN: "o-ran-uplane-conf" "change" ID 2 priority 0 processing shelved.
2023-08-24 15:18:23/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, EV LISTEN: "o-ran-uplane-conf" "change" ID 2 priority 0 success (remaining 1 subscribers).
2023-08-24 15:18:24,427 DEBG 'orusw' stdout output:
2023-08-24 15:18:24/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-supervision.c:118 Expiration of supervision interval 10 (pid = 140032669099584)
2023-08-24 15:18:24,428 DEBG 'orusw' stdout output:
2023-08-24 15:18:24/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, Session 356 (user "root", CID 48) created.
2023-08-24 15:18:24/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, EV ORIGIN: "o-ran-supervision" "notif" ID 1 priority 0 for 1 subscribers published.
2023-08-24 15:18:24,429 DEBG 'netopeer2-server' stdout output:
[INF]: SR: EV LISTEN: "o-ran-supervision" "notif" ID 1 processing.
2023-08-24 15:18:24,429 DEBG 'netopeer2-server' stdout output:
[INF]: SR: EV LISTEN: "o-ran-supervision" "notif" ID 1 priority 0 success (remaining 0 subscribers).
2023-08-24 15:18:34,428 DEBG 'orusw' stdout output:
2023-08-24 15:18:34/INF:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-supervision.c:128 Expiration of guard timer overhead 10 (pid = 140032669099584)
2023-08-24 15:18:34/ERR:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-supervision.c:143 O-DU watchdog reset not received in time, O-RU restarts now
2023-08-24 15:18:34/WRN:UE_APP-OAM/orusw[RU01]: airphone-orusim/orusw/o-ran-supervision.c:147 O-RU reset explictly disabled in config
2023-08-24 15:19:32,293 DEBG 'netopeer2-server' stdout output:
[ERR]: SR: EV ORIGIN: "change" ID 2 processing timed out.
[WRN]: SR: EV ORIGIN: "o-ran-uplane-conf" "change" ID 2 priority 0 failed (Timeout expired).
2023-08-24 15:19:32,293 DEBG 'netopeer2-server' stdout output:
[INF]: SR: EV ORIGIN: "o-ran-uplane-conf" "abort" ID 2 priority 11 for 2 subscribers published.
netopeer2-cli output:
> subscribe
OK
> user-rpc --rpc-timeout 10 --content /workspace/UE_APP-OAM/Services/orusw/airphone-orusim/xml/supervision-watchdog-reset-interval.xml
DATA
<next-update-at xmlns="urn:o-ran:supervision:1.0">2023-08-24T15:18:23+00:00</next-update-at>
> edit-config --rpc-timeout 30 --target running --error stop --config=/workspace/UE_APP-OAM/Services/orusw/airphone-orusim/xml/o-ran-static-prach-conf
cli_send_recv: Timeout for receiving a reply expired.
>
notification (2023-08-24T15:18:24.427910277+00:00)
<supervision-notification xmlns="urn:o-ran:supervision:1.0"/>
Hard to say what exactly is happening but it is possible the edit-config is holding a lock for a module that is needed for validating the notification, for example.
But these logs means the notif is properly sent from app and received by netopeer2-server, right ?
2023-08-24 15:18:24,428 DEBG 'orusw' stdout output:
2023-08-24 15:18:24/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, Session 356 (user "root", CID 48) created.
2023-08-24 15:18:24/INF:UE_APP-OAM/sysrepo[RU01]: UE_APP-OAM/sysrepo, EV ORIGIN: "o-ran-supervision" "notif" ID 1 priority 0 for 1 subscribers published.
2023-08-24 15:18:24,429 DEBG 'netopeer2-server' stdout output:
[INF]: SR: EV LISTEN: "o-ran-supervision" "notif" ID 1 processing.
2023-08-24 15:18:24,429 DEBG 'netopeer2-server' stdout output:
[INF]: SR: EV LISTEN: "o-ran-supervision" "notif" ID 1 priority 0 success (remaining 0 subscribers).
Yes, that is correct. If you are suspecting the server to block the notification even though it does not have to, try updating the server to the latest release, there were lots of major improvements in its concurrency.
Ok I see you released the new version earlier, I will try again on this version and give feekback once done.
I will also try to put everything in debug mode to see if I can notice something weird.
Thanks once again for the quick support !
After enabling verb3 on the cli, I noticed we receive the notification xml before the text :
cli_send_recv: Timeout for receiving a reply expired.
So it seems to be a display issue ? the notification is only written to the console after the rpc timeout is reached, despite being received earlier.
netopeer2-cli with verb3. Sending a rpc which is expected to timeout, while being subscribed to receive a supervision notification:
> nfig --rpc-timeout 30 --target running --error stop --config=/workspace/UE_APP-OAM/Services/orusw/airphone-orusim/xml/o-ran-static-prach-config.xml
nc DEBUG: Sending message:
#1014
[2023/08/25 14:01:21.959947, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1786, in_blocks=77953]
[2023/08/25 14:01:21.960050, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket
[2023/08/25 14:01:21.960071, 3] packet_send2: packet: wrote [type=94, len=32, padding_size=15, comp=16, payload=16]
[2023/08/25 14:01:21.960087, 3] channel_write_common: ssh_channel_write wrote 7 bytes
nc DEBUG: Sending message:
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="91"><edit-config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"><target><running/></target><error-option>stop-on-error</error-option><config><user-plane-configuration xmlns="urn:o-ran:uplane-conf:1.0"><static-prach-configurations><static-prach-config-id>4</static-prach-config-id><pattern-period>2</pattern-period><guard-tone-low-re>0</guard-tone-low-re><num-prach-re>0</num-prach-re><guard-tone-high-re>0</guard-tone-high-re><sequence-duration>256</sequence-duration><prach-patterns><prach-pattern-id>0</prach-pattern-id><number-of-repetitions>1</number-of-repetitions><number-of-occasions>0</number-of-occasions><re-offset>0</re-offset><occasion-parameters><occasion-id>0</occasion-id><cp-length>0</cp-length><gp-length>0</gp-length><beam-id>0</beam-id></occasion-parameters><frame-number>0</frame-number><sub-frame-id>0</sub-frame-id><time-offset>0</time-offset></prach-patterns></static-prach-configurations></user-plane-configuration></config>
[2023/08/25 14:01:21.960120, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1849, in_blocks=78015]
[2023/08/25 14:01:21.960146, 3] packet_send2: packet: wrote [type=94, len=1040, padding_size=16, comp=1023, payload=1023]
[2023/08/25 14:01:21.960160, 3] channel_write_common: ssh_channel_write wrote 1014 bytes
[2023/08/25 14:01:21.960246, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket
nc DEBUG: Sending message:
#20
[2023/08/25 14:01:21.960283, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1849, in_blocks=77952]
[2023/08/25 14:01:21.960303, 3] packet_send2: packet: wrote [type=94, len=32, padding_size=17, comp=14, payload=14]
[2023/08/25 14:01:21.960315, 3] channel_write_common: ssh_channel_write wrote 5 bytes
[2023/08/25 14:01:21.960348, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket
nc DEBUG: Sending message:
</edit-config></rpc>
[2023/08/25 14:01:21.960375, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1850, in_blocks=77953]
[2023/08/25 14:01:21.960446, 3] packet_send2: packet: wrote [type=94, len=48, padding_size=18, comp=29, payload=29]
[2023/08/25 14:01:21.960510, 3] channel_write_common: ssh_channel_write wrote 20 bytes
[2023/08/25 14:01:21.960640, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket
nc DEBUG: Sending message:
##
[2023/08/25 14:01:21.960675, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1850, in_blocks=77952]
[2023/08/25 14:01:21.960701, 3] packet_send2: packet: wrote [type=94, len=32, padding_size=18, comp=13, payload=13]
[2023/08/25 14:01:21.960713, 3] channel_write_common: ssh_channel_write wrote 4 bytes
[2023/08/25 14:01:21.960741, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket
[2023/08/25 14:01:29.852790, 3] ssh_packet_socket_callback: packet: read type 94 [len=32,padding=16,comp=15,payload=15]
[2023/08/25 14:01:29.852839, 3] ssh_packet_process: Dispatching handler for packet type 94
[2023/08/25 14:01:29.852859, 3] channel_rcv_data: Channel receiving 6 bytes data in 0 (local win=684648 remote win=1249988)
[2023/08/25 14:01:29.852869, 3] channel_default_bufferize: placing 6 bytes into channel buffer (stdout)
[2023/08/25 14:01:29.852878, 3] channel_rcv_data: Channel windows are now (local win=684642 remote win=1249988)
[2023/08/25 14:01:29.852906, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1850, in_blocks=77952]
[2023/08/25 14:01:29.852932, 3] ssh_channel_read_timeout: Read (2) buffered : 6 bytes. Window: 684642
[2023/08/25 14:01:29.852944, 3] ssh_channel_read_timeout: Read (1) buffered : 4 bytes. Window: 684642
[2023/08/25 14:01:29.853072, 3] ssh_channel_read_timeout: Read (1) buffered : 3 bytes. Window: 684642
[2023/08/25 14:01:29.853081, 3] ssh_channel_read_timeout: Read (1) buffered : 2 bytes. Window: 684642
[2023/08/25 14:01:29.853087, 3] ssh_channel_read_timeout: Read (1) buffered : 1 bytes. Window: 684642
[2023/08/25 14:01:29.853095, 3] ssh_channel_read_timeout: Read (204) buffered : 0 bytes. Window: 684642
[2023/08/25 14:01:29.853123, 3] ssh_packet_socket_callback: packet: read type 94 [len=224,padding=10,comp=213,payload=213]
[2023/08/25 14:01:29.853135, 3] ssh_packet_process: Dispatching handler for packet type 94
[2023/08/25 14:01:29.853147, 3] channel_rcv_data: Channel receiving 204 bytes data in 0 (local win=684642 remote win=1249988)
[2023/08/25 14:01:29.853157, 3] channel_default_bufferize: placing 204 bytes into channel buffer (stdout)
[2023/08/25 14:01:29.853166, 3] channel_rcv_data: Channel windows are now (local win=684438 remote win=1249988)
[2023/08/25 14:01:29.853175, 3] ssh_packet_socket_callback: Processing 52 bytes left in socket buffer
[2023/08/25 14:01:29.853185, 3] ssh_packet_socket_callback: packet: read type 94 [len=32,padding=18,comp=13,payload=13]
[2023/08/25 14:01:29.853195, 3] ssh_packet_process: Dispatching handler for packet type 94
[2023/08/25 14:01:29.853204, 3] channel_rcv_data: Channel receiving 4 bytes data in 0 (local win=684438 remote win=1249988)
[2023/08/25 14:01:29.853214, 3] channel_default_bufferize: placing 4 bytes into channel buffer (stdout)
[2023/08/25 14:01:29.853223, 3] channel_rcv_data: Channel windows are now (local win=684434 remote win=1249988)
[2023/08/25 14:01:29.853241, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1850, in_blocks=77965]
[2023/08/25 14:01:29.853251, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1850, in_blocks=77965]
[2023/08/25 14:01:29.853263, 3] ssh_channel_read_timeout: Read (2) buffered : 4 bytes. Window: 684434
[2023/08/25 14:01:29.853272, 3] ssh_channel_read_timeout: Read (1) buffered : 2 bytes. Window: 684434
[2023/08/25 14:01:29.853287, 3] ssh_channel_read_timeout: Read (1) buffered : 1 bytes. Window: 684434
nc DEBUG: Received message:
<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0"><eventTime>2023-08-25T14:01:29.851561028+00:00</eventTime><supervision-notification xmlns="urn:o-ran:supervision:1.0"/></notification>
cli_send_recv: Timeout for receiving a reply expired.
>
notification (2023-08-25T14:01:29.851561028+00:00)
<supervision-notification xmlns="urn:o-ran:supervision:1.0"/>
>
nc DEBUG: Received message:
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="91"><rpc-error><error-type>application</error-type><error-tag>operation-failed</error-tag><error-severity>error</error-severity><error-message xml:lang="en">EV ORIGIN: "change" ID 1 processing timed out.</error-message></rpc-error><rpc-error><error-type>application</error-type><error-tag>operation-failed</error-tag><error-severity>error</error-severity><error-message xml:lang="en">User callback failed.</error-message></rpc-error></rpc-reply>
It seems this is a limitation (or a feature) of libnetconf2. IO lock is being held on the socket while waiting for the reply so the notification cannot be asynchronously read. Improving this behavior should be possible but probably not worth the effort, which would like have to be substantial.