libyang icon indicating copy to clipboard operation
libyang copied to clipboard

optimize when validation with layering node_when objects

Open n-toshikazu opened this issue 1 year ago • 17 comments

I optimize when validation in libyang that reduce about 50% execution time. If patch(0001-validation-OPTIMIZE-construct-layering-node_when.patch) can be acceptable, please merge to libyang source tree.

backgroud

Our sysrepo datastore is tremendously grown up, therefore, sr_apply_change() on edit-config reply is simultaneously slower.

We calibrated each execution in sr_apply_change() and as results, sr_modinfo_validate() hold about 95% execution time. => Our calibration data owns 456917 node_when and 90123 node_type, others are 0 as validation target nodes.

calibration of lyd_validate_module()

[Calibration result with libyang-3.7.8]

patches to use for calibration

o 0001-libyang-validation-performance-check.patch calibrating time and accounting about when validation Note: Need libnetconf2 cmake option -DENABLE_EXAMPLES:String=Off to avoid compile error o 0001-sysrepo-validation-from-sr_apply_changes-calibration.patch reporting time and accounting in libyang

reported log

sr_lyd_validate_module() total=361108 msec

  • when validation takes 69836 msec o total when=456917 o topmost when=456917 o when rm=456917 o validation eval=456917 == result(disable=424534/enable=32383/autodel=0) == autodel type=0
  • incompletely validated terminal 90123 values 291271 msec

Current when validations are executed for all when nodes without autodel because they are always started from the lowest descendants to the highest ancestor.

I improve this inefficient behavior by splitting node_when entries into layering node_when objects(ly_set). Validation always starts from topmost when nodes and If when is disable, its descendants can be dropped without any search or update costs.

[Calibration result with libyang-3.7.8 + patch]

patches to use for calibration

o 0001-validation-OPTIMIZE-construct-layering-node_when.patch optimize when validation with layering node_when o 0002-libyang-validation-performance-check.patch calibrating time and accounting about when validation Note: Need libnetconf2 cmake option -DENABLE_EXAMPLES:String=Off to avoid compile error o 0001-sysrepo-validation-from-sr_apply_changes-calibration.patch reporting time and accounting in libyang

reported log

sr_lyd_validate_module() total=327047 msec

  • when validation takes 31934 msec o total when=456917 o topmost when=204690 o when rm=247996 o inner set(new=218010/free=218010) == inner type(container=211350/list=6660) o validation eval=247996 == result(disable=215613/enable=32383/autodel=208921) == autodel type=0
  • incompletely validated terminal 90123 values 295112 msec

More than 50% execution time(69836 msec -> 31934 msec) is reduced. We calibrate each pattern more than 5 times and every score is roughly the same.

Regards.

n-toshikazu avatar Feb 10 '25 10:02 n-toshikazu

I have nothing against optimizations but you must not modify struct lyd_node. So please adjust your patch accordingly or provide a smaller use-case reproducing your use-case and I can try to optimize it myself.

michalvasko avatar Feb 14 '25 10:02 michalvasko

I have nothing against optimizations but you must not modify struct lyd_node. So please adjust your patch accordingly or provide a smaller use-case reproducing your use-case and I can try to optimize it myself.

I appreciate your review and suggestion. Since our product owner won't permit to export configuration data, I just rework patch according to your advise. As results, optimization effects can be kept, virtual memory usage can be stabled, details are below.

  • Take2-0001-validation-OPTIMIZE-construct-layering-node_when.patch Change from previous patch.
    1. Instead of changing struct lyd_node, declare new struct lyd_when_chain.
    2. Declare new validation flag LYD_VALIDATE_LAYERING_WHEN because lyd_validate_unres_when() cast lyd_node or lyd_when_chain struct now, (I do not modify whole of validation entry points, there still be validated with struct lyd_node)
    3. Rewrite new when validation setup parts with new structure and flag.
  • Take2-0002-libyang-validation-performance-check.patch Rewrite with new structure accounting. Calibrate setup time in addition to validation time because patch append allocation or lookups in validation node setup phase.

Report log

sr_lyd_validate_module() total=327167 msec when validation cost total=321847 msec

  • when validation takes 31186 msec o total when=456917(13320) o topmost when=204690 o when rm=247996 o chain (alloc=456917/free=456917) o inner set(new=218010/free=218010) == inner type(container=211350/list=6660) o validation eval=247996 == result(disable=215613/enable=32383/autodel=208921) == autodel type=0
  • incompletely validated terminal 90123 values 290660 msec

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 148 46.4 3.6 791464 599560 ? Ssl 17:07 17:18 /usr/sbin/netopeer2-server -t 3600 -c MSG

Validation time can be improved like as previous patch without large setup overheads. libyang-3.7.8 total=367797msec (setup=5283msec/validation=70028msec/others=292485msec) libyang-3.7.8 + current patch total=332869msec (setup=5310msec/validation=31587msec/others=295972msec) libyang-3.7.8 + previous patch total=330919msec (setup=5280msec/validation=31399msec/others=294240)

I append VSZ/RSS of netopeer2 from last time and I can improve memory usage from previous patch. libyang-3.7.8 VSZ=790780/RSS=581440 libyang-3.7.8 + current patch (nearly equal to libyang-3.7.8) VSZ=791464/RSS=599560 ibyang-3.7.8 + previous patch (must not modify struct lyd_node) VSZ=856316/RSS=616068

And I'm sorry but if there are any improper naming or style in my patch, please revise, I'm not familiar with them...

Regards.

n-toshikazu avatar Feb 18 '25 11:02 n-toshikazu

Okay, thanks for the effort but it is always better to discuss the details of an optimization before implementing it. If I understood your changes correctly, all they effectively do is avoid this inefficient loop.

Your solution seems fine and is quite straightforward, create a "tree" of the nodes with when so you can quickly find all the descendants of an auto-deleted node with false when. However, for further maintenance these changes are not ideal.

When writing that loop, I did not even try to optimize it as I was not sure it was worth it but I suppose for large data trees with lots of when it may make a significant difference. When trying to come up with a simpler solution, I think one could use the fact that the unres when set follows DFS of the data tree. Meaning you should be able to iterate over all the following nodes checking their ancestor is the node you are deleting until you found a node that is not. And you remove from the set all the nodes that are.

What do you think about this? Should be quite a small change and I would think you should get similar results to yours with much less code changes.

michalvasko avatar Feb 18 '25 14:02 michalvasko

I appreciate your follow-up.

Unless my understanding about your explanation is wrong, "DFS of the data tree" is already executed with LYD_TREE_DFS_BEGIN() and LYD_TREE_DFS_END() loops at lyd_validate_subtree(), node_when objects are already sorted with DFS order. And "iterate over all the following nodes checking their ancestor" is implemented at lyd_validate_autodel_node_del() too.

Thus, when we reverse lyd_node objects before starting validation loop at lyd_validate_unres_when() and at lyd_validate_autodel_node_del(), change LYD_TREE_DFS_BEGIN() loop condition more effective, we may achieve similar results.

[lyd_validate_autodel_node_del()]

    if (node_when && node_when->count) {
        /* remove nested from node_when set */
        LYD_TREE_DFS_BEGIN(del, iter) {
            if ((del != iter) && ly_set_contains(node_when, iter, &idx)) {
                ly_set_rm_index(node_when, idx, NULL);
            }
            LYD_TREE_DFS_END(del, iter);
        }
    }

I have to avoid inefficient searching from the large node_when above. My current idea is appending "lysc_has_when(iter->schema)" before ly_set_contains(node_when, iter, &idx), however, I can not understarnd your "until you found a node that is not." meaning... (Less code changes with reversing and contition changing)

Please tell me what you think.

Regards.

n-toshikazu avatar Feb 19 '25 10:02 n-toshikazu

Could you please somehow provide the use-case I can test on? Either by sharing your files privately by sending them to me via email or, better yet, creating a small use-case with renamed nodes, which I can directly add to the tests.

Because some of my assumptions are wrong. If the nodes are sorted in DFS and are traversed from the end, nested nodes should always be resolved before their parents so deleting anything from the set (unresolved descendants) should never be needed.

michalvasko avatar Feb 19 '25 11:02 michalvasko

I'm trying to prepare dummy configuration with hearing howto, please wait for a while.

If the nodes are sorted in DFS and are traversed from the end, nested nodes should always be resolved before their parents so deleting anything from the set (unresolved descendants) should never be needed.

I'd like to confirm about this comment implication. Current libyang traverses from the end(set[set.count-1]) to start(set[0]) with DFS sorted set(unresolved descendants). As stated this comment, there is no unresolved descendant to be deleted, this always takes O(n) costs. However, against this theory, my proposal is that libyang traverses from start to the end (expressed by reversing) and unless ancestor when is enabled, delete its unresolved descendant(when)s at the same time, this takes O(n) - 'autodel' costs. So if you are negative to my such approach, I have no another idea any more.(Sorry if I misread)

Regards.

n-toshikazu avatar Feb 20 '25 08:02 n-toshikazu

Just please post any small example that actually executes this code.

michalvasko avatar Feb 20 '25 13:02 michalvasko

I post example. Please confirm with following procedures.

At netopeer2-server

unzip testcase.zip and cd testcase ./make_test_xml.sh ./startup.sh

At netoper2-cli

edit-config --target running --config=edit.xml.txt

RPC TAT is about 5 minutes in our environment. If you feel longer, please adjust by shell argument(default num=700). $ ./make_test_xml.sh (num)

Regards.

n-toshikazu avatar Feb 28 '25 09:02 n-toshikazu

I appreciate you made the effort to create this example although I expected some small data files, not 14MB XML file. But fine, I should be able to extract what I need, if only the scripts worked. Perhaps because of some last-minute changes, the generated data are not valid. I have given up after fixing some undefined m_index variables that were supposed to be just index in make_test_xml.sh. So, please fix the scripts so that the steps you posted work.

michalvasko avatar Feb 28 '25 10:02 michalvasko

Could you try this commandline instead of "./make_test_xml.sh". ./make_test_xml.sh 10

Least data size may be around 3MB now and if you want more compact data, we will try to retake.

Regards.

n-toshikazu avatar Feb 28 '25 10:02 n-toshikazu

Yeah, I have tried even ./make_test_xml.sh 1 but there still lots of data (them being invalid is the problem, not so that they are big).

michalvasko avatar Feb 28 '25 11:02 michalvasko

We rework script. Please replace with make_test_xml.zip and confirm.

n-toshikazu avatar Mar 03 '25 09:03 n-toshikazu

Please try the steps you posted before, they result in lots of validation errors. The very first one

[ERR] Invalid leafref value "PORT-1-1-out" - no target instance "/org-openroadm-device/circuit-packs[circuit-pack-name=current()/../circuit-pack-name]/ports/port-name" with the same value. (path "/org-openroadm-device:org-openroadm-device/degree[degree-number='1']/connection-ports[index='2']/port-name")

caused by there being no circuit-packs data generated at all.

michalvasko avatar Mar 03 '25 10:03 michalvasko

I apologize previous half-assed posting. We trimmed validation errors. Please use make_test_xml-v3.zip.

n-toshikazu avatar Mar 04 '25 11:03 n-toshikazu

Thanks, that passed validation but did not trigger the code I wanted it to. So please try this patch for libyang and it should fail on an assert on the use-case I need to reproduce. If your data fail to, I will consider committing that patch in the repo.

michalvasko avatar Mar 05 '25 08:03 michalvasko

We tried your assertion patch with our original configuration, not be asserted.

n-toshikazu avatar Mar 05 '25 10:03 n-toshikazu

Thanks, that is what I thought. So I will try to commit the change and hopefully someone will contact us and provide a suitable use-case. In the meantime, you can use a patch that completely removes that loop and it should solve the efficiency issue.

michalvasko avatar Mar 05 '25 10:03 michalvasko