virl2-client

Performance optimization when querying many labs/nodes

Open sgherdao opened this issue 8 months ago • 5 comments

Hi,

I’ve noticed some performance issues when querying a controller with a large number of labs and nodes. For example, running the cmlutils cml ls command on a controller with ~40 labs takes approximately 26 seconds:

cml ls  1.18s user 0.23s system 5% cpu 26.219 total

With logging enabled, I can see redundant calls to the users and topology endpoints.

2025-06-13 19:53:56,501 - httpx - INFO - HTTP Request: GET https://cml/api/v0/system_information "HTTP/1.1 200 OK"
2025-06-13 19:53:57,042 - httpx - INFO - HTTP Request: POST https://cml/api/v0/authenticate "HTTP/1.1 200 OK"
2025-06-13 19:53:57,120 - httpx - INFO - HTTP Request: GET https://cml/api/v0/authok "HTTP/1.1 200 OK"
2025-06-13 19:53:57,184 - httpx - INFO - HTTP Request: GET https://cml/api/v0/system_information "HTTP/1.1 200 OK"
2025-06-13 19:53:57,400 - httpx - INFO - HTTP Request: GET https://cml/api/v0/system_information "HTTP/1.1 200 OK"
2025-06-13 19:53:57,869 - httpx - INFO - HTTP Request: POST https://cml/api/v0/authenticate "HTTP/1.1 200 OK"
2025-06-13 19:53:57,945 - httpx - INFO - HTTP Request: GET https://cml/api/v0/authok "HTTP/1.1 200 OK"
2025-06-13 19:53:58,027 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"

# Querying labs' info

2025-06-13 19:53:58,109 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs "HTTP/1.1 200 OK"

2025-06-13 19:53:58,228 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/topology "HTTP/1.1 200 OK"
2025-06-13 19:53:58,435 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:53:58,550 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/90d09eb2-c9ec-458c-8429-28d898247ea5/topology "HTTP/1.1 200 OK"
2025-06-13 19:53:58,631 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:53:58,733 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/db5cf0df-6970-4669-ae98-b6e440dd1e96/topology "HTTP/1.1 200 OK"
2025-06-13 19:53:58,814 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:53:58,908 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/7acf9afb-2a3e-4574-a6e5-f11bf19390ae/topology "HTTP/1.1 200 OK"


<!snipped for brevity>

Labs on Server

#  Querying nodes' info

2025-06-13 19:54:08,597 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/topology?exclude_configurations=true "HTTP/1.1 200 OK"
2025-06-13 19:54:08,827 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:54:08,914 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/state "HTTP/1.1 200 OK"
2025-06-13 19:54:09,038 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/labs/90d09eb2-c9ec-458c-8429-28d898247ea5/topology?exclude_configurations=true "HTTP/1.1 200 OK"
2025-06-13 19:54:09,129 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:54:09,217 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/labs/90d09eb2-c9ec-458c-8429-28d898247ea5/state "HTTP/1.1 200 OK"

<!snipped for brevity>
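To put numbers on the redundancy, log lines like the ones above can be tallied per endpoint. The following is a minimal sketch, not part of virl2-client or cmlutils: the helper name `tally_requests` and the `{lab_id}` placeholder are made up for illustration, and it assumes the httpx INFO line format shown in the excerpts.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the method and URL in an httpx INFO line such as the ones above.
REQUEST_RE = re.compile(r"HTTP Request: (\w+) (\S+)")
# Lab IDs are UUIDs; collapse them so all labs count under one endpoint.
UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def tally_requests(log_lines):
    """Count requests per (method, normalized path), ignoring query strings."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not an httpx request line
        method, url = match.groups()
        path = urlsplit(url).path  # drops scheme, host, and query string
        counts[(method, UUID_RE.sub("{lab_id}", path))] += 1
    return counts


# Four lines copied from the log excerpt above.
sample = [
    '2025-06-13 19:53:58,228 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/topology "HTTP/1.1 200 OK"',
    '2025-06-13 19:53:58,435 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"',
    '2025-06-13 19:53:58,550 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/90d09eb2-c9ec-458c-8429-28d898247ea5/topology "HTTP/1.1 200 OK"',
    '2025-06-13 19:53:58,631 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"',
]
counts = tally_requests(sample)
for (method, path), n in counts.most_common():
    print(n, method, path)
```

Run against a full log, a users count that grows with the number of labs points at the repeated GET users calls described above.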

In my use case, I perform a similar operation, but I do not query the state of the nodes. To replicate:

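from virl2_client import ClientLibrary

client = ClientLibrary("https://cml", "username", "password")  # placeholder credentials
for lab in client.all_labs():
    for node in lab.nodes():
        pass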

Tested on 2.8.x main and v2.9.0 release branch.

I believe caching the results of GET users and GET topology could help improve performance. As far as I can see, most if not all of this data is already stored in protected attributes.
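As a rough illustration of the caching idea, here is a minimal time-based cache around a fetch function. This is a sketch under assumptions, not how virl2-client is implemented: `TTLCache` and `fetch_users` are hypothetical names, and `fetch_users` only stands in for the real GET users request.

```python
import time


class TTLCache:
    """Remember the result of an expensive fetch for `ttl` seconds."""

    def __init__(self, fetch, ttl=5.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing is cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: fetch once and restart the timer.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


# Hypothetical stand-in for GET /users; a real client would do HTTP here.
calls = {"users": 0}


def fetch_users():
    calls["users"] += 1
    return [{"id": "u1", "username": "admin"}]


users = TTLCache(fetch_users, ttl=5.0)
for _lab in range(40):  # one lookup per lab, as in the log above
    users.get()
print(calls["users"])  # number of simulated requests actually issued
```

The trade-off is staleness: a user added on the controller within the TTL window would not be visible until the cache expires, so the TTL would need to be short or the cache explicitly invalidated.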

Thank you for looking into this! Let me know if you need any additional details or help with testing.

sgherdao, Jun 13 '25 19:06

Hello @sgherdao, there will be no changes in v2.9.0, but we will evaluate this and check whether some improvements can be made in v2.9.1.

tmikuska, Jun 25 '25 14:06

In the meantime, would it help if you used a greater auto_sync_interval value in the client?

virlos, Jun 25 '25 14:06

Thanks for looking into it. It is not critical, so whenever you find time.

In [20]: %%time
    ...: for lab in client.all_labs():
    ...:     for node in lab.nodes():
    ...:         pass

>>> .all_labs() call

2025-06-25 19:57:19,425 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs "HTTP/1.1 200 OK"
2025-06-25 19:57:19,544 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/topology "HTTP/1.1 200 OK"

! snipped

>>> .nodes() call

2025-06-25 19:57:37,826 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/8984d69e-7f8d-4291-992e-8f0093562359/topology?exclude_configurations=true "HTTP/1.1 200 OK"
2025-06-25 19:57:37,902 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"

! snipped

CPU times: user 746 ms, sys: 128 ms, total: 874 ms
Wall time: 18.9 s

The second time around, the .all_labs() method uses cached data and the wall time is roughly halved.

In [21]: %%time
    ...: for lab in client.all_labs():
    ...:     for node in lab.nodes():
    ...:         pass

> Only .nodes() generates HTTP requests

2025-06-25 19:57:52,056 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs "HTTP/1.1 200 OK"
2025-06-25 19:57:52,157 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/topology?exclude_configurations=true "HTTP/1.1 200 OK"

! snipped

2025-06-25 19:58:00,689 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
CPU times: user 359 ms, sys: 59.5 ms, total: 418 ms
Wall time: 8.85 s

The auto_sync_interval has no impact in this case. I tried several values (1, 3, 5, and 10 seconds), each time from a fresh session, and got the same results: ~18 seconds the first time I run the loop and ~9 seconds the second time.

sgherdao, Jun 25 '25 19:06

You can save the nodes() result - unless there is a change in the node set itself, the instances should remain valid.
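The difference between calling nodes() on every pass and saving its result can be sketched with a stand-in object. `FakeLab` below is hypothetical and only counts simulated syncs; it assumes, as the logs above suggest, that each nodes() call may trigger topology and users requests.

```python
class FakeLab:
    """Hypothetical stand-in for a lab object; counts simulated syncs."""

    def __init__(self, node_labels):
        self._labels = list(node_labels)
        self.sync_calls = 0

    def nodes(self):
        self.sync_calls += 1  # stands in for the topology/users requests
        return list(self._labels)


lab = FakeLab(["r1", "r2", "sw1"])

# Calling nodes() in each pass triggers the simulated sync every time.
for _ in range(3):
    for node in lab.nodes():
        pass
repeated = lab.sync_calls

# Saving the result once and iterating over the saved list does not.
lab.sync_calls = 0
saved_nodes = lab.nodes()
for _ in range(3):
    for node in saved_nodes:
        pass
saved = lab.sync_calls

print(repeated, saved)
```

This only helps when the same lab is iterated more than once in a session; a single pass over every lab still pays for one sync per lab.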

virlos, Jun 26 '25 06:06

In my case, I don't run the loop several times; it is very similar to the cml ls example. Each time the command is invoked, the client sends the following for each lab:

  • 2 HTTP requests to users endpoint
  • 2 HTTP requests to labs/{lab_id}/topology endpoint (with or without exclude_configurations)
  • 1 HTTP request to labs/{lab_id}/state

Ideally, all of this info could be fetched with the requests below, although I understand that the ideal may not be practical:

  • 1 HTTP request to users for all the labs
  • 1 HTTP request to labs/{lab_id}/topology per lab
  • 1 HTTP request to labs/{lab_id}/state per lab
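For a sense of scale, the two request patterns listed above can be compared with a little arithmetic. The function names are made up for illustration, the lab count of 40 is the approximate figure from the original report, and the one-time authentication, system_information, and GET labs calls are left out.

```python
def requests_observed(num_labs):
    # Per lab: 2x users, 2x topology, 1x state (counts from the list above).
    return num_labs * (2 + 2 + 1)


def requests_ideal(num_labs):
    # One users request overall, plus 1x topology and 1x state per lab.
    return 1 + num_labs * (1 + 1)


labs = 40  # approximate lab count from the original report
print(requests_observed(labs), requests_ideal(labs))  # → 200 81
```

Under these assumptions the ideal pattern issues roughly 40% of the requests, which would matter most when per-request latency dominates the total run time.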

sgherdao, Jun 26 '25 10:06