Performance optimization when querying many labs/nodes
Hi,
I’ve noticed some performance issues when querying a controller with a large number of labs and nodes. For example, running the cmlutils cml ls command on a controller with ~40 labs takes approximately 26 seconds:
cml ls 1.18s user 0.23s system 5% cpu 26.219 total
With logging enabled, I can see redundant calls to the users and topology endpoints.
2025-06-13 19:53:56,501 - httpx - INFO - HTTP Request: GET https://cml/api/v0/system_information "HTTP/1.1 200 OK"
2025-06-13 19:53:57,042 - httpx - INFO - HTTP Request: POST https://cml/api/v0/authenticate "HTTP/1.1 200 OK"
2025-06-13 19:53:57,120 - httpx - INFO - HTTP Request: GET https://cml/api/v0/authok "HTTP/1.1 200 OK"
2025-06-13 19:53:57,184 - httpx - INFO - HTTP Request: GET https://cml/api/v0/system_information "HTTP/1.1 200 OK"
2025-06-13 19:53:57,400 - httpx - INFO - HTTP Request: GET https://cml/api/v0/system_information "HTTP/1.1 200 OK"
2025-06-13 19:53:57,869 - httpx - INFO - HTTP Request: POST https://cml/api/v0/authenticate "HTTP/1.1 200 OK"
2025-06-13 19:53:57,945 - httpx - INFO - HTTP Request: GET https://cml/api/v0/authok "HTTP/1.1 200 OK"
2025-06-13 19:53:58,027 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
# Querying labs' info
2025-06-13 19:53:58,109 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs "HTTP/1.1 200 OK"
2025-06-13 19:53:58,228 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/topology "HTTP/1.1 200 OK"
2025-06-13 19:53:58,435 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:53:58,550 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/90d09eb2-c9ec-458c-8429-28d898247ea5/topology "HTTP/1.1 200 OK"
2025-06-13 19:53:58,631 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:53:58,733 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/db5cf0df-6970-4669-ae98-b6e440dd1e96/topology "HTTP/1.1 200 OK"
2025-06-13 19:53:58,814 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:53:58,908 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/7acf9afb-2a3e-4574-a6e5-f11bf19390ae/topology "HTTP/1.1 200 OK"
<!snipped for brevity>
Labs on Server
# Querying nodes' info
2025-06-13 19:54:08,597 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/topology?exclude_configurations=true "HTTP/1.1 200 OK"
2025-06-13 19:54:08,827 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:54:08,914 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/state "HTTP/1.1 200 OK"
2025-06-13 19:54:09,038 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/labs/90d09eb2-c9ec-458c-8429-28d898247ea5/topology?exclude_configurations=true "HTTP/1.1 200 OK"
2025-06-13 19:54:09,129 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/users "HTTP/1.1 200 OK"
2025-06-13 19:54:09,217 - httpx - INFO - HTTP Request: GET https://cml-01/api/v0/labs/90d09eb2-c9ec-458c-8429-28d898247ea5/state "HTTP/1.1 200 OK"
<!snipped for brevity>
In my use case, I perform a similar operation, but I do not query the state of the nodes. To replicate:
for lab in client.all_labs():
    for node in lab.nodes():
        pass
Tested on 2.8.x main and v2.9.0 release branch.
I believe caching the results of GET users and GET topology could help improve performance. As far as I can see, most if not all of that data is already stored in protected attributes.
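As an illustration of the caching idea, here is a minimal sketch of a time-limited cache in front of a fetch function. It is not the client library's implementation: `TTLCache`, `fake_get`, the lab IDs, and the 5-second TTL are all hypothetical names and values chosen for the example, and `fake_get` only counts calls instead of talking to a controller.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache results of a fetch function per key for a limited time (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, path: str) -> Any:
        now = self._clock()
        entry = self._entries.get(path)
        # Reuse the stored value while it is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._fetch(path)
        self._entries[path] = (now, value)
        return value


# Stand-in for an HTTP GET; it only counts how often each path is requested.
calls: Dict[str, int] = {}


def fake_get(path: str) -> dict:
    calls[path] = calls.get(path, 0) + 1
    return {"path": path}


cache = TTLCache(fake_get, ttl=5.0)
for lab_id in ["lab-a", "lab-b", "lab-c"]:
    cache.get("users")                     # fetched once, then served from the cache
    cache.get(f"labs/{lab_id}/topology")   # fetched once per lab
    cache.get("users")

print(calls)
```

With a cache like this, the repeated users lookups collapse into a single fetch per TTL window, at the cost of possibly serving data that is up to one TTL old.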
Thank you for looking into this! Let me know if you need any additional details or help with testing.
Hello @sgherdao, there will be no changes in v2.9.0, but we will evaluate this and check whether some improvements can be made in v2.9.1.
In the meantime, would it help if you used a greater auto_sync_interval value in the client?
Thanks for looking into it. It is not critical, so whenever you find time.
In [20]: %%time
    ...: for lab in client.all_labs():
    ...:     for node in lab.nodes():
    ...:         pass
>>> .all_labs() call
2025-06-25 19:57:19,425 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs "HTTP/1.1 200 OK"
2025-06-25 19:57:19,544 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/topology "HTTP/1.1 200 OK"
! snipped
>>> .nodes() call
2025-06-25 19:57:37,826 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/8984d69e-7f8d-4291-992e-8f0093562359/topology?exclude_configurations=true "HTTP/1.1 200 OK"
2025-06-25 19:57:37,902 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
! snipped
CPU times: user 746 ms, sys: 128 ms, total: 874 ms
Wall time: 18.9 s
The second time around, the .all_labs() method uses cached data and the wall time is roughly halved.
In [21]: %%time
    ...: for lab in client.all_labs():
    ...:     for node in lab.nodes():
    ...:         pass
> Only .nodes() generates HTTP requests
2025-06-25 19:57:52,056 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs "HTTP/1.1 200 OK"
2025-06-25 19:57:52,157 - httpx - INFO - HTTP Request: GET https://cml/api/v0/labs/73060ff2-a0c4-4475-9f55-e62401e2e5e2/topology?exclude_configurations=true "HTTP/1.1 200 OK"
! snipped
2025-06-25 19:58:00,689 - httpx - INFO - HTTP Request: GET https://cml/api/v0/users "HTTP/1.1 200 OK"
CPU times: user 359 ms, sys: 59.5 ms, total: 418 ms
Wall time: 8.85 s
The auto_sync_interval has no impact in this case. I tried a few different values (1, 3, 5, and 10 seconds), each time from a fresh session, and got the same results: ~18 seconds the first time I run the loop and ~9 seconds the second time.
You can save the nodes() result - unless there is a change in the node set itself, the instances should remain valid.
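A minimal sketch of that suggestion, using a stand-in class rather than the real client: `FakeLab` and its `nodes_calls` counter are hypothetical and exist only to show that the node list is fetched once and then reused. It assumes the node set does not change between passes.

```python
from typing import Dict, List


class FakeLab:
    """Hypothetical stand-in for a lab object; counts nodes() calls."""

    def __init__(self, lab_id: str, node_labels: List[str]) -> None:
        self.id = lab_id
        self._node_labels = node_labels
        self.nodes_calls = 0

    def nodes(self) -> List[str]:
        # With the real client, this is the call that shows up as HTTP requests in the logs.
        self.nodes_calls += 1
        return list(self._node_labels)


labs = [FakeLab("lab-a", ["r1", "r2"]), FakeLab("lab-b", ["sw1"])]

# Call nodes() once per lab and keep the result.
nodes_by_lab: Dict[str, List[str]] = {lab.id: lab.nodes() for lab in labs}

# Later passes iterate over the saved lists instead of calling nodes() again.
for _ in range(3):
    for lab in labs:
        for node in nodes_by_lab[lab.id]:
            pass

print([lab.nodes_calls for lab in labs])
```

This only pays off when the same process iterates more than once; a one-shot command still performs the initial fetch for every lab.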
In my case, I don't run the loop several times; it is very similar to the cml ls example. Each time the command is invoked, for each lab, you would send:
- 2 HTTP requests to the users endpoint
- 2 HTTP requests to the labs/{lab_id}/topology endpoint (with or without exclude_configurations)
- 1 HTTP request to labs/{lab_id}/state
Ideally, all of this info could be fetched with the requests below, although I understand the ideal may not be practical:
- 1 HTTP request to users for all the labs
- 1 HTTP request to labs/{lab_id}/topology per lab
- 1 HTTP request to labs/{lab_id}/state per lab
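To put rough numbers on the two lists above, the per-lab request counts can be tallied for the ~40 labs mentioned earlier. This is back-of-the-envelope arithmetic only: the function names are made up for the example, and the one-time calls (authentication, system_information, GET labs) are left out.

```python
def requests_observed(num_labs: int) -> int:
    # Per lab: 2 x users + 2 x topology + 1 x state
    return num_labs * (2 + 2 + 1)


def requests_ideal(num_labs: int) -> int:
    # 1 x users for all labs, plus per lab: 1 x topology + 1 x state
    return 1 + num_labs * (1 + 1)


num_labs = 40
observed = requests_observed(num_labs)
ideal = requests_ideal(num_labs)
print(observed, ideal)  # 200 81
```

Under these assumptions, the ideal case sends well under half as many requests as what is observed today.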