pyvmomi

multithreaded access to properties is slower than serial access

Open veber-alex opened this issue 1 year ago • 4 comments

Describe the bug

I noticed that accessing host properties from multiple threads is slower than accessing them serially. I wrote a script to reproduce the issue:

# ruff: noqa

import ssl
from threading import Thread
import time

from pyVim.connect import SmartConnect
from pyVmomi import vim

NUM_THREADS = 8
HOST = ""
PASSWORD = ""

context = ssl._create_unverified_context()
con = SmartConnect(host=HOST, pwd=PASSWORD, sslContext=context)

host = con.content.viewManager.CreateContainerView(con.content.rootFolder, [vim.HostSystem], True).view[0]


threads = []
for i in range(NUM_THREADS):

    def print_driver(i):
        print(host.config.network.pnic[i].driver)

    t = Thread(target=print_driver, args=(i,))
    threads.append(t)

start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
end = time.time()
print(f"multi threaded: {end - start}")


start = time.time()
for i in range(NUM_THREADS):
    print(host.config.network.pnic[i].driver)
end = time.time()
print(f"single threaded: {end - start}")

On my host with 8 vmnics I get:

multi threaded: 11.908450603485107
single threaded: 3.76969313621521

Single-threaded performance is stable at around 4 seconds, but multithreaded performance varies between 6 and 12 seconds from run to run. Changing the script to always access pnic[0] gives the same result. The more threads that run at the same time, the slower it gets.
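One thing that may contribute, assuming pyVmomi's usual behaviour that reading a property on a managed object (here `host.config`) sends a fresh request to the server: every thread in the repro retrieves and parses the whole host config just to read one driver name. The sketch below reads the pnic list once and lets the threads work on the local copy. `FakeHost` and `drivers_fetch_once` are hypothetical names, not part of pyVmomi; the fake stands in for `vim.HostSystem` so the snippet runs without a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


class FakeHost:
    """Hypothetical stand-in for vim.HostSystem; each `.config` read simulates one round trip."""

    def __init__(self, drivers, delay=0.01):
        self._drivers = list(drivers)
        self._delay = delay
        self.fetches = 0  # number of simulated remote fetches

    @property
    def config(self):
        self.fetches += 1
        time.sleep(self._delay)  # pretend network and parsing cost
        pnics = [SimpleNamespace(driver=d) for d in self._drivers]
        return SimpleNamespace(network=SimpleNamespace(pnic=pnics))


def drivers_fetch_once(host, count):
    """Fetch the pnic list a single time, then read `count` drivers from the local copy."""
    pnics = host.config.network.pnic  # the only access that touches `host`
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(lambda i: pnics[i].driver, range(count)))


if __name__ == "__main__":
    fake = FakeHost(["ixgbe", "igb", "e1000e"])
    print(drivers_fetch_once(fake, 3), "fetches:", fake.fetches)
```

With a real connection, the `host` object from the repro script would be passed in place of `FakeHost`. This does not explain why the threaded run is slower than the serial one; it only removes the repeated fetches from the comparison.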

Reproduction steps

  1. set NUM_THREADS, HOST, PASSWORD
  2. run the repro script

Expected behavior

I expect multithreaded performance to be better than or equal to serial performance.

Additional context

No response

veber-alex, Jul 20 '24 08:07

I ran another test that connects to two different hosts. Here is the code:

# ruff: noqa

import ssl
from threading import Thread
import time

from pyVim.connect import SmartConnect
from pyVmomi import vim

NUM_VMNICS = 4
HOST = ""
HOST2 = ""
PASSWORD = ""

context = ssl._create_unverified_context()

start = time.time()
for host in [HOST, HOST2]:
    con = SmartConnect(host=host, pwd=PASSWORD, sslContext=context)
    host = con.content.viewManager.CreateContainerView(con.content.rootFolder, [vim.HostSystem], True).view[0]
    for i in range(NUM_VMNICS + 1):
        print(f"host {host.name} - {host.config.network.pnic[i].driver}")

end = time.time()
print(f"single threaded: {end - start}")

threads = []
for host in [HOST, HOST2]:

    def print_driver(host):
        con = SmartConnect(host=host, pwd=PASSWORD, sslContext=context)
        host = con.content.viewManager.CreateContainerView(con.content.rootFolder, [vim.HostSystem], True).view[0]
        for i in range(NUM_VMNICS + 1):
            print(f"host {host.name} - {host.config.network.pnic[i].driver}")

    t = Thread(target=print_driver, args=(host,))
    threads.append(t)

start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
end = time.time()
print(f"multi threaded: {end - start}")

My results are:

single threaded: 5.33910870552063
multi threaded: 5.063408136367798

Two threads talking to two different hosts take about as long as the serial run. This tells me the bottleneck is in pyvmomi itself and not on the ESXi host.

veber-alex, Jul 21 '24 19:07

I did more tests, and it looks like the performance issues are caused by Python 3.7. With Python 3.11 and 3.12 the performance is much better.

veber-alex, Jul 22 '24 13:07

Also you could consider using multiprocessing instead of threading, and create a service instance (i.e. SmartConnect) in each of them, so you aren't leveraging the same connection.
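A rough sketch of that suggestion, not code from this thread: one worker process per host, each opening its own `SmartConnect` session. The helper names `open_host`, `pnic_drivers`, and `drivers_per_process` are hypothetical; the pyVmomi calls are the same ones used in the repro scripts, plus `Disconnect` from `pyVim.connect`. The `opener` parameter is an assumption added so the worker can be exercised without a real host.

```python
from concurrent.futures import ProcessPoolExecutor


def open_host(hostname, password):
    """Connect to one host; return (HostSystem, close). Needs pyVmomi and a reachable host."""
    import ssl
    from pyVim.connect import Disconnect, SmartConnect
    from pyVmomi import vim

    si = SmartConnect(host=hostname, pwd=password,
                      sslContext=ssl._create_unverified_context())
    view = si.content.viewManager.CreateContainerView(
        si.content.rootFolder, [vim.HostSystem], True)
    return view.view[0], lambda: Disconnect(si)


def pnic_drivers(hostname, password, opener=open_host):
    """Worker body: private connection, one config fetch, always disconnect."""
    host, close = opener(hostname, password)
    try:
        return [pnic.driver for pnic in host.config.network.pnic]
    finally:
        close()


def drivers_per_process(hostnames, password):
    """Run one worker process per host so neither the connection nor the interpreter is shared."""
    with ProcessPoolExecutor(max_workers=len(hostnames)) as pool:
        futures = {h: pool.submit(pnic_drivers, h, password) for h in hostnames}
        return {h: f.result() for h, f in futures.items()}
```

`drivers_per_process` should be called from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module on some platforms. Only plain lists of strings cross the process boundary; the connection objects stay inside each worker.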

prziborowski, Jul 22 '24 17:07

I decided to reopen the issue after further testing.

While the numbers are better with newer versions of Python, the trend is the same. Connecting to the same host from multiple threads is slower than running the same code serially in one thread. When connecting to two different hosts, the improvement from using two threads is tiny, when in theory it should scale almost linearly with the number of hosts.
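Because the threaded timings vary a lot between runs, a repeatable comparison helps when checking the trend on different Python versions. The following is a minimal sketch of such a harness; `time_serial`, `time_threaded`, and `compare` are hypothetical helpers, and `task` is assumed to be any callable that wraps the property access from the repro scripts.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def time_serial(task, args_list):
    """Run task(*args) once per entry, one after another, and return elapsed seconds."""
    start = time.perf_counter()
    for args in args_list:
        task(*args)
    return time.perf_counter() - start


def time_threaded(task, args_list):
    """Run task(*args) once per entry, one thread each, and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(args_list)) as pool:
        futures = [pool.submit(task, *args) for args in args_list]
        for future in futures:
            future.result()  # re-raise any exception from the worker
    return time.perf_counter() - start


def compare(task, args_list, repeats=5):
    """Repeat both variants and report median timings to smooth out run-to-run noise."""
    serial = [time_serial(task, args_list) for _ in range(repeats)]
    threaded = [time_threaded(task, args_list) for _ in range(repeats)]
    return {
        "serial_median": statistics.median(serial),
        "threaded_median": statistics.median(threaded),
    }


if __name__ == "__main__":
    # time.sleep stands in for a blocking call; threads should overlap the waits.
    print(compare(time.sleep, [(0.05,)] * 4, repeats=3))
```

The threaded timing includes thread pool start-up, which mirrors the original script starting its threads inside the timed region.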

> Also you could consider using multiprocessing instead of threading, and create a service instance (i.e. SmartConnect) in each of them, so you aren't leveraging the same connection.

Thanks, but that's not an option in my codebase.

veber-alex, Jul 22 '24 20:07