Serban Iorga
This is not related to AMD. It seems to be related to the host kernel version. On `5.4` I'm getting restore times that revolve around 30ms. On `4.19` I'm getting...
On host kernel `5.4`, it looks like the difference between 4ms and 30ms is caused by the jailer.
On host kernel `5.4`, the problem seems to be caused by cgroups. When I start the jailer without applying the cgroups, the overhead disappears.
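To confirm whether a given process actually ended up in a cpuset cgroup when comparing the two setups, something like the following could be used. It is only a minimal sketch, assuming a cgroup v1 host where `/proc/self/cgroup` lists lines of the form `hierarchy-ID:controller-list:cgroup-path`; the `CgroupEntry` type and the helper names are made up for illustration and are not part of Firecracker or the jailer.

```rust
use std::fs;

/// One parsed line of /proc/<pid>/cgroup: `hierarchy-ID:controller-list:cgroup-path`.
/// (Hypothetical helper type, for illustration only.)
#[derive(Debug, PartialEq)]
struct CgroupEntry {
    hierarchy_id: u32,
    controllers: Vec<String>,
    path: String,
}

/// Parses a single line; returns None if the line is malformed.
fn parse_cgroup_line(line: &str) -> Option<CgroupEntry> {
    // splitn(3) keeps any ':' inside the cgroup path intact.
    let mut parts = line.trim_end().splitn(3, ':');
    let hierarchy_id: u32 = parts.next()?.parse().ok()?;
    // cgroup v2 lines look like `0::/path`, so the controller list may be empty.
    let controllers: Vec<String> = parts
        .next()?
        .split(',')
        .filter(|c| !c.is_empty())
        .map(str::to_string)
        .collect();
    let path = parts.next()?.to_string();
    Some(CgroupEntry { hierarchy_id, controllers, path })
}

/// Returns the entry whose controller list contains `cpuset`, if any.
fn cpuset_entry(contents: &str) -> Option<CgroupEntry> {
    contents
        .lines()
        .filter_map(parse_cgroup_line)
        .find(|e| e.controllers.iter().any(|c| c == "cpuset"))
}

fn main() {
    match fs::read_to_string("/proc/self/cgroup") {
        Ok(contents) => match cpuset_entry(&contents) {
            Some(e) => println!("cpuset hierarchy {} at {}", e.hierarchy_id, e.path),
            None => println!("no cgroup v1 cpuset hierarchy listed"),
        },
        Err(e) => eprintln!("could not read /proc/self/cgroup: {}", e),
    }
}
```

Running this inside and outside the jailer should show whether the process sits in a non-root cpuset cgroup in the slow case.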
The entire overhead comes from the `KVM_CREATE_VM` ioctl: When running within the jailer cgroup: ``` ioctl(12, KVM_CREATE_VM, 0) = 14 ``` When running without cgroups: ``` ioctl(12, KVM_CREATE_VM, 0) =...
So far I have traced the overhead to the `kvm_arch_post_init_vm()` function in the host kernel. I will dig deeper.
Tracing the overhead further down the host kernel call stack: ``` kvm_arch_post_init_vm() -> kvm_mmu_post_init_vm() -> kvm_vm_create_worker_thread() -> kvm_vm_worker_thread() -> cgroup_attach_task_all() -> cgroup_attach_task() -> cgroup_migrate() -> cgroup_migrate_execute() -> cpuset_can_attach() -> percpu_down_write(&cpuset_rwsem)...
It looks like the overhead was introduced by [this](https://lore.kernel.org/lkml/[email protected]/) kernel patch. More specifically, by these 2 commits:

- [sched/core: Prevent race condition between cpuset and __sched_setscheduler()](https://github.com/torvalds/linux/commit/710da3c8ea7dfbd327920afd3831d8c82c42789d)
- [cgroup/cpuset: Convert cpuset_mutex to percpu_rwsem](https://github.com/torvalds/linux/commit/1243dc518c9da467da6635313a2dbb41b8ffc275)
Actually, to be even more specific, looks like the overhead was introduced only by the following commit: [cgroup/cpuset: Convert cpuset_mutex to percpu_rwsem](https://github.com/torvalds/linux/commit/1243dc518c9da467da6635313a2dbb41b8ffc275) I tried to use `cpuset_mutex` instead of `percpu_rwsem`...
Just a quick update. I stumbled upon some documentation. Looks like this is how a `percpu_rwsem` is supposed to work. Quoting from https://github.com/torvalds/linux/blob/master/Documentation/locking/percpu-rw-semaphore.rst > Locking for reading is very fast,...
I managed to reproduce the issue with this simple rust executable: ``` use kvm_ioctls::Kvm; use std::time::{Instant, Duration}; use std::thread; fn main() { thread::sleep(Duration::from_millis(500)); let kvm = Kvm::new().unwrap(); let start =...