performance issue blocking https://github.com/ostreedev/ostree/pull/1513
Let's take the conversation about https://github.com/ostreedev/ostree/pull/1513 here.
I'm trying to analyze the issue a bit more to understand the root of the performance issues: are we dealing with CPU overconsumption, I/O, both, or something else too?
I logged into a jslave while it's otherwise idle, and I noticed:
Locally on my desktop:
[ 1.083047] systemd[1]: systemd 234 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN default-hierarchy=hybrid)
OpenStack:
[ 4.212516] systemd[1]: systemd 234 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN default-hierarchy=hybrid)
So simply booting a VM is about 4 times slower (systemd starts at ~4.2s versus ~1.1s). Which is understandable...I assume there's contention with the underlying guests.
> I assume there's contention with the underlying guests.
That's very likely. We could investigate some more here, though I think we should just sprint to getting an OCP instance set up and switch over ostree to help weed out issues.
There are two levels to this issue. One is running qemu inside VMs that have other concurrent workloads (containers, etc.).
However, for https://github.com/projectatomic/rpm-ostree/pull/1362 where we're provisioning full VMs, performance is still awful. I think this is a generic QE OpenStack issue, but that remains to be determined. It might be specific to nested virt in QEOS.
As another data point, I've been playing with GCE nested virt and the performance is (as you might expect) quite good:
Wed Jun 20 13:28:26 UTC 2018 overlay: Starting
Wed Jun 20 13:28:38 UTC 2018 overlay: Checkout complete
Wed Jun 20 13:28:56 UTC 2018 overlay: Commit complete
Wed Jun 20 13:29:12 UTC 2018 overlay: Deploy complete
Actually a good baseline data point is:
GCE:
[ 8.476232] systemd[1]: Successfully loaded SELinux policy in 793.738ms.
But in that test:
[ 98.191917] systemd[1]: Successfully loaded SELinux policy in 8.273604s.
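To put a number on the gap between those two SELinux policy load lines, here is a rough sketch that parses the durations and computes the ratio. The helper names are made up for illustration, and it only handles the plain `ms` and `s` suffixes seen above, not every duration format systemd can print.

```python
import re

# Matches the journal message quoted above; only "ms" and "s" suffixes are handled.
SELINUX_RE = re.compile(r"Successfully loaded SELinux policy in ([0-9.]+)(ms|s)\.")


def selinux_load_seconds(log_line):
    """Return the SELinux policy load time in seconds from a systemd log line."""
    match = SELINUX_RE.search(log_line)
    if match is None:
        raise ValueError("no SELinux policy load time found in: %r" % log_line)
    value, unit = match.groups()
    seconds = float(value)
    if unit == "ms":
        seconds /= 1000.0
    return seconds


# The two lines pasted in this thread.
gce_line = "[ 8.476232] systemd[1]: Successfully loaded SELinux policy in 793.738ms."
slow_line = "[ 98.191917] systemd[1]: Successfully loaded SELinux policy in 8.273604s."

gce = selinux_load_seconds(gce_line)
slow = selinux_load_seconds(slow_line)
ratio = slow / gce

print("GCE: %.3fs, slow test: %.3fs, slowdown: %.1fx" % (gce, slow, ratio))
```

That works out to roughly a 10x slowdown for the policy load alone, which is far more than the ~4x seen in plain boot time.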
Hmm, are we somehow not getting nested virt enabled perhaps?
> Hmm, are we somehow not getting nested virt enabled perhaps?
Ah. Yes.
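For anyone wanting to check this from inside the CI VM, a minimal sketch along these lines might help. It assumes an x86 Linux guest: it looks for the `vmx`/`svm` CPU flags in `/proc/cpuinfo`, checks whether `/dev/kvm` exists, and reads the `nested` parameter of `kvm_intel`/`kvm_amd` if the module is loaded (that parameter mostly matters on the hypervisor host). The function names are hypothetical, and none of this is specific to how our jslaves are set up.

```python
from pathlib import Path


def cpu_has_virt_flag(cpuinfo_text):
    """True if any x86 'flags' line in /proc/cpuinfo lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def nested_param_enabled(value):
    """Interpret the kvm module 'nested' parameter, which reads as Y/N or 1/0."""
    return value.strip() in ("Y", "y", "1")


def check_nested_virt(root="/"):
    """Collect basic KVM availability hints under the given filesystem root."""
    root = Path(root)
    result = {"cpu_flag": False, "dev_kvm": False, "nested_param": None}

    cpuinfo = root / "proc/cpuinfo"
    if cpuinfo.exists():
        result["cpu_flag"] = cpu_has_virt_flag(cpuinfo.read_text())

    result["dev_kvm"] = (root / "dev/kvm").exists()

    # Only present when the corresponding kvm module is loaded.
    for module in ("kvm_intel", "kvm_amd"):
        param = root / "sys/module" / module / "parameters/nested"
        if param.exists():
            result["nested_param"] = nested_param_enabled(param.read_text())
            break

    return result


if __name__ == "__main__":
    for key, value in check_nested_virt().items():
        print("%s: %s" % (key, value))
```

If `cpu_flag` and `dev_kvm` both come back false inside the VM, qemu there is most likely falling back to pure emulation (TCG), which would be consistent with the slow timings above.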
Hmm, it's also possible we're using Ceph backed VMs, which have notoriously lower disk write performance. I'll double check that.
> Hmm, it's also possible we're using Ceph backed VMs
OK, I've confirmed this isn't the case. For posterity, can you post the same performance outputs here once you have nested virt working?