Qu Wenruo
2021-Dec-14 00:41 UTC
Libvirt on little.BIG ARM systems unable to start guest if no cpuset is provided
On 2021/12/14 00:49, Marc Zyngier wrote:
> On Mon, 13 Dec 2021 16:06:14 +0000,
> Peter Maydell <peter.maydell at linaro.org> wrote:
>>
>> KVM on big.little setups is a kernel-level question really; I've
>> cc'd the kvmarm list.
>
> Thanks Peter for throwing us under the big-little bus! ;-)
>
>>
>> On Mon, 13 Dec 2021 at 15:02, Qu Wenruo <quwenruo.btrfs at gmx.com> wrote:
>>>
>>> On 2021/12/13 21:17, Michal Prívozník wrote:
>>>> On 12/11/21 02:58, Qu Wenruo wrote:
>>>>> Hi,
>>>>>
>>>>> Recently I got my libvirt setup on both RK3399 (RockPro64) and RPI CM4,
>>>>> with upstream kernels.
>>>>>
>>>>> For RPI CM4 its mostly smooth sail, but on RK3399 due to its little.BIG
>>>>> setup (core 0-3 are 4x A55 cores, and core 4-5 are 2x A72 cores), it
>>>>> brings quite some troubles for VMs.
>>>>>
>>>>> In short, without proper cpuset to bind the VM to either all A72 cores
>>>>> or all A55 cores, the VM will mostly fail to boot.
>
> s/A55/A53/. There were thankfully no A72+A55 ever produced (just the
> thought of it makes me sick).
>
>>>>>
>>>>> Currently the working xml is:
>>>>>
>>>>>     <vcpu placement='static' cpuset='4-5'>2</vcpu>
>>>>>     <cpu mode='host-passthrough' check='none'/>
>>>>>
>>>>> But even with vcpupin, pinning each vcpu to each physical core, VM will
>>>>> mostly fail to start up due to vcpu initialization failed with -EINVAL.
>
> Disclaimer: I know nothing about libvirt (and no, I don't want to
> know! ;-).
>
> However, for things to be reliable, you need to taskset the whole QEMU
> process to the CPU type you intend to use.

Yep, that's what I'm doing.

> That's because, AFAICT,
> QEMU will snapshot the system registers outside of the vcpu threads,
> and attempt to use the result to configure the actual vcpu threads. If
> they happen to run on different CPU types, the sysregs will differ in
> incompatible ways and an error will be returned. This may or may not
> be a bug, I don't know (I see it as a feature).

Then this brings another question.

If we can pin each vCPU to each physical core (both little and big),
then as long as the registers are per-vCPU based, it should be able to
pass both big and little cores to the VM.

Yeah, I totally understand this screws up the scheduling, but that's at
least what (some insane) users want (just like me).

>
> If you are annoyed with this behaviour, you can always use a different
> VMM that won't care about such difference (crosvm or kvmtool, to name
> a few).

Sounds pretty interesting, a new world but without libvirt...

> However, the guest will be able to observe the migration from
> one cpu type to another. This may or may not affect your guest's
> behaviour.

Not sure if it's possible to pin each vCPU thread to each core, but let
me try.

>
> I personally find the QEMU behaviour reasonable. KVM/arm64 make little
> effort to support BL virtualisation as design choice (I value my
> sanity), and userspace is still in control of the placement.
>
>>>>> This brings a problem, in theory RK3399 SoC should out-perform BCM2711
>>>>> in multi-core performance, but if a VM can only be bind to either A72 or
>>>>> A55 cores, then the performance is no longer competitive against
>>>>> BCM2711, wasting the PCIE 2.0 x4 capacity.
>
> Vote with your money. If you too think that BL systems are utter crap,
> do not buy them! Or treat them as 'two systems in one', which is what
> I do. From that angle, this is of great value! ;-)

I guess I set my expectations too high for the rk3399; beating the RPI4
in multi-thread performance and having better IO doesn't mean it's a
perfect fit for VMs.

Hopefully the rk3588 will change that.

For now I guess overclocking the big cores to 2.2 GHz is what I can do to
grab more performance from the board.

Thanks for the detailed reasoning and the new advice!
Qu

>
>>>>> I guess with projects like Asahi Linux making progress, there will be
>>>>> more and more such problems.
>
> Well, not more than any other big-little system. They suffer from
> similar issues, plus those resulting from not fully implementing the
> ARM architecture. They are however more consistent in their feature
> set than the ARM implementations ever were.
>
>>>>>
>>>>> Any clue on how to properly pass all physical CPU cores to VM for
>>>>> little.BIG setup?
>>>>>
>>>>
>>>> I have never met big.LITTLE but my understanding was that those big
>>>> cores are compatible with little ones and the only difference is that
>>>> the big ones are shut off if there's no demand (to save energy) leaving
>>>> only the little ones running.
>
> No. They are all notionally running. It is the scheduler that places
> tasks (such as a vcpu) on a 'convenient' core, where 'convenient'
> depends on the scheduling policy.
>
> HTH,
>
> M.
>
Michal Prívozník
2021-Dec-14 07:53 UTC
Libvirt on little.BIG ARM systems unable to start guest if no cpuset is provided
On 12/14/21 01:41, Qu Wenruo wrote:
>
> On 2021/12/14 00:49, Marc Zyngier wrote:
>> On Mon, 13 Dec 2021 16:06:14 +0000,
>> Peter Maydell <peter.maydell at linaro.org> wrote:
>>>
>>> KVM on big.little setups is a kernel-level question really; I've
>>> cc'd the kvmarm list.
>>
>> Thanks Peter for throwing us under the big-little bus! ;-)
>>
>>>
>>> On Mon, 13 Dec 2021 at 15:02, Qu Wenruo <quwenruo.btrfs at gmx.com> wrote:
>>>>
>>>> On 2021/12/13 21:17, Michal Prívozník wrote:
>>>>> On 12/11/21 02:58, Qu Wenruo wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Recently I got my libvirt setup on both RK3399 (RockPro64) and RPI CM4,
>>>>>> with upstream kernels.
>>>>>>
>>>>>> For RPI CM4 its mostly smooth sail, but on RK3399 due to its little.BIG
>>>>>> setup (core 0-3 are 4x A55 cores, and core 4-5 are 2x A72 cores), it
>>>>>> brings quite some troubles for VMs.
>>>>>>
>>>>>> In short, without proper cpuset to bind the VM to either all A72 cores
>>>>>> or all A55 cores, the VM will mostly fail to boot.
>>
>> s/A55/A53/. There were thankfully no A72+A55 ever produced (just the
>> thought of it makes me sick).
>>
>>>>>>
>>>>>> Currently the working xml is:
>>>>>>
>>>>>>     <vcpu placement='static' cpuset='4-5'>2</vcpu>
>>>>>>     <cpu mode='host-passthrough' check='none'/>
>>>>>>
>>>>>> But even with vcpupin, pinning each vcpu to each physical core, VM will
>>>>>> mostly fail to start up due to vcpu initialization failed with -EINVAL.
>>
>> Disclaimer: I know nothing about libvirt (and no, I don't want to
>> know! ;-).
>>
>> However, for things to be reliable, you need to taskset the whole QEMU
>> process to the CPU type you intend to use.
>
> Yep, that's what I'm doing.
>
>> That's because, AFAICT,
>> QEMU will snapshot the system registers outside of the vcpu threads,
>> and attempt to use the result to configure the actual vcpu threads. If
>> they happen to run on different CPU types, the sysregs will differ in
>> incompatible ways and an error will be returned. This may or may not
>> be a bug, I don't know (I see it as a feature).
>
> Then this brings another question.
>
> If we can pin each vCPU to each physical core (both little and big),
> then as long as the registers are per-vCPU based, it should be able to
> pass both big and little cores to the VM.
>
> Yeah, I totally understand this screws up the scheduling, but that's at
> least what (some insane) users want (just like me).
>
>>
>> If you are annoyed with this behaviour, you can always use a different
>> VMM that won't care about such difference (crosvm or kvmtool, to name
>> a few).
>
> Sounds pretty interesting, a new world but without libvirt...
>
>> However, the guest will be able to observe the migration from
>> one cpu type to another. This may or may not affect your guest's
>> behaviour.
>
> Not sure if it's possible to pin each vCPU thread to each core, but let
> me try.
>

Sure it is, for instance:

  <cputune>
    <vcpupin vcpu="0" cpuset="1-4,^2"/>
    <vcpupin vcpu="1" cpuset="0,1"/>
    <vcpupin vcpu="2" cpuset="2,3"/>
    <vcpupin vcpu="3" cpuset="0,4"/>
    <emulatorpin cpuset="1-3"/>
    <iothreadpin iothread="1" cpuset="5,6"/>
    <iothreadpin iothread="2" cpuset="7,8"/>
  </cputune>

pins vCPU#0 onto host CPUs 1-4, excluding 2; vCPU#1 onto host CPUs 0-1
and so on. You can also pin the emulator (QEMU) and its iothreads. It's
documented here:

https://libvirt.org/formatdomain.html#cpu-tuning

Michal
Marc Zyngier
2021-Dec-14 09:34 UTC
Libvirt on little.BIG ARM systems unable to start guest if no cpuset is provided
On Tue, 14 Dec 2021 00:41:01 +0000,
Qu Wenruo <quwenruo.btrfs at gmx.com> wrote:
>
> On 2021/12/14 00:49, Marc Zyngier wrote:
> > On Mon, 13 Dec 2021 16:06:14 +0000,
> > Peter Maydell <peter.maydell at linaro.org> wrote:
> >>
> >> KVM on big.little setups is a kernel-level question really; I've
> >> cc'd the kvmarm list.
> >
> > Thanks Peter for throwing us under the big-little bus! ;-)
> >
> >>
> >> On Mon, 13 Dec 2021 at 15:02, Qu Wenruo <quwenruo.btrfs at gmx.com> wrote:
> >>>
> >>> On 2021/12/13 21:17, Michal Prívozník wrote:
> >>>> On 12/11/21 02:58, Qu Wenruo wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Recently I got my libvirt setup on both RK3399 (RockPro64) and RPI CM4,
> >>>>> with upstream kernels.
> >>>>>
> >>>>> For RPI CM4 its mostly smooth sail, but on RK3399 due to its little.BIG
> >>>>> setup (core 0-3 are 4x A55 cores, and core 4-5 are 2x A72 cores), it
> >>>>> brings quite some troubles for VMs.
> >>>>>
> >>>>> In short, without proper cpuset to bind the VM to either all A72 cores
> >>>>> or all A55 cores, the VM will mostly fail to boot.
> >
> > s/A55/A53/. There were thankfully no A72+A55 ever produced (just the
> > thought of it makes me sick).
> >
> >>>>>
> >>>>> Currently the working xml is:
> >>>>>
> >>>>>     <vcpu placement='static' cpuset='4-5'>2</vcpu>
> >>>>>     <cpu mode='host-passthrough' check='none'/>
> >>>>>
> >>>>> But even with vcpupin, pinning each vcpu to each physical core, VM will
> >>>>> mostly fail to start up due to vcpu initialization failed with -EINVAL.
> >
> > Disclaimer: I know nothing about libvirt (and no, I don't want to
> > know! ;-).
> >
> > However, for things to be reliable, you need to taskset the whole QEMU
> > process to the CPU type you intend to use.
>
> Yep, that's what I'm doing.

Are you sure? The XML directive above seems to apply only to the vcpus,
and not to any other QEMU thread.

> > That's because, AFAICT,
> > QEMU will snapshot the system registers outside of the vcpu threads,
> > and attempt to use the result to configure the actual vcpu threads. If
> > they happen to run on different CPU types, the sysregs will differ in
> > incompatible ways and an error will be returned. This may or may not
> > be a bug, I don't know (I see it as a feature).
>
> Then this brings another question.
>
> If we can pin each vCPU to each physical core (both little and big),
> then as long as the registers are per-vCPU based, it should be able to
> pass both big and little cores to the VM.

Absolutely. But that's not how QEMU works. It assumes that it can
restore the *same* registers to all the vcpus. Which of course doesn't
work (we don't allow you to change MIDR_EL1, for a start).

> Yeah, I totally understand this screws up the scheduling, but that's at
> least what (some insane) users want (just like me).

That's fine, we all have our own use cases.

>
> >
> > If you are annoyed with this behaviour, you can always use a different
> > VMM that won't care about such difference (crosvm or kvmtool, to name
> > a few).
>
> Sounds pretty interesting, a new world but without libvirt...
>
> > However, the guest will be able to observe the migration from
> > one cpu type to another. This may or may not affect your guest's
> > behaviour.
>
> Not sure if it's possible to pin each vCPU thread to each core, but let
> me try.

Again: the problem isn't the vcpu threads, but the dummy VM that QEMU
creates to snapshot the vcpu registers.

> >
> > I personally find the QEMU behaviour reasonable. KVM/arm64 make little
> > effort to support BL virtualisation as design choice (I value my
> > sanity), and userspace is still in control of the placement.
> >
> >>>>> This brings a problem, in theory RK3399 SoC should out-perform BCM2711
> >>>>> in multi-core performance, but if a VM can only be bind to either A72 or
> >>>>> A55 cores, then the performance is no longer competitive against
> >>>>> BCM2711, wasting the PCIE 2.0 x4 capacity.
> >
> > Vote with your money. If you too think that BL systems are utter crap,
> > do not buy them! Or treat them as 'two systems in one', which is what
> > I do. From that angle, this is of great value! ;-)
>
> I guess I set my expectations too high for the rk3399; beating the RPI4
> in multi-thread performance and having better IO doesn't mean it's a
> perfect fit for VMs.

I find my own rk3399 perfectly adequate with QEMU.

HTH,

	M.

-- 
Without deviation from the norm, progress is not possible.
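
Taken together, the advice in this thread points towards a domain definition that keeps every QEMU thread, not only the vCPU threads, on a single core type. The fragment below is a minimal sketch for the RK3399 layout described in the thread, assuming host CPUs 4-5 are the two A72 cores. The one-to-one vCPU mapping and the emulatorpin value are illustrative assumptions, not a configuration posted by anyone here, and the thread does not confirm whether this alone keeps QEMU's register-snapshot VM on the A72 cores.

```xml
<!-- Sketch only: assumes host CPUs 4-5 are the two A72 cores (RK3399). -->
<vcpu placement='static' cpuset='4-5'>2</vcpu>
<cputune>
  <!-- One vCPU per A72 core (illustrative mapping). -->
  <vcpupin vcpu='0' cpuset='4'/>
  <vcpupin vcpu='1' cpuset='5'/>
  <!-- Keep QEMU's non-vCPU (emulator) threads on the same core type. -->
  <emulatorpin cpuset='4-5'/>
</cputune>
<cpu mode='host-passthrough' check='none'/>
```

To use the A53 cluster instead, the same shape would apply with cpuset values drawn from host CPUs 0-3 and a matching vCPU count.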