David Hildenbrand
2018-Feb-07 22:26 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On 07.02.2018 16:31, Kashyap Chamarthy wrote:
> [Cc: KVM upstream list.]
>
> On Tue, Feb 06, 2018 at 04:11:46PM +0100, Florian Haas wrote:
>> Hi everyone,
>>
>> I hope this is the correct list to discuss this issue; please feel
>> free to redirect me otherwise.
>>
>> I have a nested virtualization setup that looks as follows:
>>
>> - Host: Ubuntu 16.04, kernel 4.4.0 (an OpenStack Nova compute node)
>> - L0 guest: openSUSE Leap 42.3, kernel 4.4.104-39-default
>> - Nested guest: SLES 12, kernel 3.12.28-4-default
>>
>> The nested guest is configured with "<type arch='x86_64'
>> machine='pc-i440fx-1.4'>hvm</type>".
>>
>> This is working just beautifully, except when the L0 guest wakes up
>> from managed save (openstack server resume in OpenStack parlance).
>> Then, in the L0 guest we immediately see this:
>
> [...] # Snip the call trace from Florian. It is here:
> https://www.redhat.com/archives/libvirt-users/2018-February/msg00014.html
>
>> What does fix things, of course, is to switch the nested guest
>> from KVM to QEMU — but that also makes things significantly slower.
>>
>> So I'm wondering: is there someone reading this who does run nested
>> KVM and has managed to successfully live-migrate or managed-save? If
>> so, would you be able to share a working host kernel / L0 guest kernel
>> / nested guest kernel combination, or any other hints for tuning the
>> L0 guest to support managed save and live migration?
>
> Following up from our IRC discussion (on #kvm, Freenode). Re-posting my
> comment here:
>
> So I just did a test of 'managedsave' (which is just "save the state of
> the running VM to a file" in libvirt parlance) of L1, _while_ L2 is
> running, and I seem to reproduce your case (see the call trace
> attached).
>
> # Ensure L2 (the nested guest) is running on L1. Then, from L0, do
> # the following:
> [L0] $ virsh managedsave L1
> [L0] $ virsh start L1 --console
>
> Result: See the call trace attached to this bug. But L1 goes on to
> start "fine", and L2 keeps running, too. But things start to seem
> weird. As in: I try to safely, read-only mount the L2 disk image via
> libguestfs (by setting export LIBGUESTFS_BACKEND=direct, which uses
> direct QEMU): `guestfish --ro -a -i ./cirros.qcow2`. It throws the call
> trace again on the L1 serial console. And the `guestfish` command just
> sits there forever.
>
> - L0 (bare metal) kernel: 4.13.13-300.fc27.x86_64+debug
> - L1 (guest hypervisor) kernel: 4.11.10-300.fc26.x86_64
> - L2 is a CirrOS 3.5 image
>
> I can reproduce this at least 3 times, with the above versions.
>
> I'm using libvirt 'host-passthrough' for CPU (meaning: '-cpu host' in
> QEMU parlance) for both L1 and L2.
>
> My L0 CPU is: Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz.
>
> Thoughts?

Sounds like a similar problem as in
https://bugzilla.kernel.org/show_bug.cgi?id=198621

In short: there is no (live) migration support for nested VMX yet. So as
soon as your guest is using VMX itself ("nVMX"), this is not expected to
work.

-- 
Thanks,

David / dhildenb
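A quick way to tell whether a setup is in the nVMX situation David describes (L1 itself acting as a KVM hypervisor) is sketched below. This is a generic sketch assuming an Intel host with the in-tree kvm_intel module; the exact output format varies by kernel version:

    # On L0: is nested VMX enabled for guests at all?
    $ cat /sys/module/kvm_intel/parameters/nested    # Y (or 1) when enabled

    # Inside L1: is VMX exposed to L1, and is L1 running its own KVM?
    $ grep -c vmx /proc/cpuinfo                      # non-zero: VMX visible to L1
    $ lsmod | grep kvm_intel                         # loaded: L1 can run L2 via nVMX

If kvm_intel is loaded in L1 and an L2 guest is running, saving or live-migrating that L1 is exactly the unsupported case discussed in this thread.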
Florian Haas
2018-Feb-08 08:19 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On Wed, Feb 7, 2018 at 11:26 PM, David Hildenbrand <david@redhat.com> wrote:
> On 07.02.2018 16:31, Kashyap Chamarthy wrote:
>> [Cc: KVM upstream list.]
>>
>> On Tue, Feb 06, 2018 at 04:11:46PM +0100, Florian Haas wrote:
>>> Hi everyone,
>>>
>>> I hope this is the correct list to discuss this issue; please feel
>>> free to redirect me otherwise.
>>>
>>> I have a nested virtualization setup that looks as follows:
>>>
>>> - Host: Ubuntu 16.04, kernel 4.4.0 (an OpenStack Nova compute node)
>>> - L0 guest: openSUSE Leap 42.3, kernel 4.4.104-39-default
>>> - Nested guest: SLES 12, kernel 3.12.28-4-default
>>>
>>> The nested guest is configured with "<type arch='x86_64'
>>> machine='pc-i440fx-1.4'>hvm</type>".
>>>
>>> This is working just beautifully, except when the L0 guest wakes up
>>> from managed save (openstack server resume in OpenStack parlance).
>>> Then, in the L0 guest we immediately see this:
>>
>> [...] # Snip the call trace from Florian. It is here:
>> https://www.redhat.com/archives/libvirt-users/2018-February/msg00014.html
>>
>>> What does fix things, of course, is to switch the nested guest
>>> from KVM to QEMU — but that also makes things significantly slower.
>>>
>>> So I'm wondering: is there someone reading this who does run nested
>>> KVM and has managed to successfully live-migrate or managed-save? If
>>> so, would you be able to share a working host kernel / L0 guest kernel
>>> / nested guest kernel combination, or any other hints for tuning the
>>> L0 guest to support managed save and live migration?
>>
>> Following up from our IRC discussion (on #kvm, Freenode). Re-posting my
>> comment here:
>>
>> So I just did a test of 'managedsave' (which is just "save the state of
>> the running VM to a file" in libvirt parlance) of L1, _while_ L2 is
>> running, and I seem to reproduce your case (see the call trace
>> attached).
>>
>> # Ensure L2 (the nested guest) is running on L1. Then, from L0, do
>> # the following:
>> [L0] $ virsh managedsave L1
>> [L0] $ virsh start L1 --console
>>
>> Result: See the call trace attached to this bug. But L1 goes on to
>> start "fine", and L2 keeps running, too. But things start to seem
>> weird. As in: I try to safely, read-only mount the L2 disk image via
>> libguestfs (by setting export LIBGUESTFS_BACKEND=direct, which uses
>> direct QEMU): `guestfish --ro -a -i ./cirros.qcow2`. It throws the call
>> trace again on the L1 serial console. And the `guestfish` command just
>> sits there forever.
>>
>> - L0 (bare metal) kernel: 4.13.13-300.fc27.x86_64+debug
>> - L1 (guest hypervisor) kernel: 4.11.10-300.fc26.x86_64
>> - L2 is a CirrOS 3.5 image
>>
>> I can reproduce this at least 3 times, with the above versions.
>>
>> I'm using libvirt 'host-passthrough' for CPU (meaning: '-cpu host' in
>> QEMU parlance) for both L1 and L2.
>>
>> My L0 CPU is: Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz.
>>
>> Thoughts?
>
> Sounds like a similar problem as in
> https://bugzilla.kernel.org/show_bug.cgi?id=198621
>
> In short: there is no (live) migration support for nested VMX yet. So as
> soon as your guest is using VMX itself ("nVMX"), this is not expected to
> work.

Hi David, thanks for getting back to us on this.

I see your point, except the issue Kashyap and I are describing does
not occur with live migration, it occurs with savevm/loadvm (virsh
managedsave/virsh start in libvirt terms, nova suspend/resume in
OpenStack lingo). And it's not immediately self-evident that the
limitations for the former also apply to the latter.
Even for the live migration limitation, I've been unsuccessful at
finding documentation that warns users to not attempt live migration
when using nesting, and this discussion sounds like a good opportunity
for me to help fix that.

Just to give an example,
https://www.redhat.com/en/blog/inception-how-usable-are-nested-kvm-guests
from just last September talks explicitly about how "guests can be
snapshot/resumed, migrated to other hypervisors and much more" in the
opening paragraph, and then talks at length about nested guests —
without ever pointing out that those very features aren't expected to
work for them. :)

So to clarify things, could you enumerate the currently known
limitations when enabling nesting? I'd be happy to summarize those and
add them to the linux-kvm.org FAQ so others are less likely to hit
their head on this issue. In particular:

- Is https://fedoraproject.org/wiki/How_to_enable_nested_virtualization_in_KVM
  still accurate in that -cpu host (libvirt "host-passthrough") is the
  strongly recommended configuration for the L2 guest?

- If so, are there any recommendations for how to configure the L1
  guest with regard to CPU model?

- Is live migration with nested guests _always_ expected to break on
  all architectures, and if not, which are safe?

- Idem, for savevm/loadvm?

- With regard to the problem that Kashyap and I (and Dennis, the
  kernel.org bugzilla reporter) are describing, is this expected to work
  any better on AMD CPUs? (All reports are on Intel)

- Do you expect nested virtualization functionality to be adversely
  affected by KPTI and/or other Meltdown/Spectre mitigation patches?

Kashyap, can you think of any other limitations that would benefit
from improved documentation?

Cheers,
Florian
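For the CPU-model questions above, the corresponding libvirt domain XML is roughly the following. This is a generic sketch of standard libvirt CPU modes, not a configuration taken from the setups described here:

    <!-- L1 (the guest hypervisor): pass the host CPU through, VMX included -->
    <cpu mode='host-passthrough'/>

    <!-- Alternative for L1: a named model derived from the host, still
         carrying the vmx flag as long as nesting is enabled on L0 -->
    <cpu mode='host-model'/>

Which of these (or a custom model) is actually recommended for L1 and L2 is answered by David further down in the thread.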
Kashyap Chamarthy
2018-Feb-08 10:46 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On Wed, Feb 07, 2018 at 11:26:14PM +0100, David Hildenbrand wrote:
> On 07.02.2018 16:31, Kashyap Chamarthy wrote:

[...]

> Sounds like a similar problem as in
> https://bugzilla.kernel.org/show_bug.cgi?id=198621
>
> In short: there is no (live) migration support for nested VMX yet. So as
> soon as your guest is using VMX itself ("nVMX"), this is not expected to
> work.

Actually, live migration with nVMX _does_ work insofar as you have
_identical_ CPUs on both source and destination — i.e. use QEMU's
'-cpu host' for the L1 guests. At least that's been the case in my
experience. FWIW, I frequently use that setup in my test environments.

Just to be quadruple sure, I did the test: migrate an L2 guest (with
non-shared storage), and it worked just fine. (No 'oops'es, no stack
traces, no "kernel BUG" in `dmesg` or serial consoles on L1s. And I can
log in to the L2 guest on the destination L1 just fine.)

Once you have password-less SSH between source and destination and a
bit of libvirt config set up, the migrate command is as follows:

    $ virsh migrate --verbose --copy-storage-all \
        --live cvm1 qemu+tcp://root@f26-vm2/system
    Migration: [100 %]

    $ echo $?
    0

Full details:
https://kashyapc.fedorapeople.org/virt/Migrate-a-nested-guest-08Feb2018.txt

(At the end of the document above, I also posted the libvirt config and
the version details across L0, L1 and L2. So this is a fully repeatable
test.)

-- 
/kashyap
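A post-migration check along the following lines makes the "worked just fine" result easy to verify. This is a generic sketch; the domain name cvm1 and destination host f26-vm2 are the ones from the linked test notes:

    # On the destination L1: the migrated L2 should now be running there
    [dest-L1] $ virsh list --all
    [dest-L1] $ virsh domstate cvm1          # expect: running

    # On both L1s (and inside L2): look for fresh warnings or call traces
    $ dmesg --level=err,warn | tail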
Kashyap Chamarthy
2018-Feb-08 11:34 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On Thu, Feb 08, 2018 at 11:46:24AM +0100, Kashyap Chamarthy wrote:
> On Wed, Feb 07, 2018 at 11:26:14PM +0100, David Hildenbrand wrote:
> > On 07.02.2018 16:31, Kashyap Chamarthy wrote:
>
> [...]
>
> > Sounds like a similar problem as in
> > https://bugzilla.kernel.org/show_bug.cgi?id=198621
> >
> > In short: there is no (live) migration support for nested VMX yet. So as
> > soon as your guest is using VMX itself ("nVMX"), this is not expected to
> > work.
>
> Actually, live migration with nVMX _does_ work insofar as you have
> _identical_ CPUs on both source and destination — i.e. use QEMU's
> '-cpu host' for the L1 guests. At least that's been the case in my
> experience. FWIW, I frequently use that setup in my test environments.

Correcting my erroneous statement above: for live migration to work in
a nested KVM setup, it is _not_ mandatory to use "-cpu host".

I just did another test. Here I used libvirt's 'host-model' for both
source and destination L1 guests, _and_ for the L2 guest. Migrated the
L2 to the destination L1; it worked great.

In my setup, both my L1 guests received the following CPU configuration
(on the QEMU command line):

    [...]
    -cpu Haswell-noTSX,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,\
    hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on,pdpe1gb=on,abm=on,aes=off
    [...]

And the L2 guest received this:

    [...]
    -cpu Haswell-noTSX,vme=on,ss=on,f16c=on,rdrand=on,hypervisor=on,\
    arat=on,tsc_adjust=on,xsaveopt=on,pdpe1gb=on,abm=on,aes=off,invpcid=off
    [...]

-- 
/kashyap
David Hildenbrand
2018-Feb-08 11:48 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On 08.02.2018 11:46, Kashyap Chamarthy wrote:
> On Wed, Feb 07, 2018 at 11:26:14PM +0100, David Hildenbrand wrote:
>> On 07.02.2018 16:31, Kashyap Chamarthy wrote:
>
> [...]
>
>> Sounds like a similar problem as in
>> https://bugzilla.kernel.org/show_bug.cgi?id=198621
>>
>> In short: there is no (live) migration support for nested VMX yet. So as
>> soon as your guest is using VMX itself ("nVMX"), this is not expected to
>> work.
>
> Actually, live migration with nVMX _does_ work insofar as you have
> _identical_ CPUs on both source and destination — i.e. use QEMU's
> '-cpu host' for the L1 guests. At least that's been the case in my
> experience. FWIW, I frequently use that setup in my test environments.
>

You're mixing use cases. While you talk about migrating an L2, this
thread is about migrating an L1 that is running an L2.

Migrating an L2 is expected to work, just like migrating an L1 that is
not running an L2. (Of course, with the usual trouble around CPU models,
but upper layers should check and handle that.)

> Just to be quadruple sure, I did the test: migrate an L2 guest (with
> non-shared storage), and it worked just fine. (No 'oops'es, no stack
> traces, no "kernel BUG" in `dmesg` or serial consoles on L1s. And I can
> log in to the L2 guest on the destination L1 just fine.)
>
> Once you have password-less SSH between source and destination and a
> bit of libvirt config set up, the migrate command is as follows:
>
>     $ virsh migrate --verbose --copy-storage-all \
>         --live cvm1 qemu+tcp://root@f26-vm2/system
>     Migration: [100 %]
>
>     $ echo $?
>     0
>
> Full details:
> https://kashyapc.fedorapeople.org/virt/Migrate-a-nested-guest-08Feb2018.txt
>
> (At the end of the document above, I also posted the libvirt config and
> the version details across L0, L1 and L2. So this is a fully repeatable
> test.)

-- 
Thanks,

David / dhildenb
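To make the distinction concrete, the two scenarios look like this in terms of the commands already used in this thread (the qemu+ssh URI and domain names are placeholders):

    # Migrating an L2, initiated from inside L1: expected to work
    [L1] $ virsh migrate --verbose --live L2 qemu+ssh://other-l1/system

    # Saving or migrating the L1 itself while it runs an L2 (nVMX state in use):
    # the unsupported case this thread is about
    [L0] $ virsh managedsave L1
    [L0] $ virsh start L1 --console    # L1's vCPU state may come back corrupted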
David Hildenbrand
2018-Feb-08 12:07 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
>> In short: there is no (live) migration support for nested VMX yet. So as
>> soon as your guest is using VMX itself ("nVMX"), this is not expected to
>> work.
>
> Hi David, thanks for getting back to us on this.

Hi Florian,

(somebody please correct me if I'm wrong)

> I see your point, except the issue Kashyap and I are describing does
> not occur with live migration, it occurs with savevm/loadvm (virsh
> managedsave/virsh start in libvirt terms, nova suspend/resume in
> OpenStack lingo). And it's not immediately self-evident that the
> limitations for the former also apply to the latter. Even for the live
> migration limitation, I've been unsuccessful at finding documentation
> that warns users to not attempt live migration when using nesting, and
> this discussion sounds like a good opportunity for me to help fix
> that.
>
> Just to give an example,
> https://www.redhat.com/en/blog/inception-how-usable-are-nested-kvm-guests
> from just last September talks explicitly about how "guests can be
> snapshot/resumed, migrated to other hypervisors and much more" in the
> opening paragraph, and then talks at length about nested guests —
> without ever pointing out that those very features aren't expected to
> work for them. :)

Well, it still is a kernel parameter, "nested", that is disabled by
default. So things should be expected to be shaky. :)

While running nested guests usually works fine, migrating a nested
hypervisor is the problem. Especially see e.g.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/nested_virt

"However, note that nested virtualization is not supported or
recommended in production user environments, and is primarily intended
for development and testing."

> So to clarify things, could you enumerate the currently known
> limitations when enabling nesting? I'd be happy to summarize those and
> add them to the linux-kvm.org FAQ so others are less likely to hit
> their head on this issue. In particular:

The general problem is that migration of an L1 will not work while it is
running an L2, i.e. when L1 is using VMX itself ("nVMX"). Migrating an
L2 should work as before.

The problem is that, in order for L1 to make use of VMX to run L2, we
have to run L2 in L0 while simulating VMX for L1 (nested VMX, a.k.a.
nVMX). This requires additional state information about L1 ("nVMX"
state), which is not properly migrated when migrating L1. Therefore,
after migration, the CPU state of L1 might be screwed up, resulting in
L1 crashes.

In addition, certain VMX features might be missing on the target, which
also still has to be handled via the CPU model in the future.

L0 should hopefully not crash; I hope you are not seeing that.

> - Is https://fedoraproject.org/wiki/How_to_enable_nested_virtualization_in_KVM
>   still accurate in that -cpu host (libvirt "host-passthrough") is the
>   strongly recommended configuration for the L2 guest?
>
> - If so, are there any recommendations for how to configure the L1
>   guest with regard to CPU model?

You have to indicate the VMX feature to your L1 ("nested hypervisor");
that is usually done automatically by using the "host-passthrough" or
"host-model" value.
If you're using a custom CPU model, you have to enable it explicitly.

> - Is live migration with nested guests _always_ expected to break on
>   all architectures, and if not, which are safe?

x86 VMX: running nested guests works, but migrating nested hypervisors
does not work.
x86 SVM: running nested guests works, but migrating nested hypervisors
does not work (somebody correct me if I'm wrong).
s390x: running nested guests works, and migrating nested hypervisors
works.
power: running nested guests works only via KVM-PR ("trap and emulate"),
so migrating nested hypervisors works, but we are not using hardware
virtualization for L1->L2 (my latest status).
arm: running nested guests is in the works (my latest status), so
migration is also not possible yet.

> - Idem, for savevm/loadvm?

savevm/loadvm is not expected to work correctly on an L1 if it is
running L2 guests. It should work on L2, however.

> - With regard to the problem that Kashyap and I (and Dennis, the
>   kernel.org bugzilla reporter) are describing, is this expected to work
>   any better on AMD CPUs? (All reports are on Intel)

No, remember that migration support for the nested SVM state is also
still missing there.

> - Do you expect nested virtualization functionality to be adversely
>   affected by KPTI and/or other Meltdown/Spectre mitigation patches?

Not an expert on this. I think it should be affected in a similar way as
ordinary guests. :)

> Kashyap, can you think of any other limitations that would benefit
> from improved documentation?

We should certainly document what I have summarized here properly in a
central place!

> Cheers,
> Florian

-- 
Thanks,

David / dhildenb
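To illustrate the "enable it explicitly" remark for custom CPU models: in libvirt domain XML, requiring VMX for the L1 guest looks roughly like this (a minimal sketch; Haswell-noTSX is simply the model name that appears earlier in this thread):

    <cpu mode='custom' match='exact'>
      <model fallback='forbid'>Haswell-noTSX</model>
      <feature policy='require' name='vmx'/>
    </cpu>

With that in place L1 sees the vmx flag and can load kvm_intel itself; whether the resulting L1 can then be migrated or saved while running an L2 is the separate limitation summarized above.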