David Hildenbrand
2018-Feb-08 13:47 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
> Sure, I do understand that Red Hat (or any other vendor) is taking no
> support responsibility for this. At this point I'd just like to
> contribute to a better understanding of what's expected to definitely
> _not_ work, so that people don't bloody their noses on that. :)

Indeed. Nesting is nice to enable, as it works in 99% of all cases. It
just doesn't work when trying to migrate a nested hypervisor (on x86).
That's what most people don't realize, as it works "just fine" for 99%
of all use cases.

[...]

>> savevm/loadvm is not expected to work correctly on an L1 if it is
>> running L2 guests. It should work on L2 however.
>
> Again, I'm somewhat struggling to understand this vs. live migration —
> but it's entirely possible that I'm sorely lacking in my knowledge of
> kernel and CPU internals.

(savevm/loadvm is also called "migration to file")

When we migrate to a file, it really is the same migration stream. You
"dump" the VM state into a file, instead of sending it over to another
(running) target.

Once you load your VM state from that file, it is a completely fresh
VM/KVM environment. So you have to restore all the state. Now, as nVMX
state is not contained in the migration stream, you cannot restore that
state. The L1 state is therefore "damaged" or incomplete.

[...]

>>> Kashyap, can you think of any other limitations that would benefit
>>> from improved documentation?
>>
>> We should certainly document what I have summarized here properly at a
>> central place!
>
> I tried getting registered on the linux-kvm.org wiki to do exactly
> that, and ran into an SMTP/DNS configuration issue with the
> verification email. Kashyap said he was going to poke the site admin
> about that.
>
> Now, here's a bit more information on my continued testing. As I
> mentioned on IRC, one of the things that struck me as odd was that if
> I ran into the issue previously described, the L1 guest would enter a
> reboot loop if configured with kernel.panic_on_oops=1. In other words,
> I would savevm the L1 guest (with a running L2), then loadvm it, and
> then the L1 would stack-trace, reboot, and then keep doing that
> indefinitely. I found that weird because on the second reboot, I would
> expect the system to come up cleanly.

I guess the L1 state (in the kernel) is broken so badly that even a
reset cannot fix it.

> I've now changed my L2 guest's CPU configuration so that libvirt (in
> L1) starts the L2 guest with the following settings:
>
>   <cpu>
>     <model fallback='forbid'>Haswell-noTSX</model>
>     <vendor>Intel</vendor>
>     <feature policy='disable' name='vme'/>
>     <feature policy='disable' name='ss'/>
>     <feature policy='disable' name='f16c'/>
>     <feature policy='disable' name='rdrand'/>
>     <feature policy='disable' name='hypervisor'/>
>     <feature policy='disable' name='arat'/>
>     <feature policy='disable' name='tsc_adjust'/>
>     <feature policy='disable' name='xsaveopt'/>
>     <feature policy='disable' name='abm'/>
>     <feature policy='disable' name='aes'/>
>     <feature policy='disable' name='invpcid'/>
>   </cpu>

Maybe one of these features is the root cause of the "messed up" state
in KVM. So disabling it also makes the L1 state "less broken".

> Basically, I am disabling every single feature that my L1's "virsh
> capabilities" reports. Now this does not make my L1 come up happily
> from loadvm. But it does seem to initiate a clean reboot after loadvm,
> and after that clean reboot it lives happily.
>
> If this is as good as it gets (for now), then I can totally live with
> that. It certainly beats running the L2 guest with Qemu (without KVM
> acceleration). But I would still love to understand the issue a little
> bit better.

I mean, the real solution to the problem is of course restoring the L1
state correctly (migrating nVMX state, which is what people are working
on right now). So what you are seeing is a bad "side effect" of that
state not being migrated.

For now, nested=true should never be used along with savevm/loadvm/live
migration.

> Cheers,
> Florian

--
Thanks,

David / dhildenb
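A minimal sketch of the "migration to file" cycle being discussed, using
the libvirt Python bindings; the connection URI and the domain name "l1"
are placeholders, not taken from the thread:

# Save/restore an L1 guest via libvirt's managed save ("migration to file").
# If that L1 is itself running an L2 at save time, the nVMX state is not
# part of the migration stream, which is the limitation described above.
import libvirt

conn = libvirt.open("qemu:///system")   # hypervisor connection on L0
dom = conn.lookupByName("l1")           # the L1 guest (placeholder name)

dom.managedSave(0)   # dump the VM state to a file and stop the VM
dom.create()         # start again, restoring from the saved image

conn.close()

On the command line this corresponds to virsh managedsave followed by
virsh start, i.e. the "managed save" in the subject line of this thread.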
Florian Haas
2018-Feb-08 13:57 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On Thu, Feb 8, 2018 at 2:47 PM, David Hildenbrand <david@redhat.com> wrote:
>> Again, I'm somewhat struggling to understand this vs. live migration —
>> but it's entirely possible that I'm sorely lacking in my knowledge of
>> kernel and CPU internals.
>
> (savevm/loadvm is also called "migration to file")
>
> When we migrate to a file, it really is the same migration stream. You
> "dump" the VM state into a file, instead of sending it over to another
> (running) target.
>
> Once you load your VM state from that file, it is a completely fresh
> VM/KVM environment. So you have to restore all the state. Now, as nVMX
> state is not contained in the migration stream, you cannot restore that
> state. The L1 state is therefore "damaged" or incomplete.

*lightbulb* Thanks a lot, that's a perfectly logical explanation. :)

>> Now, here's a bit more information on my continued testing. As I
>> mentioned on IRC, one of the things that struck me as odd was that if
>> I ran into the issue previously described, the L1 guest would enter a
>> reboot loop if configured with kernel.panic_on_oops=1. In other words,
>> I would savevm the L1 guest (with a running L2), then loadvm it, and
>> then the L1 would stack-trace, reboot, and then keep doing that
>> indefinitely. I found that weird because on the second reboot, I would
>> expect the system to come up cleanly.
>
> I guess the L1 state (in the kernel) is broken so badly that even a
> reset cannot fix it.

... which would also explain why, in contrast, a virsh destroy/virsh
start cycle does fix things.

>> I've now changed my L2 guest's CPU configuration so that libvirt (in
>> L1) starts the L2 guest with the following settings:
>>
>>   <cpu>
>>     <model fallback='forbid'>Haswell-noTSX</model>
>>     <vendor>Intel</vendor>
>>     <feature policy='disable' name='vme'/>
>>     <feature policy='disable' name='ss'/>
>>     <feature policy='disable' name='f16c'/>
>>     <feature policy='disable' name='rdrand'/>
>>     <feature policy='disable' name='hypervisor'/>
>>     <feature policy='disable' name='arat'/>
>>     <feature policy='disable' name='tsc_adjust'/>
>>     <feature policy='disable' name='xsaveopt'/>
>>     <feature policy='disable' name='abm'/>
>>     <feature policy='disable' name='aes'/>
>>     <feature policy='disable' name='invpcid'/>
>>   </cpu>
>
> Maybe one of these features is the root cause of the "messed up" state
> in KVM. So disabling it also makes the L1 state "less broken".

Would you try a guess as to which of the above features is a likely
culprit?

>> Basically, I am disabling every single feature that my L1's "virsh
>> capabilities" reports. Now this does not make my L1 come up happily
>> from loadvm. But it does seem to initiate a clean reboot after loadvm,
>> and after that clean reboot it lives happily.
>>
>> If this is as good as it gets (for now), then I can totally live with
>> that. It certainly beats running the L2 guest with Qemu (without KVM
>> acceleration). But I would still love to understand the issue a little
>> bit better.
>
> I mean, the real solution to the problem is of course restoring the L1
> state correctly (migrating nVMX state, which is what people are working
> on right now). So what you are seeing is a bad "side effect" of that
> state not being migrated.
>
> For now, nested=true should never be used along with savevm/loadvm/live
> migration.

Yes, I gathered as much. :) Thanks again!

Cheers,
Florian
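For reference, the distinction drawn here maps to two different libvirt
operations; a sketch using the libvirt Python bindings, with the domain
name "l1" again a placeholder:

# Full destroy/start cycle: a fresh QEMU process and fresh KVM state,
# which is why it recovers where an in-place reboot does not.
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("l1")

dom.destroy()   # hard stop: discards the broken in-kernel (KVM) state
dom.create()    # start from scratch - same as virsh destroy + virsh start

# dom.reset(0), by contrast, resets the guest inside the same QEMU/KVM
# instance, so the damaged nVMX-related state survives it.

conn.close()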
David Hildenbrand
2018-Feb-08 14:55 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
>>> I've now changed my L2 guest's CPU configuration so that libvirt (in
>>> L1) starts the L2 guest with the following settings:
>>>
>>>   <cpu>
>>>     <model fallback='forbid'>Haswell-noTSX</model>
>>>     <vendor>Intel</vendor>
>>>     <feature policy='disable' name='vme'/>
>>>     <feature policy='disable' name='ss'/>
>>>     <feature policy='disable' name='f16c'/>
>>>     <feature policy='disable' name='rdrand'/>
>>>     <feature policy='disable' name='hypervisor'/>
>>>     <feature policy='disable' name='arat'/>
>>>     <feature policy='disable' name='tsc_adjust'/>
>>>     <feature policy='disable' name='xsaveopt'/>
>>>     <feature policy='disable' name='abm'/>
>>>     <feature policy='disable' name='aes'/>
>>>     <feature policy='disable' name='invpcid'/>
>>>   </cpu>
>>
>> Maybe one of these features is the root cause of the "messed up" state
>> in KVM. So disabling it also makes the L1 state "less broken".
>
> Would you try a guess as to which of the above features is a likely
> culprit?

Hmm, actually no idea, but you can bisect :)

(But watch out, it could also just be coincidence. Especially if you
migrate while none of L1's VCPUs are currently executing L2, chances
might be better for L1 to survive a migration - L2 will still fail
hard, and L1 certainly will too once it tries to run L2 again.)

--
Thanks,

David / dhildenb
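In case it helps with that bisection: a small throwaway Python sketch,
purely illustrative and based on the feature list from Florian's XML,
that prints the <feature/> lines for a chosen subset so each test run
only needs the generated block pasted into the L2 domain definition:

# Bisect the disabled-feature list: keep half of it disabled, retest
# savevm/loadvm, and recurse on whichever half still shows the problem.
FEATURES = ["vme", "ss", "f16c", "rdrand", "hypervisor", "arat",
            "tsc_adjust", "xsaveopt", "abm", "aes", "invpcid"]

def feature_xml(disabled):
    """Render <feature/> elements disabling only the given subset."""
    return "\n".join(
        "    <feature policy='disable' name='%s'/>" % name
        for name in FEATURES if name in disabled
    )

# First bisection step: disable only the first half of the list.
half = set(FEATURES[:len(FEATURES) // 2])
print(feature_xml(half))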
Daniel P. Berrangé
2018-Feb-08 14:59 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On Thu, Feb 08, 2018 at 02:47:26PM +0100, David Hildenbrand wrote:
> > Sure, I do understand that Red Hat (or any other vendor) is taking no
> > support responsibility for this. At this point I'd just like to
> > contribute to a better understanding of what's expected to definitely
> > _not_ work, so that people don't bloody their noses on that. :)
>
> Indeed. Nesting is nice to enable, as it works in 99% of all cases. It
> just doesn't work when trying to migrate a nested hypervisor (on x86).

Hmm, if migration of the L1 is going to cause things to crash and burn,
then ideally libvirt on L0 would block the migration from being done.

Naively we could do that if the guest has vmx or svm features in its
CPU, except that's probably way too conservative, as many guests with
those features won't actually run any nested VMs. It would also be
desirable to still be able to migrate the L1 if no L2s are currently
running.

Is there any way QEMU can expose to libvirt whether any L2s are active,
so we can prevent migration in that case? Or should QEMU itself refuse
to start the migration, perhaps?

Regards,
Daniel

--
|: https://berrange.com       -o-   https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org        -o-           https://fstop138.berrange.com   :|
|: https://entangle-photo.org -o-   https://www.instagram.com/dberrange     :|
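To illustrate how coarse that naive check would be, here is a sketch
with the libvirt Python bindings (domain name "l1" is a placeholder,
not from the thread). It only sees vmx/svm when the features are spelled
out in the domain XML, and it says nothing about whether an L2 is
actually running - which is exactly the gap being discussed:

# Refuse to save/migrate an L1 whose CPU definition exposes vmx or svm.
import xml.etree.ElementTree as ET
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("l1")

cpu = ET.fromstring(dom.XMLDesc(0)).find("cpu")
nested_features = {
    f.get("name")
    for f in (cpu.findall("feature") if cpu is not None else [])
    if f.get("name") in ("vmx", "svm") and f.get("policy") != "disable"
}

if nested_features:
    print("refusing to migrate: guest CPU exposes", ", ".join(sorted(nested_features)))
else:
    print("no vmx/svm found in the guest CPU definition")

conn.close()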
David Hildenbrand
2018-Feb-08 15:11 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On 08.02.2018 15:59, Daniel P. Berrangé wrote:
> On Thu, Feb 08, 2018 at 02:47:26PM +0100, David Hildenbrand wrote:
>>> Sure, I do understand that Red Hat (or any other vendor) is taking no
>>> support responsibility for this. At this point I'd just like to
>>> contribute to a better understanding of what's expected to definitely
>>> _not_ work, so that people don't bloody their noses on that. :)
>>
>> Indeed. Nesting is nice to enable, as it works in 99% of all cases. It
>> just doesn't work when trying to migrate a nested hypervisor (on x86).
>
> Hmm, if migration of the L1 is going to cause things to crash and burn,
> then ideally libvirt on L0 would block the migration from being done.

Yes, in an ideal world. Usually we assume that people who turn on
experimental features ("nested=true") are aware of the implications.
The main problem is that the implications are not really documented :)

Once we have both a new KVM _and_ a new QEMU, it will eventually be
supported.

> Naively we could do that if the guest has vmx or svm features in its
> CPU, except that's probably way too conservative, as many guests with
> those features won't actually run any nested VMs. It would also be
> desirable to still be able to migrate the L1 if no L2s are currently
> running.

No, using CPU feature flags for that purpose on the libvirt level is no
good, especially because once we do support migration we would have to
find another interface to signal "but it is working now".

QEMU could try to warn the user if VMX is enabled in the CPU model, but
as you said, that might also hold true for guests that don't use nVMX.
On the other hand, VMX will only show up as a valid feature if
nested=true is set, so the number of affected users is minimal. So we
could, e.g., abort migration on the QEMU level if VMX is specified
right now. Once we have the migration support in place, we can allow it
again.

> Is there any way QEMU can expose to libvirt whether any L2s are active,
> so we can prevent migration in that case? Or should QEMU itself refuse
> to start the migration, perhaps?

Not without another kernel interface. But I am no expert on that
matter. Maybe there is an easy way to block it that I just don't see
right now.

> Regards,
> Daniel

--
Thanks,

David / dhildenb
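For anyone wondering whether their setup is affected by this limitation
at all, a small host-side (L0) sketch: it reads the standard
kvm_intel/kvm_amd "nested" module parameters; the value format ("Y"/"N"
or "1"/"0") varies between kernel versions, hence the loose check.

# Report whether nested virtualization is enabled on this host.
import os

def nested_enabled():
    for path in ("/sys/module/kvm_intel/parameters/nested",
                 "/sys/module/kvm_amd/parameters/nested"):
        if os.path.exists(path):
            with open(path) as f:
                return f.read().strip() in ("Y", "y", "1")
    return False  # no KVM nested parameter found

if nested_enabled():
    print("nested virtualization is enabled; avoid savevm/loadvm/migration "
          "of L1 guests that are running L2s")
else:
    print("nested virtualization is off; the limitation discussed here does not apply")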