David Hildenbrand
2018-Feb-08 13:47 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
> Sure, I do understand that Red Hat (or any other vendor) is taking no
> support responsibility for this. At this point I'd just like to
> contribute to a better understanding of what's expected to definitely
> _not_ work, so that people don't bloody their noses on that. :)

Indeed. Nesting is nice to enable, as it works in 99% of all cases. It
just doesn't work when trying to migrate a nested hypervisor (on x86).
That's what most people don't realize, as it works "just fine" for 99%
of all use cases.

[...]

>> savevm/loadvm is not expected to work correctly on an L1 if it is
>> running L2 guests. It should work on L2 however.
>
> Again, I'm somewhat struggling to understand this vs. live migration —
> but it's entirely possible that I'm sorely lacking in my knowledge of
> kernel and CPU internals.

(savevm/loadvm is also called "migration to file")

When we migrate to a file, it really is the same migration stream. You
"dump" the VM state into a file, instead of sending it over to another
(running) target.

Once you load your VM state from that file, it is a completely fresh
VM/KVM environment. So you have to restore all the state. Now, as nVMX
state is not contained in the migration stream, you cannot restore that
state. The L1 state is therefore "damaged" or incomplete.

[...]

>>> Kashyap, can you think of any other limitations that would benefit
>>> from improved documentation?
>>
>> We should certainly document what I have summarized here properly at a
>> central place!
>
> I tried getting registered on the linux-kvm.org wiki to do exactly
> that, and ran into an SMTP/DNS configuration issue with the
> verification email. Kashyap said he was going to poke the site admin
> about that.
>
> Now, here's a bit more information on my continued testing. As I
> mentioned on IRC, one of the things that struck me as odd was that if
> I ran into the issue previously described, the L1 guest would enter a
> reboot loop if configured with kernel.panic_on_oops=1. In other words,
> I would savevm the L1 guest (with a running L2), then loadvm it, and
> then the L1 would stack-trace, reboot, and then keep doing that
> indefinitely. I found that weird because on the second reboot, I would
> expect the system to come up cleanly.

I guess the L1 state (in the kernel) is broken so badly that even a
reset cannot fix it.

> I've now changed my L2 guest's CPU configuration so that libvirt (in
> L1) starts the L2 guest with the following settings:
>
>   <cpu>
>     <model fallback='forbid'>Haswell-noTSX</model>
>     <vendor>Intel</vendor>
>     <feature policy='disable' name='vme'/>
>     <feature policy='disable' name='ss'/>
>     <feature policy='disable' name='f16c'/>
>     <feature policy='disable' name='rdrand'/>
>     <feature policy='disable' name='hypervisor'/>
>     <feature policy='disable' name='arat'/>
>     <feature policy='disable' name='tsc_adjust'/>
>     <feature policy='disable' name='xsaveopt'/>
>     <feature policy='disable' name='abm'/>
>     <feature policy='disable' name='aes'/>
>     <feature policy='disable' name='invpcid'/>
>   </cpu>

Maybe one of these features is the root cause of the "messed up" state
in KVM. So disabling it also makes the L1 state "less broken".

> Basically, I am disabling every single feature that my L1's "virsh
> capabilities" reports. Now this does not make my L1 come up happily
> from loadvm. But it does seem to initiate a clean reboot after loadvm,
> and after that clean reboot it lives happily.
>
> If this is as good as it gets (for now), then I can totally live with
> that. It certainly beats running the L2 guest with Qemu (without KVM
> acceleration). But I would still love to understand the issue a little
> bit better.

I mean, the real solution to the problem is of course restoring the L1
state correctly (migrating nVMX state, which is what people are working
on right now). So what you are seeing is a bad "side effect" of that
state not being migrated.

For now, nested=true should never be used along with savevm/loadvm/live
migration.

> Cheers,
> Florian

--
Thanks,

David / dhildenb
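A minimal sketch of the "migration to file" cycle being discussed, using
the libvirt Python bindings; the connection URI and the domain name "l1"
are placeholders, not taken from the thread:

# Save/restore an L1 guest via libvirt's managed save ("migration to file").
# If that L1 is itself running an L2 at save time, the nVMX state is not
# part of the migration stream, which is the limitation described above.
import libvirt

conn = libvirt.open("qemu:///system")   # hypervisor connection on L0
dom = conn.lookupByName("l1")           # the L1 guest (placeholder name)

dom.managedSave(0)   # dump the VM state to a file and stop the VM
dom.create()         # start again, restoring from the saved image

conn.close()

On the command line this corresponds to virsh managedsave followed by
virsh start, i.e. the "managed save" in the subject line of this thread.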
Florian Haas
2018-Feb-08 13:57 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On Thu, Feb 8, 2018 at 2:47 PM, David Hildenbrand <david@redhat.com> wrote:
>> Again, I'm somewhat struggling to understand this vs. live migration —
>> but it's entirely possible that I'm sorely lacking in my knowledge of
>> kernel and CPU internals.
>
> (savevm/loadvm is also called "migration to file")
>
> When we migrate to a file, it really is the same migration stream. You
> "dump" the VM state into a file, instead of sending it over to another
> (running) target.
>
> Once you load your VM state from that file, it is a completely fresh
> VM/KVM environment. So you have to restore all the state. Now, as nVMX
> state is not contained in the migration stream, you cannot restore that
> state. The L1 state is therefore "damaged" or incomplete.

*lightbulb* Thanks a lot, that's a perfectly logical explanation. :)

>> Now, here's a bit more information on my continued testing. As I
>> mentioned on IRC, one of the things that struck me as odd was that if
>> I ran into the issue previously described, the L1 guest would enter a
>> reboot loop if configured with kernel.panic_on_oops=1. In other words,
>> I would savevm the L1 guest (with a running L2), then loadvm it, and
>> then the L1 would stack-trace, reboot, and then keep doing that
>> indefinitely. I found that weird because on the second reboot, I would
>> expect the system to come up cleanly.
>
> I guess the L1 state (in the kernel) is broken so badly that even a
> reset cannot fix it.

... which would also explain why, in contrast, a virsh destroy/virsh
start cycle does fix things.

>> I've now changed my L2 guest's CPU configuration so that libvirt (in
>> L1) starts the L2 guest with the following settings:
>>
>>   <cpu>
>>     <model fallback='forbid'>Haswell-noTSX</model>
>>     <vendor>Intel</vendor>
>>     <feature policy='disable' name='vme'/>
>>     <feature policy='disable' name='ss'/>
>>     <feature policy='disable' name='f16c'/>
>>     <feature policy='disable' name='rdrand'/>
>>     <feature policy='disable' name='hypervisor'/>
>>     <feature policy='disable' name='arat'/>
>>     <feature policy='disable' name='tsc_adjust'/>
>>     <feature policy='disable' name='xsaveopt'/>
>>     <feature policy='disable' name='abm'/>
>>     <feature policy='disable' name='aes'/>
>>     <feature policy='disable' name='invpcid'/>
>>   </cpu>
>
> Maybe one of these features is the root cause of the "messed up" state
> in KVM. So disabling it also makes the L1 state "less broken".

Would you try a guess as to which of the above features is a likely
culprit?

>> Basically, I am disabling every single feature that my L1's "virsh
>> capabilities" reports. Now this does not make my L1 come up happily
>> from loadvm. But it does seem to initiate a clean reboot after loadvm,
>> and after that clean reboot it lives happily.
>>
>> If this is as good as it gets (for now), then I can totally live with
>> that. It certainly beats running the L2 guest with Qemu (without KVM
>> acceleration). But I would still love to understand the issue a little
>> bit better.
>
> I mean, the real solution to the problem is of course restoring the L1
> state correctly (migrating nVMX state, which is what people are working
> on right now). So what you are seeing is a bad "side effect" of that
> state not being migrated.
>
> For now, nested=true should never be used along with savevm/loadvm/live
> migration.

Yes, I gathered as much. :) Thanks again!

Cheers,
Florian
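For reference, the distinction drawn here maps to two different libvirt
operations; a sketch using the libvirt Python bindings, with the domain
name "l1" again a placeholder:

# Full destroy/start cycle: a fresh QEMU process and fresh KVM state,
# which is why it recovers where an in-place reboot does not.
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("l1")

dom.destroy()   # hard stop: discards the broken in-kernel (KVM) state
dom.create()    # start from scratch - same as virsh destroy + virsh start

# dom.reset(0), by contrast, resets the guest inside the same QEMU/KVM
# instance, so the damaged nVMX-related state survives it.

conn.close()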
David Hildenbrand
2018-Feb-08 14:55 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
>>> I've now changed my L2 guest's CPU configuration so that libvirt (in
>>> L1) starts the L2 guest with the following settings:
>>>
>>>   <cpu>
>>>     <model fallback='forbid'>Haswell-noTSX</model>
>>>     <vendor>Intel</vendor>
>>>     <feature policy='disable' name='vme'/>
>>>     <feature policy='disable' name='ss'/>
>>>     <feature policy='disable' name='f16c'/>
>>>     <feature policy='disable' name='rdrand'/>
>>>     <feature policy='disable' name='hypervisor'/>
>>>     <feature policy='disable' name='arat'/>
>>>     <feature policy='disable' name='tsc_adjust'/>
>>>     <feature policy='disable' name='xsaveopt'/>
>>>     <feature policy='disable' name='abm'/>
>>>     <feature policy='disable' name='aes'/>
>>>     <feature policy='disable' name='invpcid'/>
>>>   </cpu>
>>
>> Maybe one of these features is the root cause of the "messed up" state
>> in KVM. So disabling it also makes the L1 state "less broken".
>
> Would you try a guess as to which of the above features is a likely
> culprit?

Hmm, actually no idea, but you can bisect :)

(But watch out, it could also just be coincidence. Especially if you
migrate while none of L1's VCPUs are currently executing L2, chances
might be better for L1 to survive a migration - L2 will still fail
hard, and L1 certainly will too once it tries to run L2 again.)

--
Thanks,

David / dhildenb
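In case it helps with that bisection: a small throwaway Python sketch,
purely illustrative and based on the feature list from Florian's XML,
that prints the <feature/> lines for a chosen subset so each test run
only needs the generated block pasted into the L2 domain definition:

# Bisect the disabled-feature list: keep half of it disabled, retest
# savevm/loadvm, and recurse on whichever half still shows the problem.
FEATURES = ["vme", "ss", "f16c", "rdrand", "hypervisor", "arat",
            "tsc_adjust", "xsaveopt", "abm", "aes", "invpcid"]

def feature_xml(disabled):
    """Render <feature/> elements disabling only the given subset."""
    return "\n".join(
        "    <feature policy='disable' name='%s'/>" % name
        for name in FEATURES if name in disabled
    )

# First bisection step: disable only the first half of the list.
half = set(FEATURES[:len(FEATURES) // 2])
print(feature_xml(half))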
Daniel P. Berrangé
2018-Feb-08 14:59 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On Thu, Feb 08, 2018 at 02:47:26PM +0100, David Hildenbrand wrote:
> > Sure, I do understand that Red Hat (or any other vendor) is taking no
> > support responsibility for this. At this point I'd just like to
> > contribute to a better understanding of what's expected to definitely
> > _not_ work, so that people don't bloody their noses on that. :)
>
> Indeed. Nesting is nice to enable, as it works in 99% of all cases. It
> just doesn't work when trying to migrate a nested hypervisor (on x86).

Hmm, if migration of the L1 is going to cause things to crash and burn,
then ideally libvirt on L0 would block the migration from being done.

Naively we could do that if the guest has vmx or svm features in its
CPU, except that's probably way too conservative, as many guests with
those features won't actually run any nested VMs. It would also be
desirable to still be able to migrate the L1 if no L2s are currently
running.

Is there any way QEMU can expose to libvirt whether any L2s are active,
so we can prevent migration in that case? Or should QEMU itself refuse
to start the migration, perhaps?

Regards,
Daniel

--
|: https://berrange.com       -o-   https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org        -o-           https://fstop138.berrange.com   :|
|: https://entangle-photo.org -o-   https://www.instagram.com/dberrange     :|
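To illustrate how coarse that naive check would be, here is a sketch
with the libvirt Python bindings (domain name "l1" is a placeholder,
not from the thread). It only sees vmx/svm when the features are spelled
out in the domain XML, and it says nothing about whether an L2 is
actually running - which is exactly the gap being discussed:

# Refuse to save/migrate an L1 whose CPU definition exposes vmx or svm.
import xml.etree.ElementTree as ET
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("l1")

cpu = ET.fromstring(dom.XMLDesc(0)).find("cpu")
nested_features = {
    f.get("name")
    for f in (cpu.findall("feature") if cpu is not None else [])
    if f.get("name") in ("vmx", "svm") and f.get("policy") != "disable"
}

if nested_features:
    print("refusing to migrate: guest CPU exposes", ", ".join(sorted(nested_features)))
else:
    print("no vmx/svm found in the guest CPU definition")

conn.close()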
David Hildenbrand
2018-Feb-08 15:11 UTC
Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
On 08.02.2018 15:59, Daniel P. Berrangé wrote:
> On Thu, Feb 08, 2018 at 02:47:26PM +0100, David Hildenbrand wrote:
>>> Sure, I do understand that Red Hat (or any other vendor) is taking no
>>> support responsibility for this. At this point I'd just like to
>>> contribute to a better understanding of what's expected to definitely
>>> _not_ work, so that people don't bloody their noses on that. :)
>>
>> Indeed. Nesting is nice to enable, as it works in 99% of all cases. It
>> just doesn't work when trying to migrate a nested hypervisor (on x86).
>
> Hmm, if migration of the L1 is going to cause things to crash and burn,
> then ideally libvirt on L0 would block the migration from being done.

Yes, in an ideal world. Usually we assume that people who turn on
experimental features ("nested=true") are aware of the implications.
The main problem is that the implications are not really documented :)

Once we have both a new KVM _and_ a new QEMU, it will eventually be
supported.

> Naively we could do that if the guest has vmx or svm features in its
> CPU, except that's probably way too conservative, as many guests with
> those features won't actually run any nested VMs. It would also be
> desirable to still be able to migrate the L1 if no L2s are currently
> running.

No, using CPU feature flags for that purpose on the libvirt level is no
good, especially because once we do support migration we would have to
find another interface to signal "but it is working now".

QEMU could try to warn the user if VMX is enabled in the CPU model, but
as you said, that might also hold true for guests that don't use nVMX.
On the other hand, VMX will only show up as a valid feature if
nested=true is set, so the number of affected users is minimal. So we
could, e.g., abort migration on the QEMU level if VMX is specified
right now. Once we have the migration support in place, we can allow it
again.

> Is there any way QEMU can expose to libvirt whether any L2s are active,
> so we can prevent migration in that case? Or should QEMU itself refuse
> to start the migration, perhaps?

Not without another kernel interface. But I am no expert on that
matter. Maybe there is an easy way to block it that I just don't see
right now.

> Regards,
> Daniel

--
Thanks,

David / dhildenb
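For anyone wondering whether their setup is affected by this limitation
at all, a small host-side (L0) sketch: it reads the standard
kvm_intel/kvm_amd "nested" module parameters; the value format ("Y"/"N"
or "1"/"0") varies between kernel versions, hence the loose check.

# Report whether nested virtualization is enabled on this host.
import os

def nested_enabled():
    for path in ("/sys/module/kvm_intel/parameters/nested",
                 "/sys/module/kvm_amd/parameters/nested"):
        if os.path.exists(path):
            with open(path) as f:
                return f.read().strip() in ("Y", "y", "1")
    return False  # no KVM nested parameter found

if nested_enabled():
    print("nested virtualization is enabled; avoid savevm/loadvm/migration "
          "of L1 guests that are running L2s")
else:
    print("nested virtualization is off; the limitation discussed here does not apply")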