Hi Jan,

I have found an issue where the system crashes right when I start another HVM guest inside an HVM guest. I have traced it back to the patch with which the issue started:

commit f1bde87fc08ce8c818a1640a8fe4765d48923091
Author: Jan Beulich <jbeulich@suse.com>
Date:   Fri Feb 8 11:06:04 2013 +0100

    x86: debugging code for testing 16Tb support on smaller memory systems

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Keir Fraser <keir@xen.org>

The issue doesn't reproduce when starting a PV (L2) guest inside an HVM (L1) guest.

Suravee

PS: The L2 Xen is running Xen-4.3, but I think the issue is at the L1 Xen since it crashes the system.
>>> On 27.06.13 at 02:24, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
> I have found an issue where the system crashes right when I start
> another HVM guest inside an HVM guest. I have traced it back to the patch
> with which the issue started.
>
> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
> [...]

We had issues exposed by this patch before, but any such issue would just have been masked before that patch (and would surface on a system with more than 5Tb of memory anyway). So it is very unlikely for the patch itself to be at fault.

Furthermore, the crash then supposedly is because of the code added conditional upon NDEBUG, and hence would (on a smaller memory system) otherwise not surface at all for a production (debug=n) build.

> The issue doesn't reproduce when starting a PV (L2) guest inside an HVM
> (L1) guest.

"Does not" or "does"? In the former case - what is this supposed to tell me?

In any case - without you sharing technical details (register/stack dump of the crash at the very least) I don't think I have anything at hand to look for possible problems.

Jan
On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>> On 27.06.13 at 02:24, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>> I have found an issue where the system crashes right when I start
>> another HVM guest inside an HVM guest. I have traced it back to the
>> patch with which the issue started.
>>
>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>> [...]
> We had issues exposed by this patch before, but any such issue
> would just have been masked before that patch (and would
> surface on a system with more than 5Tb of memory anyway).

The system on which I am having the issue has 48GB of memory.

> So it is very unlikely for the patch itself to be at fault.

I have traced the issue and found that the system crashing starts from this commit id onward (i.e. the system does not crash with commit id ed759d20249197cf87b338ff0ed328052ca3b8e7). So, I still believe that this patch has somehow triggered the issue.

> Furthermore, the crash then supposedly is because of the code
> added conditional upon NDEBUG, and hence would (on a smaller
> memory system) otherwise not surface at all for a production
> (debug=n) build.
>
>> The issue doesn't reproduce when starting a PV (L2) guest inside an HVM
>> (L1) guest.
> "Does not" or "does"? In the former case - what is this supposed to
> tell me?

What I am trying to say here is that the system _does not_ crash when starting the PV guest as a level 2 guest. This is meant to be another data point to help in analyzing the issue.

> In any case - without you sharing technical details (register/stack
> dump of the crash at the very least) I don't think I have anything
> at hand to look for possible problems.

At this point, I am just reporting the issue. I have not been able to get the crash dump because the system immediately reboots. I'll try to boot Xen with "noreboot" and inspect the log for more clues. Any suggestions are welcome.

Thank you,
Suravee
Running a PV guest as L2 guest makes no difference to how the p2m code is used in L0 Xen (L0 == OS that runs on bare metal hardware).

I assume you use the default settings, which means you use NPT-on-NPT. Try shadow-on-npt. You can do this with

    cpuid = "host,svm_npt=0"

in the guest config file. Then in the L1 guest you should see the NPT svm feature bit as not available. Then launch an L2 guest and check if it still crashes.

I agree with Jan: Please provide the crash logs he requested.

Christoph

On 27.06.13 11:20, Suravee Suthikulpanit wrote:
> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>> [...]
>> We had issues exposed by this patch before, but any such issue
>> would just have been masked before that patch (and would
>> surface on a system with more than 5Tb of memory anyway).
>
> The system I am having the issue has 48GB of memory.
>
>> So it is very unlikely for the patch itself to be at fault.
>
> I have traced the issue and found that the system crashing starts from
> this commit id and onward.
> (i.e. The system does not crash with commit id
> ed759d20249197cf87b338ff0ed328052ca3b8e7)
> So, I still believe that this patch has somehow triggered the issue.
>
> [...]
>
> What I am trying to say here is that the system _does not_ crash when
> starting the PV guest as level 2 guest.
> This is meant to be another data point to help analyzing the issue.
>
> [...]
>
> At this point, I am just reporting the issue. I have not been able to
> get the crash dump because the system immediately reboots.
> I'll try to boot Xen with "noreboot" and inspect the log for more clues.
> Any suggestions are welcome.
>
> Thank you,
>
> Suravee
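[Editor's note: Christoph's suggestion amounts to a guest config fragment along these lines. The `cpuid = "host,svm_npt=0"` line is quoted from his mail; the `nestedhvm` line and the file name are assumptions about a typical nested-SVM guest config of that era, shown only as a sketch.]

```
# L1 guest config (e.g. /etc/xen/l1-hvm.cfg -- hypothetical path):
# expose SVM to the L1 guest, but hide the NPT feature bit so the
# L1 Xen falls back to shadow paging (shadow-on-npt in L0 terms)
nestedhvm = 1
cpuid = "host,svm_npt=0"
```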
>>> On 27.06.13 at 11:20, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>>> On 27.06.13 at 02:24, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>>> I have found an issue where the system crashes right when I start
>>> another HVM guest inside an HVM guest. I have traced it back to the patch
>>> with which the issue started.
>>>
>>> commit f1bde87fc08ce8c818a1640a8fe4765d48923091
>>> [...]
>> We had issues exposed by this patch before, but any such issue
>> would just have been masked before that patch (and would
>> surface on a system with more than 5Tb of memory anyway).
>
> The system I am having the issue has 48GB of memory.

Which is why you're seeing the problem only with the debugging code enabled. (And of course I didn't really expect you to have tried this on a huge memory system - they're just too rare still for this to be likely.)

>> So it is very unlikely for the patch itself to be at fault.
>
> I have traced the issue and found that the system crashing starts from this
> commit id and onward.
> (i.e. The system does not crash with commit id
> ed759d20249197cf87b338ff0ed328052ca3b8e7)
> So, I still believe that this patch has somehow triggered the issue.

As said - I'm pretty certain this merely unmasked an already lurking issue. And that's what the purpose of that patch is.

Jan
On 6/27/2013 5:08 AM, Jan Beulich wrote:
>>>> On 27.06.13 at 11:20, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
>> [...]
>> The system I am having the issue has 48GB of memory.
> Which is why you're seeing the problem only with the debugging
> code enabled.

Is the "debugging" enabled by default? I didn't specify any debug when building. How can I check and disable debugging?

> (And of course I didn't really expect you to have
> tried this on a huge memory system - they're just too rare still
> for this to be likely.)
>
>>> So it is very unlikely for the patch itself to be at fault.
>> I have traced the issue and found that the system crashing starts from this
>> commit id and onward.
>> (i.e. The system does not crash with commit id
>> ed759d20249197cf87b338ff0ed328052ca3b8e7)
>> So, I still believe that this patch has somehow triggered the issue.
> As said - I'm pretty certain this merely unmasked an already
> lurking issue.

I'm not quite sure what you meant here. Are you saying that this "crashing" is a known issue?

> And that's what the purpose of that patch is.

This patch is crashing the system. What do you mean by "And that's what the purpose of that patch is"?

Suravee
On 27/06/13 11:24, Suravee Suthikulpanit wrote:
> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>> [...]
>> As said - I'm pretty certain this merely unmasked an already
>> lurking issue.
> I'm not quite sure what you meant here. Are you saying that this
> "crashing" is a known issue?
>
>> And that's what the purpose of that patch is.
> This patch is crashing the system. What do you mean by "And that's
> what the purpose of that patch is"?
>
> Suravee

It means that this patch is exposing a latent bug: the nested HVM code is already wrong. It will be something in the nested HVM code which is not using map_domain_page() when it really should be.

Without posting a stack trace, there is nothing we can do to help narrow down the issue.

~Andrew
On 27.06.13 12:24, Suravee Suthikulpanit wrote:
> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>> [...]
>> Which is why you're seeing the problem only with the debugging
>> code enabled.
> Is the "debugging" enabled by default? I didn't specify any debug when
> building.

"Debugging" is enabled by default in the development tree.

> How can I check and disable debugging?

In the toplevel source directory look into Config.mk and set the line

    debug ?= y

accordingly.

> [...]
>> As said - I'm pretty certain this merely unmasked an already
>> lurking issue.
> I'm not quite sure what you meant here. Are you saying that this
> "crashing" is a known issue?

He means nestedhvm reveals an existing bug that his patch merely exposes. If he is right then you do not see nestedhvm crashing with a non-debug xen-kernel (unless something else broke it).

>> And that's what the purpose of that patch is.
> This patch is crashing the system. What do you mean by "And that's what
> the purpose of that patch is"?

The purpose is "People, please test".

Christoph
On 6/27/2013 5:33 AM, Egger, Christoph wrote:
> On 27.06.13 12:24, Suravee Suthikulpanit wrote:
>> [...]
>> Is the "debugging" enabled by default? I didn't specify any debug when
>> building.
> "Debugging" is enabled by default in the development tree.
>
>> How can I check and disable debugging?
> In the toplevel source directory look into Config.mk
> and set the line
>
> debug ?= y
>
> accordingly.

Thank you for the clarification.

> [...]
> He means nestedhvm reveals an existing bug that his patch merely exposes.
> If he is right then you do not see nestedhvm crashing with a non-debug
> xen-kernel (unless something else broke it).

After I rebuilt the Xen kernel with debug=n, the system no longer crashes when starting npt-on-npt and shadow-on-npt guests. I was not able to get the crash dump previously. I will try again tomorrow at work and will post it.

Thank you,

Suravee
On Thu, Jun 27, 2013 at 11:24 AM, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>> [...]
>> As said - I'm pretty certain this merely unmasked an already
>> lurking issue.
>
> I'm not quite sure what you meant here. Are you saying that this "crashing"
> is a known issue?
>
>> And that's what the purpose of that patch is.
>
> This patch is crashing the system. What do you mean by "And that's what the
> purpose of that patch is"?

*If* you had had >5TiB, then you would have crashed even without this patch. The purpose of the patch is to make it so that if there is a bug that will crash for >5TiB, then it will *also* crash for <5TiB. Since the vast majority of people have <5TiB of RAM, this results in better testing coverage for those with >5TiB of RAM.

On production systems, we want it to work as often as possible, so this test is disabled when debug=n, which is the default for released versions of Xen. But in the development branch we very much want to find bugs, so during development we set debug=y by default.

 -George
>>> On 27.06.13 at 12:24, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
> On 6/27/2013 5:08 AM, Jan Beulich wrote:
>>>>> On 27.06.13 at 11:20, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
>>> On 6/27/2013 3:22 AM, Jan Beulich wrote:
>>>> We had issues exposed by this patch before, but any such issue
>>>> would just have been masked before that patch (and would
>>>> surface on a system with more than 5Tb of memory anyway).
>>> The system I am having the issue has 48GB of memory.
>> Which is why you're seeing the problem only with the debugging
>> code enabled.
> Is the "debugging" enabled by default? I didn't specify any debug when
> building.
> How can I check and disable debugging?

Set

    debug := n

close to the top of ./Config.mk.

>> [...]
>> As said - I'm pretty certain this merely unmasked an already
>> lurking issue.
> I'm not quite sure what you meant here. Are you saying that this
> "crashing" is a known issue?

No, I'm unaware of any issue similar to what you describe.

>> And that's what the purpose of that patch is.
> This patch is crashing the system. What do you mean by "And that's what
> the purpose of that patch is"?

The finding of bugs that otherwise would surface only when indeed running on a huge memory system. If on such a system this would result in crashing the host, so be it with this debugging code even on "normal" systems (as long as they are not running in production mode).

The alternative would be to keep the bug masked until someone really tried to run Xen on such a huge system, and the debugging of it then would be quite a bit more expensive (if nothing else then in the amount of electrical power needed).

Jan
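[Editor's note: the Config.mk edit Jan and Christoph describe can be scripted. This is only a sketch, demonstrated on a throwaway copy of the file rather than a real xen.git checkout; the temp path is arbitrary.]

```shell
# Make a stand-in Config.mk carrying the development-tree default.
printf 'debug ?= y\n' > /tmp/Config.mk.demo

# Flip it to a production (non-debug) build, as in released Xen versions.
sed -i 's/^debug ?= y$/debug ?= n/' /tmp/Config.mk.demo

grep '^debug' /tmp/Config.mk.demo
# -> debug ?= n
# (in a real tree you would now rebuild the hypervisor)
```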
On 6/27/2013 6:14 AM, Suravee Suthikulpanit wrote:
> On 6/27/2013 5:33 AM, Egger, Christoph wrote:
>> [...]
>> He means nestedhvm reveals an existing bug that his patch merely exposes.
>> If he is right then you do not see nestedhvm crashing with a non-debug
>> xen-kernel (unless something else broke it).
>
> After I rebuilt the Xen kernel with debug=n, the system no longer crashes
> when starting npt-on-npt and shadow-on-npt guests.
> I was not able to get the crash dump previously. I will try again
> tomorrow at work and will post it.
>
> Thank you,
>
> Suravee

So, I have finally been able to get the crash dump (see below). The crash is due to an assert:

(XEN) Assertion 'va >= XEN_VIRT_START' failed at /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86

* Debugging shows va=ffff82c40002d000, XEN_VIRT_START=ffff82c4c0000000, DIRECTMAP_VIRT_END=ffffff8000000000.
* The backtrace symbols show the crash is in "svm_vmexit_handler()", inlined from "svm_vmexit_do_vmsave()" and "svm_vmsave()".

CRASH DUMP
=========

(XEN) Assertion 'va >= XEN_VIRT_START' failed at /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
(XEN) Debugging connection not set up.
(XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    17
(XEN) RIP:    e008:[<ffff82c4c01cfbfc>] svm_vmexit_handler+0x1574/0x1a2a
(XEN) RFLAGS: 0000000000010293   CONTEXT: hypervisor
(XEN) rax: ffff82c4bfffffff   rbx: ffff830852ec1000   rcx: 0000000000000000
(XEN) rdx: ffff830434757020   rsi: 000000000000000a   rdi: ffff82c4c0283740
(XEN) rbp: ffff83043474ff08   rsp: ffff83043474fd28   r8:  0000000000000004
(XEN) r9:  0000000000000010   r10: ffffff8000000000   r11: 0000000000000010
(XEN) r12: ffff83000e010000   r13: 0000000000000003   r14: 0000000000000000
(XEN) r15: ffff82c40002d000   cr0: 000000008005003b   cr4: 00000000000406f0
(XEN) cr3: 000000086d9dd000   cr2: 00007fe7f8e99120
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff83043474fd28:
(XEN)    ffff83000e010000 ffff83043474fd70 ffff82c4c01bb001 0000000000000000
(XEN)    ffff83000e010000 ffff830852ec1000 0000000000000000 0000000000000000
(XEN)    ffff830434757080 ffff83043474fda0 ffff82c4c01cca33 0000000000000000
(XEN)    ffff8300c7ea6000 00000000000fee00 ffff83043474ff18 ffff830400000000
(XEN)    ffff82c4c015fe19 ffff83043474fe10 ffff82c4c0185827 00000000000000fc
(XEN)    0000003b5c327b44 0000000a0000000d 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff83043474fe20 ffff8300c7ea6000 00000049a0b0dcf5
(XEN)    0000000000000286 ffff83043474fe28 ffff82c4c0125e9e ffff83000e010000
(XEN)    ffff83043474fe98 ffff82c4c01c8048 ffff82c4c0125e9e ffff83000e010488
(XEN)    ffff83043474fe98 ffffffffffffffff ffff83043474fe78 ffff82c4c01c5e56
(XEN)    ffff83000e010000 ffff830853ea1000 ffff83043474fe98 ffff82c4c01be614
(XEN)    ffff83000e010000 0000000000000007 ffff83043474ff08 ffff82c4c01c8e66
(XEN)    ffff830434757080 000000fc3474fee0 ffff82c4c0125a52 ffff830434748000
(XEN)    ffff830434748000 00000000ffffffff ffff830852ec1000 ffff83000e010000
(XEN)    ffff830209c87000 0000000000000007 0000000000000003 ffff830203ddff18
(XEN)    ffff830203ddfd70 ffff82c4c01d1c45 ffff830203ddff18 0000000000000003
(XEN)    0000000000000007 ffff830209c87000 ffff830203ddfd70 ffff8300d4b46000
(XEN)    0000000000000246 00000000deadbeef 00000013eabcc169 0000000000000003
(XEN)    0000000203de2000 0000000000000000 0000000203de2000 ffff830203de4000
(XEN)    ffff830209c87000 0000beef0000beef ffff82c4c01ce158 0000beef0000beef
(XEN) Xen call trace:
(XEN)    [<ffff82c4c01cfbfc>] svm_vmexit_handler+0x1574/0x1a2a
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 17:
(XEN) Assertion 'va >= XEN_VIRT_START' failed at /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:92
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
(XEN) Debugging connection not set up.

Suravee
>>> On 28.06.13 at 02:44, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
> So, I have finally been able to get the crash dump (see below). The crash
> is due to an assert:
>
> (XEN) Assertion 'va >= XEN_VIRT_START' failed at
> /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
>
> * Debugging shows va=ffff82c40002d000, XEN_VIRT_START=ffff82c4c0000000,
> DIRECTMAP_VIRT_END=ffffff8000000000.
> * The backtrace symbols show the crash is in "svm_vmexit_handler()", inlined
> from "svm_vmexit_do_vmsave()" and "svm_vmsave()".

Which helps in no way identifying where the problem is - svm_vmexit_handler() is just too large to spot this without either the matching xen-syms at hand, or you adding further instrumentation.

Jan
On 6/28/2013 2:58 AM, Jan Beulich wrote:
>>>> On 28.06.13 at 02:44, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>> [...]
> Which helps in no way identifying where the problem is -
> svm_vmexit_handler() is just too large to spot this without either
> the matching xen-syms at hand, or you adding further
> instrumentation.
>
> Jan

What I am trying to say is that the assertion is in __virt_to_maddr, which is called from svm_vmexit_do_vmsave(). However, this is a bit complicated due to macros and inlining. Here is what the call chain is supposed to look like:

    ASSERT(va >= XEN_VIRT_START)
    __virt_to_maddr           <---- inlined
    virt_to_mfn()             <---- macro
    __pa()                    <---- macro
    svm_vmsave()              <---- inlined
    svm_vmexit_do_vmsave()    <---- inlined
    svm_vmexit_handler()      <---- symbol

Suravee
On 28/06/13 15:20, Suravee Suthikulanit wrote:
> On 6/28/2013 2:58 AM, Jan Beulich wrote:
>> [...]
>> Which helps in no way identifying where the problem is -
>> svm_vmexit_handler() is just too large to spot this without either
>> the matching xen-syms at hand, or you adding further
>> instrumentation.
>
> What I am trying to say is that the assertion is in __virt_to_maddr,
> which is called from svm_vmexit_do_vmsave(). However, this is a bit
> complicated due to macros and inlining. Here is what the call chain is
> supposed to look like:
>
> ASSERT(va >= XEN_VIRT_START)
> __virt_to_maddr           <---- inlined
> virt_to_mfn()             <---- macro
> __pa()                    <---- macro
> svm_vmsave()              <---- inlined
> svm_vmexit_do_vmsave()    <---- inlined
> svm_vmexit_handler()      <---- symbol
>
> Suravee

The code is assuming that the virtual address is mapped into the Xen pagetables when in fact it is not. The code needs to be corrected to use map_domain_page() to correctly access a domheap page.

~Andrew
>>> On 28.06.13 at 16:20, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
> On 6/28/2013 2:58 AM, Jan Beulich wrote:
>>>>> On 28.06.13 at 02:44, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>>> So, I have finally been able to get the crash dump (see below). The crash is due
>>> to an assert:
>>>
>>> (XEN) Assertion 'va >= XEN_VIRT_START' failed at
>>> /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
>>>
>>> * Debugging shows va=ffff82c40002d000, XEN_VIRT_START=ffff82c4c0000000,
>>> DIRECTMAP_VIRT_END=ffffff8000000000.
>>> * Backtrace symbols show the crash is in "svm_vmexit_handler()", which is
>>> inlined from "svm_vmexit_do_vmsave()" and "svm_vmsave()".
>> Which helps in no way identifying where the problem is -
>> svm_vmexit_handler() is just too large to spot this without either
>> the matching xen-syms at hand, or you adding further
>> instrumentation.
>>
>> Jan
>
> What I am trying to say is that the assertion is in __virt_to_maddr, which is
> called from svm_vmexit_do_vmsave(). However, this is a bit complicated due to
> macros and inlining. Here is what the call chain is supposed to look like:
>
> ASSERT(va >= XEN_VIRT_START)
> __virt_to_maddr          <---- inlined
> virt_to_mfn()            <---- macro
> __pa()                   <---- macro
> svm_vmsave()             <---- inlined
> svm_vmexit_do_vmsave()   <---- inlined
> svm_vmexit_handler()     <---- symbol

So the problem is the inverse of the usual one (and that's part of
why I didn't spot it when searching the tree for code that needs
fixing; the other part is that while running into these functions I
knew that VMCBs get allocated from the Xen heap, but didn't
notice that the same functions also get used for dealing with
guest VMCBs):

nestedsvm_vmcb_map() properly does the necessary mapping,
but svm_vmsave() (just like svm_vmload()) blindly uses __pa() on
something that's not an address in the direct mapping region.

Which means that on 4.2.0, where we still had a 32-bit hypervisor,
nested SVM was completely broken (and presumably never tested)
in that 32-bit case. Luckily we have meanwhile disabled the use of
nested HVM in 4.2.x's 32-bit builds.

Jan
On 28.06.13 16:52, Jan Beulich wrote:
>>>> On 28.06.13 at 16:20, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>> On 6/28/2013 2:58 AM, Jan Beulich wrote:
>>>>>> On 28.06.13 at 02:44, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
>>>> So, I have finally been able to get the crash dump (see below). The crash is due
>>>> to an assert:
>>>>
>>>> (XEN) Assertion 'va >= XEN_VIRT_START' failed at
>>>> /sandbox/xen/xen.git/xen/include/asm/x86_64/page.h:86
>>>>
>>>> * Debugging shows va=ffff82c40002d000, XEN_VIRT_START=ffff82c4c0000000,
>>>> DIRECTMAP_VIRT_END=ffffff8000000000.
>>>> * Backtrace symbols show the crash is in "svm_vmexit_handler()", which is
>>>> inlined from "svm_vmexit_do_vmsave()" and "svm_vmsave()".
>>> Which helps in no way identifying where the problem is -
>>> svm_vmexit_handler() is just too large to spot this without either
>>> the matching xen-syms at hand, or you adding further
>>> instrumentation.
>>>
>>> Jan
>>
>> What I am trying to say is that the assertion is in __virt_to_maddr, which is
>> called from svm_vmexit_do_vmsave(). However, this is a bit complicated due to
>> macros and inlining. Here is what the call chain is supposed to look like:
>>
>> ASSERT(va >= XEN_VIRT_START)
>> __virt_to_maddr          <---- inlined
>> virt_to_mfn()            <---- macro
>> __pa()                   <---- macro
>> svm_vmsave()             <---- inlined
>> svm_vmexit_do_vmsave()   <---- inlined
>> svm_vmexit_handler()     <---- symbol
>
> So the problem is the inverse of the usual one (and that's part of
> why I didn't spot it when searching the tree for code that needs
> fixing; the other part is that while running into these functions I
> knew that VMCBs get allocated from the Xen heap, but didn't
> notice that the same functions also get used for dealing with
> guest VMCBs):
>
> nestedsvm_vmcb_map() properly does the necessary mapping,
> but svm_vmsave() (just like svm_vmload()) blindly uses __pa() on
> something that's not an address in the direct mapping region.
>
> Which means that on 4.2.0, where we still had a 32-bit hypervisor,
> nested SVM was completely broken (and presumably never tested)
> in that 32-bit case. Luckily we have meanwhile disabled the use of
> nested HVM in 4.2.x's 32-bit builds.

I never tested nested SVM with a 32-bit host. I did test 32-bit
hypervisors as guests.

Christoph