thr3ads.net - Xen devel - PAE PV guest kernel regression [Jun 2012]

If this information is useful, please help other people find it:
Share via:

Andrew Jones

2012-Jun-06 08:54 UTC

PAE PV guest kernel regression

https://bugzilla.redhat.com/show_bug.cgi?id=829016

was opened yesterday. Maybe the following commit
is the culprit?

commit cb8095bba6d24118135a5683a956f4f4fb5f17bb
Author: Jan Beulich <JBeulich@suse.com>
Date:   Fri Jan 20 16:22:04 2012 +0000

    x86: atomic64 assembly improvements

Drew

Jan Beulich

2012-Jun-06 10:41 UTC

head link

Re: PAE PV guest kernel regression

>>> On 06.06.12 at 10:54, Andrew Jones <drjones@redhat.com>
wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=829016 
> 
> was opened yesterday. Maybe the following commit
> is the culprit?
> 
> commit cb8095bba6d24118135a5683a956f4f4fb5f17bb
> Author: Jan Beulich <JBeulich@suse.com>
> Date:   Fri Jan 20 16:22:04 2012 +0000
> 
>     x86: atomic64 assembly improvements
Hardly - that change didn''t even touch atomic64_read_cx8()
or any of its constraints.

Further more, this

*pdpt = 0000000023be6027 *pde = 00000000037c2067 *pte = 8000000023bdb061 
Oops: 0003 [#1] SMP 

tells us that this was a write-protection fault, yet the hypervisor
failed to emulate this (hence the hypervisor log would likely help).
After quite some time spent looking through the upstream 3.0.4
sources (and my local object files) I can''t, however, spot where
the call to atomic64_read() is actually located in the functions in
question.

Jan

Andrew Jones

2012-Jun-07 10:46 UTC

head link

Re: PAE PV guest kernel regression

----- Original Message -----> >>> On 06.06.12 at 10:54, Andrew Jones <drjones@redhat.com>
wrote:
> 
> > https://bugzilla.redhat.com/show_bug.cgi?id=829016
> > 
> > was opened yesterday. Maybe the following commit
> > is the culprit?
> > 
> > commit cb8095bba6d24118135a5683a956f4f4fb5f17bb
> > Author: Jan Beulich <JBeulich@suse.com>
> > Date:   Fri Jan 20 16:22:04 2012 +0000
> > 
> >     x86: atomic64 assembly improvements
> 
> Hardly - that change didn''t even touch atomic64_read_cx8()
> or any of its constraints.
> 
> Further more, this
> 
> *pdpt = 0000000023be6027 *pde = 00000000037c2067 *pte > 8000000023bdb061
> Oops: 0003 [#1] SMP
> 
> tells us that this was a write-protection fault, yet the hypervisor
> failed to emulate this (hence the hypervisor log would likely help).
> After quite some time spent looking through the upstream 3.0.4
> sources (and my local object files) I can''t, however, spot where
> the call to atomic64_read() is actually located in the functions in
> question.
> 
Ah, I didn''t check to see what Fedora pulled in on top of 3.4. Had I
done that I would have immediately suspected a different patch instead
(mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch,
upstream commit 26c191788f18). We''ve already encountered one problem
with this patch for RHEL6 and fixed it. The patch F17 has, however, is
already the "fixed" version. Now the difference between RHEL6 and F17
though is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32b guests,
but RHEL6 does not. So now with this patch F17 is calling
atomic64_read() from pmd_none_or_trans_huge_or_clear_bad().

Drew

Jan Beulich

2012-Jun-07 14:55 UTC

head link

Re: PAE PV guest kernel regression

>>> Andrew Jones <drjones@redhat.com> 06/07/12 12:47 PM
>>>
>Ah, I didn''t check to see what Fedora pulled in on top of 3.4. Had
I
>done that I would have immediately suspected a different patch instead
>(mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch,
>upstream commit 26c191788f18). We''ve already encountered one
problem
>with this patch for RHEL6 and fixed it. The patch F17 has, however, is
>already the "fixed" version. Now the difference between RHEL6 and
F17
>though is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32b guests,
>but RHEL6 does not. So now with this patch F17 is calling
>atomic64_read() from pmd_none_or_trans_huge_or_clear_bad().
By itself the use of atomic64_read() on page table entries shouldn''t be
a problem though, particularly not if what might get written is a not
present entry (while the way atoimc64_read_cx8() works would also
allow for values with the low bit set to be pseudo-written, but neither
did that appear to be the case in at least one of the two cases where
register dumps were provided in your bugzilla, nor should an attempt
to write back a value that was already there cause any problem, even
if the low bit was set).

In any case, the hypervisor log would be interesting to see. Plus - is
this perhaps a rather old hypervisor running there (i.e. potentially
lacking some fix)?

Jan

Konrad Rzeszutek Wilk

2012-Jun-07 14:57 UTC

head link

Re: PAE PV guest kernel regression

> Ah, I didn''t check to see what Fedora pulled in on top of 3.4. Had
I
> done that I would have immediately suspected a different patch instead
>
(mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch,
> upstream commit 26c191788f18). We''ve already encountered one
problem
> with this patch for RHEL6 and fixed it. The patch F17 has, however, is
Was the fix for the git commit 26c191788f18 (which I understand is in RHEL6)
posted?
> already the "fixed" version. Now the difference between RHEL6 and
F17
> though is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32b guests,
> but RHEL6 does not. So now with this patch F17 is calling
> atomic64_read() from pmd_none_or_trans_huge_or_clear_bad().
> 
> Drew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Andrew Jones

2012-Jun-07 15:16 UTC

head link

Re: PAE PV guest kernel regression

----- Original Message ----- > Was the fix for the git commit 26c191788f18 (which I understand is in
> RHEL6)
> posted?
Nope, it got rolled into 26c191788f18 before 26c191788f18 was
committed. The fix was to not use __pmd, which is paravirtualized,
when doing the final creation of the pmd_t retval. Here it is
(the 2f88dfd2 referenced there is a rhel6 commit).

    Commit 2f88dfd2 introduces read_pmd_atomic which calls __pmd. __pmd is
    paravirtualized and for xen it calls xen_make_pmd(). xen_make_pmd may
    return a zero pmd in the case that the pfn''s associated mfn is
still
    invalid. However at the point when read_pmd_atomic is called the mfn can
    appear invalid when it isn''t, if its creation was lazily deferred.
If
    read_pmd_atomic returns zero in these cases then it can eventually lead to
    a crash. To resolve the issue we can avoid calling __pmd and just return
    the open coded compound literal (as native_make_pmd would do) instead.
    
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtabl
index f2d473b..4f325a1 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -73,12 +73,13 @@ static inline pmd_t read_pmd_atomic(pmd_t *pmdp)
                ret |= ((pmdval_t)*(tmp + 1)) << 32;
        }
 
-       return __pmd(ret);
+       return (pmd_t) { ret };
 }
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 static inline pmd_t read_pmd_atomic(pmd_t *pmdp)
 {
-       return __pmd(atomic64_read((atomic64_t *)pmdp));
+       pmdval_t val = (pmdval_t)atomic64_read((atomic64_t *)pmdp);
+       return (pmd_t) { val };
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

Andrew Jones

2012-Jun-07 15:19 UTC

head link

Re: PAE PV guest kernel regression

----- Original Message -----> >>> Andrew Jones <drjones@redhat.com> 06/07/12 12:47 PM
>>>
> >Ah, I didn''t check to see what Fedora pulled in on top of 3.4.
Had I
> >done that I would have immediately suspected a different patch
> >instead
>
>(mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch,
> >upstream commit 26c191788f18). We''ve already encountered one
problem
> >with this patch for RHEL6 and fixed it. The patch F17 has, however,
> >is
> >already the "fixed" version. Now the difference between RHEL6
and
> >F17
> >though is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32b guests,
> >but RHEL6 does not. So now with this patch F17 is calling
> >atomic64_read() from pmd_none_or_trans_huge_or_clear_bad().
> 
> By itself the use of atomic64_read() on page table entries
shouldn''t
> be
> a problem though, particularly not if what might get written is a not
> present entry (while the way atoimc64_read_cx8() works would also
> allow for values with the low bit set to be pseudo-written, but
> neither
> did that appear to be the case in at least one of the two cases where
> register dumps were provided in your bugzilla, nor should an attempt
> to write back a value that was already there cause any problem, even
> if the low bit was set).
> 
> In any case, the hypervisor log would be interesting to see. Plus -
> is
> this perhaps a rather old hypervisor running there (i.e. potentially
> lacking some fix)?
I just reproduced this on my own test box, which is running an HV
based on older code (it''s RHEL 5.8). Even with
''loglvl=all guest_loglvl=all'' I didn''t get anything
logged in
''xm dmesg''.

Drew

Andrew Jones

2012-Jun-07 15:44 UTC

head link

Re: PAE PV guest kernel regression

----- Original Message -----> > In any case, the hypervisor log would be interesting to see. Plus -
> > is
> > this perhaps a rather old hypervisor running there (i.e.
> > potentially
> > lacking some fix)?
> 
> I just reproduced this on my own test box, which is running an HV
> based on older code (it''s RHEL 5.8). Even with
> ''loglvl=all guest_loglvl=all'' I didn''t get
anything logged in
> ''xm dmesg''.
I don''t have an later hypervisor setup for testing, but it''d
be nice to know if guests work on them. If so, then maybe RHEL5
and EC2 need to look at Xen c/s 17498 and/or others.

Drew

Jan Beulich

2012-Jun-07 18:48 UTC

head link

Re: PAE PV guest kernel regression

>>> Andrew Jones <drjones@redhat.com> 06/07/12 5:44 PM
>>>
>> > In any case, the hypervisor log would be interesting to see. Plus
-
>> > is
>> > this perhaps a rather old hypervisor running there (i.e.
>> > potentially
>> > lacking some fix)?
>> 
>> I just reproduced this on my own test box, which is running an HV
>> based on older code (it''s RHEL 5.8). Even with
>> ''loglvl=all guest_loglvl=all'' I didn''t get
anything logged in
>> ''xm dmesg''.
>
>I don''t have an later hypervisor setup for testing, but
it''d
>be nice to know if guests work on them. If so, then maybe RHEL5
>and EC2 need to look at Xen c/s 17498 and/or others.
Is it perhaps the really old 3.0.4 based one, and lacking a backport of
14041:b010e556fe2c? In that case the comparison part of the
cmpxchg8b emulation failing would apparently be treated equally to
e.g. a fault having occurred during the emulation (i.e. the original
page fault would get forwarded rather than just EFLAGS.ZF getting
set).

Jan

Jan Beulich

2012-Jun-07 19:43 UTC

head link

Re: PAE PV guest kernel regression

>>> "Jan Beulich" <jbeulich@suse.com> 06/07/12 8:50 PM
>>>
>Is it perhaps the really old 3.0.4 based one, and lacking a backport of
>14041:b010e556fe2c? In that case the comparison part of the
>cmpxchg8b emulation failing would apparently be treated equally to
>e.g. a fault having occurred during the emulation (i.e. the original
>page fault would get forwarded rather than just EFLAGS.ZF getting
>set).
Having read the other thread with Andrea''s explanation, that''s
unlikely
the culprit. He says the cmpxchg8b is being done on a PMD entry,
which the ptwr code definitely doesn''t support so far.

Jan

Xen devel - Jun 2012 - PAE PV guest kernel regression

PAE PV guest kernel regression

Re: PAE PV guest kernel regression

Re: PAE PV guest kernel regression

Re: PAE PV guest kernel regression

Re: PAE PV guest kernel regression

Re: PAE PV guest kernel regression

Re: PAE PV guest kernel regression

Re: PAE PV guest kernel regression

Re: PAE PV guest kernel regression

Re: PAE PV guest kernel regression