https://bugzilla.redhat.com/show_bug.cgi?id=829016 was opened yesterday. Maybe the following commit is the culprit? commit cb8095bba6d24118135a5683a956f4f4fb5f17bb Author: Jan Beulich <JBeulich@suse.com> Date: Fri Jan 20 16:22:04 2012 +0000 x86: atomic64 assembly improvements Drew
>>> On 06.06.12 at 10:54, Andrew Jones <drjones@redhat.com> wrote:> https://bugzilla.redhat.com/show_bug.cgi?id=829016 > > was opened yesterday. Maybe the following commit > is the culprit? > > commit cb8095bba6d24118135a5683a956f4f4fb5f17bb > Author: Jan Beulich <JBeulich@suse.com> > Date: Fri Jan 20 16:22:04 2012 +0000 > > x86: atomic64 assembly improvementsHardly - that change didn''t even touch atomic64_read_cx8() or any of its constraints. Further more, this *pdpt = 0000000023be6027 *pde = 00000000037c2067 *pte = 8000000023bdb061 Oops: 0003 [#1] SMP tells us that this was a write-protection fault, yet the hypervisor failed to emulate this (hence the hypervisor log would likely help). After quite some time spent looking through the upstream 3.0.4 sources (and my local object files) I can''t, however, spot where the call to atomic64_read() is actually located in the functions in question. Jan
----- Original Message -----> >>> On 06.06.12 at 10:54, Andrew Jones <drjones@redhat.com> wrote: > > > https://bugzilla.redhat.com/show_bug.cgi?id=829016 > > > > was opened yesterday. Maybe the following commit > > is the culprit? > > > > commit cb8095bba6d24118135a5683a956f4f4fb5f17bb > > Author: Jan Beulich <JBeulich@suse.com> > > Date: Fri Jan 20 16:22:04 2012 +0000 > > > > x86: atomic64 assembly improvements > > Hardly - that change didn''t even touch atomic64_read_cx8() > or any of its constraints. > > Further more, this > > *pdpt = 0000000023be6027 *pde = 00000000037c2067 *pte > 8000000023bdb061 > Oops: 0003 [#1] SMP > > tells us that this was a write-protection fault, yet the hypervisor > failed to emulate this (hence the hypervisor log would likely help). > After quite some time spent looking through the upstream 3.0.4 > sources (and my local object files) I can''t, however, spot where > the call to atomic64_read() is actually located in the functions in > question. >Ah, I didn''t check to see what Fedora pulled in on top of 3.4. Had I done that I would have immediately suspected a different patch instead (mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch, upstream commit 26c191788f18). We''ve already encountered one problem with this patch for RHEL6 and fixed it. The patch F17 has, however, is already the "fixed" version. Now the difference between RHEL6 and F17 though is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32b guests, but RHEL6 does not. So now with this patch F17 is calling atomic64_read() from pmd_none_or_trans_huge_or_clear_bad(). Drew
>>> Andrew Jones <drjones@redhat.com> 06/07/12 12:47 PM >>> >Ah, I didn''t check to see what Fedora pulled in on top of 3.4. Had I >done that I would have immediately suspected a different patch instead >(mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch, >upstream commit 26c191788f18). We''ve already encountered one problem >with this patch for RHEL6 and fixed it. The patch F17 has, however, is >already the "fixed" version. Now the difference between RHEL6 and F17 >though is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32b guests, >but RHEL6 does not. So now with this patch F17 is calling >atomic64_read() from pmd_none_or_trans_huge_or_clear_bad().By itself the use of atomic64_read() on page table entries shouldn''t be a problem though, particularly not if what might get written is a not present entry (while the way atoimc64_read_cx8() works would also allow for values with the low bit set to be pseudo-written, but neither did that appear to be the case in at least one of the two cases where register dumps were provided in your bugzilla, nor should an attempt to write back a value that was already there cause any problem, even if the low bit was set). In any case, the hypervisor log would be interesting to see. Plus - is this perhaps a rather old hypervisor running there (i.e. potentially lacking some fix)? Jan
> Ah, I didn''t check to see what Fedora pulled in on top of 3.4. Had I > done that I would have immediately suspected a different patch instead > (mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch, > upstream commit 26c191788f18). We''ve already encountered one problem > with this patch for RHEL6 and fixed it. The patch F17 has, however, isWas the fix for the git commit 26c191788f18 (which I understand is in RHEL6) posted?> already the "fixed" version. Now the difference between RHEL6 and F17 > though is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32b guests, > but RHEL6 does not. So now with this patch F17 is calling > atomic64_read() from pmd_none_or_trans_huge_or_clear_bad(). > > Drew > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
----- Original Message -----> Was the fix for the git commit 26c191788f18 (which I understand is in > RHEL6) > posted?Nope, it got rolled into 26c191788f18 before 26c191788f18 was committed. The fix was to not use __pmd, which is paravirtualized, when doing the final creation of the pmd_t retval. Here it is (the 2f88dfd2 referenced there is a rhel6 commit). Commit 2f88dfd2 introduces read_pmd_atomic which calls __pmd. __pmd is paravirtualized and for xen it calls xen_make_pmd(). xen_make_pmd may return a zero pmd in the case that the pfn''s associated mfn is still invalid. However at the point when read_pmd_atomic is called the mfn can appear invalid when it isn''t, if its creation was lazily deferred. If read_pmd_atomic returns zero in these cases then it can eventually lead to a crash. To resolve the issue we can avoid calling __pmd and just return the open coded compound literal (as native_make_pmd would do) instead. diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtabl index f2d473b..4f325a1 100644 --- a/arch/x86/include/asm/pgtable-3level.h +++ b/arch/x86/include/asm/pgtable-3level.h @@ -73,12 +73,13 @@ static inline pmd_t read_pmd_atomic(pmd_t *pmdp) ret |= ((pmdval_t)*(tmp + 1)) << 32; } - return __pmd(ret); + return (pmd_t) { ret }; } #else /* CONFIG_TRANSPARENT_HUGEPAGE */ static inline pmd_t read_pmd_atomic(pmd_t *pmdp) { - return __pmd(atomic64_read((atomic64_t *)pmdp)); + pmdval_t val = (pmdval_t)atomic64_read((atomic64_t *)pmdp); + return (pmd_t) { val }; } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
----- Original Message -----> >>> Andrew Jones <drjones@redhat.com> 06/07/12 12:47 PM >>> > >Ah, I didn''t check to see what Fedora pulled in on top of 3.4. Had I > >done that I would have immediately suspected a different patch > >instead > >(mm-pmd_read_atomic-fix-32bit-PAE-pmd-walk-vs-pmd_populate-SMP-race-condition.patch, > >upstream commit 26c191788f18). We''ve already encountered one problem > >with this patch for RHEL6 and fixed it. The patch F17 has, however, > >is > >already the "fixed" version. Now the difference between RHEL6 and > >F17 > >though is that F17 has CONFIG_TRANSPARENT_HUGEPAGE=y for 32b guests, > >but RHEL6 does not. So now with this patch F17 is calling > >atomic64_read() from pmd_none_or_trans_huge_or_clear_bad(). > > By itself the use of atomic64_read() on page table entries shouldn''t > be > a problem though, particularly not if what might get written is a not > present entry (while the way atoimc64_read_cx8() works would also > allow for values with the low bit set to be pseudo-written, but > neither > did that appear to be the case in at least one of the two cases where > register dumps were provided in your bugzilla, nor should an attempt > to write back a value that was already there cause any problem, even > if the low bit was set). > > In any case, the hypervisor log would be interesting to see. Plus - > is > this perhaps a rather old hypervisor running there (i.e. potentially > lacking some fix)?I just reproduced this on my own test box, which is running an HV based on older code (it''s RHEL 5.8). Even with ''loglvl=all guest_loglvl=all'' I didn''t get anything logged in ''xm dmesg''. Drew
----- Original Message -----> > In any case, the hypervisor log would be interesting to see. Plus - > > is > > this perhaps a rather old hypervisor running there (i.e. > > potentially > > lacking some fix)? > > I just reproduced this on my own test box, which is running an HV > based on older code (it''s RHEL 5.8). Even with > ''loglvl=all guest_loglvl=all'' I didn''t get anything logged in > ''xm dmesg''.I don''t have an later hypervisor setup for testing, but it''d be nice to know if guests work on them. If so, then maybe RHEL5 and EC2 need to look at Xen c/s 17498 and/or others. Drew
>>> Andrew Jones <drjones@redhat.com> 06/07/12 5:44 PM >>> >> > In any case, the hypervisor log would be interesting to see. Plus - >> > is >> > this perhaps a rather old hypervisor running there (i.e. >> > potentially >> > lacking some fix)? >> >> I just reproduced this on my own test box, which is running an HV >> based on older code (it''s RHEL 5.8). Even with >> ''loglvl=all guest_loglvl=all'' I didn''t get anything logged in >> ''xm dmesg''. > >I don''t have an later hypervisor setup for testing, but it''d >be nice to know if guests work on them. If so, then maybe RHEL5 >and EC2 need to look at Xen c/s 17498 and/or others.Is it perhaps the really old 3.0.4 based one, and lacking a backport of 14041:b010e556fe2c? In that case the comparison part of the cmpxchg8b emulation failing would apparently be treated equally to e.g. a fault having occurred during the emulation (i.e. the original page fault would get forwarded rather than just EFLAGS.ZF getting set). Jan
>>> "Jan Beulich" <jbeulich@suse.com> 06/07/12 8:50 PM >>> >Is it perhaps the really old 3.0.4 based one, and lacking a backport of >14041:b010e556fe2c? In that case the comparison part of the >cmpxchg8b emulation failing would apparently be treated equally to >e.g. a fault having occurred during the emulation (i.e. the original >page fault would get forwarded rather than just EFLAGS.ZF getting >set).Having read the other thread with Andrea''s explanation, that''s unlikely the culprit. He says the cmpxchg8b is being done on a PMD entry, which the ptwr code definitely doesn''t support so far. Jan