thr3ads.net - Xen devel - Re: xen: oops at atomic64_read

If this information is useful, please help other people find it:
Share via:

Jonathan Nieder

2012-Jun-07 07:33 UTC

Re: xen: oops at atomic64_read_cx8+0x4

Sergio Gelato wrote[1]:
>                          That 3.4.1-1~experimental.1 build
> (3.4-trunk-686-pae #1 SMP Wed Jun 6 15:11:31 UTC 2012 i686 GNU/Linux)
> is even less well-behaved under Xen: I''m getting a kernel OOPS at
> EIP: [<c1168e54>] atomic64_read_cx8+0x4/0xc SS:ESP e021:ca853c6c
> The top of the trace message unfortunately scrolled off the console before
I
> could see it, and the message doesn''t have time to make it to
syslog (either
> local or remote).
[...]> Non-Xen boots proceed normally.
Yeah, apparently[2] that''s caused by

	commit 26c191788f18
	Author: Andrea Arcangeli <aarcange@redhat.com>
	Date:   Tue May 29 15:06:49 2012 -0700

	    mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race
condition

which was also included in Debian kernel 3.2.19-1.

[1] http://bugs.debian.org/676360
[2] https://bugzilla.redhat.com/show_bug.cgi?id=829016#c4

Andrea Arcangeli

2012-Jun-07 10:33 UTC

head link

Re: xen: oops at atomic64_read_cx8+0x4

On Thu, Jun 07, 2012 at 02:33:33AM -0500, Jonathan Nieder
wrote:> Sergio Gelato wrote[1]:
> 
> >                          That 3.4.1-1~experimental.1 build
> > (3.4-trunk-686-pae #1 SMP Wed Jun 6 15:11:31 UTC 2012 i686 GNU/Linux)
> > is even less well-behaved under Xen: I''m getting a kernel
OOPS at
> > EIP: [<c1168e54>] atomic64_read_cx8+0x4/0xc SS:ESP e021:ca853c6c
> > The top of the trace message unfortunately scrolled off the console
before I
> > could see it, and the message doesn''t have time to make it to
syslog (either
> > local or remote).
> [...]
> > Non-Xen boots proceed normally.
> 
> Yeah, apparently[2] that''s caused by
> 
> 	commit 26c191788f18
> 	Author: Andrea Arcangeli <aarcange@redhat.com>
> 	Date:   Tue May 29 15:06:49 2012 -0700
> 
> 	    mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race
condition
> 
> which was also included in Debian kernel 3.2.19-1.
> 
> [1] http://bugs.debian.org/676360
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=829016#c4
Oops, sorry I didn''t imagine atomic64_read on a pmd would trip.

Unfortunately to support pagetable walking with mmap_sem hold for
reading, we need an atomic read on 32bit PAE if
CONFIG_TRANSPARENT_HUGEPAGE=y.

The only case requiring this is 32bit PAE with
CONFIG_TRANSPARENT_HUGEPAGE=y at build time. If you set
CONFIG_TRANSPARENT_HUGEPAGE=n temporarily you should be able to work
around this as I optimized the code in a way to avoid an expensive
cmpxchg8b.

I guess if Xen can''t be updated to handle an atomic64_read on a pmd in
the guest, we can add a pmd_read paravirt op? Or if we don''t want to
break the paravirt interface a loop like gup_fast with irq disabled
should also work but looping + local_irq_disable()/enable() sounded
worse and more complex than a atomic64_read (gup fast already disables
irqs because it doesn''t hold the mmap_sem so it''s a different
cost
looping there). AFIK Xen disables THP during boot, so a check on THP
being enabled and falling back in the THP=n version of
pmd_read_atomic, would also be safe, but it''s not so nice to do it
with a runtime check.

Thanks,
Andrea

Sergio Gelato

2012-Jun-07 12:30 UTC

head link

Bug#676360: xen: oops at atomic64_read_cx8+0x4

* Andrea Arcangeli [2012-06-07 12:33:55 +0200]:> I guess if Xen can''t be updated to handle an atomic64_read on a
pmd in
> the guest, 
I''m not sure if it makes a difference, but just in case: I observed the
problem in a dom0.
>            we can add a pmd_read paravirt op? Or if we don''t want
to
> break the paravirt interface a loop like gup_fast with irq disabled
> should also work but looping + local_irq_disable()/enable() sounded
> worse and more complex than a atomic64_read (gup fast already disables
> irqs because it doesn''t hold the mmap_sem so it''s a
different cost
> looping there). AFIK Xen disables THP during boot, so a check on THP
> being enabled and falling back in the THP=n version of
> pmd_read_atomic, would also be safe, but it''s not so nice to do it
> with a runtime check.
> 
> Thanks,
> Andrea

Konrad Rzeszutek Wilk

2012-Jun-07 15:56 UTC

head link

Bug#676360: [Xen-devel] xen: oops at atomic64_read_cx8+0x4

On Thu, Jun 07, 2012 at 12:33:55PM +0200, Andrea Arcangeli
wrote:> On Thu, Jun 07, 2012 at 02:33:33AM -0500, Jonathan Nieder wrote:
> > Sergio Gelato wrote[1]:
> > 
> > >                          That 3.4.1-1~experimental.1 build
> > > (3.4-trunk-686-pae #1 SMP Wed Jun 6 15:11:31 UTC 2012 i686
GNU/Linux)
> > > is even less well-behaved under Xen: I''m getting a
kernel OOPS at
> > > EIP: [<c1168e54>] atomic64_read_cx8+0x4/0xc SS:ESP
e021:ca853c6c
> > > The top of the trace message unfortunately scrolled off the
console before I
> > > could see it, and the message doesn''t have time to make
it to syslog (either
> > > local or remote).
> > [...]
> > > Non-Xen boots proceed normally.
> > 
> > Yeah, apparently[2] that''s caused by
> > 
> > 	commit 26c191788f18
> > 	Author: Andrea Arcangeli <aarcange@redhat.com>
> > 	Date:   Tue May 29 15:06:49 2012 -0700
> > 
> > 	    mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP
race condition
> > 
> > which was also included in Debian kernel 3.2.19-1.
> > 
> > [1] http://bugs.debian.org/676360
> > [2] https://bugzilla.redhat.com/show_bug.cgi?id=829016#c4
> 
> Oops, sorry I didn''t imagine atomic64_read on a pmd would trip.
Hmm, so it looks like it used to do this:

 pmd = pmd_offset(pud, addr);
 ..
 pmd_t pmdval = *pmd;

but now you do:
 pmd_t ret = (pmd_val)((u32)*tmp);
 ret |= (*tmp+1) << 32;

which would read the low first and then the high one next
(or is the other way around?).  The ''pmd_offset'' beforehand
manufactures the pmd using the PFN to MFN lookup tree (so
that there aren''t any hypercall or traps).

Hm, with your change, you are still looking at the ''pmd''
and its contents, except that you are reading the low and
then the high part. Why that would trip the hypervisor
is not clear to me. Perhaps in the past it only read the
low bits?

If there was Xen hypervisor log that might give some ideas. Is
there any chance that the Linode folks could send that over?
> 
> Unfortunately to support pagetable walking with mmap_sem hold for
> reading, we need an atomic read on 32bit PAE if
> CONFIG_TRANSPARENT_HUGEPAGE=y.
> 
> The only case requiring this is 32bit PAE with
> CONFIG_TRANSPARENT_HUGEPAGE=y at build time. If you set
> CONFIG_TRANSPARENT_HUGEPAGE=n temporarily you should be able to work
> around this as I optimized the code in a way to avoid an expensive
> cmpxchg8b.
Ah, by just skipping the thing if the low bits are zero.> 
> I guess if Xen can''t be updated to handle an atomic64_read on a
pmd in
> the guest, we can add a pmd_read paravirt op? Or if we don''t want
to
> break the paravirt interface a loop like gup_fast with irq disabled
> should also work but looping + local_irq_disable()/enable() sounded
> worse and more complex than a atomic64_read (gup fast already disables
> irqs because it doesn''t hold the mmap_sem so it''s a
different cost
I am not really sure what is at foot. It sounds like the hypervisor
didn''t like somebody reading the high and low bit, but isn''t
the
pmdval_t still 64-bit ? So I would have thought this would
have been triggered? Or is that the code on pmd_val never actually
read the high bits (before your addition to the atomic_read?)?
> looping there). AFIK Xen disables THP during boot, so a check on THP
> being enabled and falling back in the THP=n version of
> pmd_read_atomic, would also be safe, but it''s not so nice to do it
> with a runtime check.
The thing is that I did install a 32-bit PAE guest (a Fedora) on a Fedora
17 dom0. So it looks like this is reading high part is fixed on the newer
hypervisors, but now with the older ones. And the older one is Amazon EC2
so some .. hack to workaround older hypervisors could be added.

Andrea Arcangeli

2012-Jun-07 16:17 UTC

head link

Re: xen: oops at atomic64_read_cx8+0x4

Hi,

On Thu, Jun 07, 2012 at 11:56:47AM -0400, Konrad Rzeszutek Wilk
wrote:> then the high part. Why that would trip the hypervisor
> is not clear to me. Perhaps in the past it only read the
That is the CONFIG_TRANSPARENT_HUGEPAGE=n case and in fact it doesn''t
trip the hypervisor. That was tested too, it should work fine.

The problem is with the atomic64_read version, that one uses cmpxchg8b
to read the contents of the pmdp.
> Ah, by just skipping the thing if the low bits are zero.
Yep.
> didn''t like somebody reading the high and low bit, but
isn''t the
> pmdval_t still 64-bit ? So I would have thought this would
The pmd format is unchanged, that''s hardware.
> The thing is that I did install a 32-bit PAE guest (a Fedora) on a Fedora
> 17 dom0. So it looks like this is reading high part is fixed on the newer
> hypervisors, but now with the older ones. And the older one is Amazon EC2
> so some .. hack to workaround older hypervisors could be added.
The insn oopsing is cmpxchg8b and it''s not reading the low/high part
in two separate insn but reading it in a single insn, which means the
kernel oopsing was built with CONFIG_TRANSPARENT_HUGEPAGE=y.

Jan Beulich

2012-Jun-07 19:50 UTC

head link

Re: xen: oops at atomic64_read_cx8+0x4

>>> Andrea Arcangeli <aarcange@redhat.com> 06/07/12 12:35 PM
>>>
>Oops, sorry I didn''t imagine atomic64_read on a pmd would trip.
The problem really is that the cmpxchg8b is a write, and hence won''t go
through without faulting on a write protected page (which all page table
pages necessarily are).
>I guess if Xen can''t be updated to handle an atomic64_read on a pmd
in
>the guest, we can add a pmd_read paravirt op?
Xen could certainly be updated to treat cmpxchg8b on a PMD entry as a
simple 8-byte read when compared-to and to-be-stored values are
identical, but the problem would be that hypervisors in the field
wouldn''t
necessarily get updated.
>Or if we don''t want to
>break the paravirt interface a loop like gup_fast with irq disabled
>should also work but looping + local_irq_disable()/enable() sounded
>worse and more complex than a atomic64_read (gup fast already disables
>irqs because it doesn''t hold the mmap_sem so it''s a
different cost
>looping there). AFIK Xen disables THP during boot, so a check on THP
>being enabled and falling back in the THP=n version of
>pmd_read_atomic, would also be safe, but it''s not so nice to do it
>with a runtime check.
That would probably nevertheless be the most viable option. If
performance critical(?), maybe this could be hidden with something
like the alternative instruction or paravirt patching mechanisms?

Jan

Xen devel - Jun 2012 - Re: xen: oops at atomic64_read_cx8+0x4

Re: xen: oops at atomic64_read_cx8+0x4

Re: xen: oops at atomic64_read_cx8+0x4

Bug#676360: xen: oops at atomic64_read_cx8+0x4

Bug#676360: [Xen-devel] xen: oops at atomic64_read_cx8+0x4

Re: xen: oops at atomic64_read_cx8+0x4

Re: xen: oops at atomic64_read_cx8+0x4