thr3ads.net - Xen devel - [Xen-devel] 3.0.0 Xen pv guest - BUG: Unable to handle kernel paging request in swap_count

Peter Sandin

2011-Aug-26 17:42 UTC

[Xen-devel] 3.0.0 Xen pv guest - BUG: Unable to handle kernel paging request in swap_count_continued

We have a number of virtualized Linux instances running under Xen that have been
hitting a bug. This issue first cropped up in the 2.6.38 release and
we're still seeing cases with the 3.0.0 kernel. On average
we're receiving reports of about one instance per day crashing due to
this issue. The affected 2.6.39 and 3.0.0 kernels are vanilla kernel.org
kernels. The .config file and binary for the affected 3.0.0 kernel can be found
at:

http://thesandins.net/xen/3.0.0/

This issue has happened on multiple separate physical machines and different
distributions, so it's not a hardware- or distribution-specific issue.
The Apache httpd server seems to be the most likely process to trigger this
issue. Someone else opened a bug with Apache about this issue, but that bug was
closed as not being an Apache issue. That report can be found at:

https://issues.apache.org/bugzilla/show_bug.cgi?id=51325

We inquired about this issue with the Xen-devel list when we first ran into it.
That thread can be found at:

http://lists.xensource.com/archives/html/xen-devel/2011-04/msg00230.html

If anyone has any ideas on why this is happening and what we need to do to
prevent it from happening in the future, please let us know. The issue has only
manifested in customer instances, so we don't have access to other logs
from these incidents. However, if anyone has suggestions on tests or methods for
replicating this issue, I'd be glad to give those a try on a test
instance. The console output from the error is included below:

BUG: unable to handle kernel paging request at f57a63be
IP: [<c01ab854>] swap_count_continued+0x104/0x180
*pdpt = 0000000029d01027 *pde = 00000000008d4067 *pte = 0000000000000000 
Oops: 0000 [#1] SMP 
Modules linked in:

Pid: 2206, comm: apache2 Not tainted 3.0.0-linode35 #1  
EIP: 0061:[<c01ab854>] EFLAGS: 00010246 CPU: 1
EIP is at swap_count_continued+0x104/0x180
EAX: f57a63be EBX: eb9fc4e0 ECX: f57a6000 EDX: 000000be
ESI: ed3d7cc0 EDI: 000000be EBP: 000003be ESP: ea3bddb0
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
Process apache2 (pid: 2206, ti=ea3bc000 task=eaca6410 task.ti=ea3bc000)
Stack:
 ea76dcc0 000013be 000000be ffffffea c01abe22 35a34067 c01040fb 0002a5cb
 40f40067 000013be ea5cb2e0 000277c0 bfc5c000 c01abee4 00000000 c01a068b
 bfc40000 80000007 00000000 00000000 000013be 0000001c e7f402e0 00100173
Call Trace:
 [<c01abe22>] ? __swap_duplicate+0xc2/0x160
 [<c01040fb>] ? pte_mfn_to_pfn+0x8b/0xe0
 [<c01abee4>] ? swap_duplicate+0x14/0x40
 [<c01a068b>] ? copy_pte_range+0x45b/0x500
 [<c01a08c5>] ? copy_page_range+0x195/0x200
 [<c0132756>] ? dup_mmap+0x1c6/0x2c0
 [<c0132b88>] ? dup_mm+0xa8/0x130
 [<c01335fa>] ? copy_process+0x98a/0xb30
 [<c01337ef>] ? do_fork+0x4f/0x280
 [<c010f780>] ? sys_clone+0x30/0x40
 [<c06c000d>] ? ptregs_clone+0x15/0x48
 [<c06bf6f1>] ? syscall_call+0x7/0xb
 [<c06b0000>] ? sctp_backlog_rcv+0xf0/0x100
Code: de 75 dc b8 01 00 00 00 5b 5e 5f 5d c3 66 90 e8 d3 7c f7 ff 8b 5b 18 83 eb 18 39 de 0f 84 7f 00 00 00 89 d8 e8 fe 7e f7 ff 01 e8 <0f> b6 10 80 fa ff 74 dc 80 fa 7f 74 28 83 c2 01 88 10 eb 0c 89
EIP: [<c01ab854>] swap_count_continued+0x104/0x180 SS:ESP 0069:ea3bddb0
CR2: 00000000f57a63be
---[ end trace aa46a9340a0a4bc6 ]---
note: apache2[2206] exited with preempt_count 1
BUG: scheduling while atomic: apache2/2206/0x00000001
Modules linked in:
Pid: 2206, comm: apache2 Tainted: G      D     3.0.0-linode35 #1
Call Trace:
 [<c06bda6a>] ? schedule+0x60a/0x6f0
 [<c0106404>] ? check_events+0x8/0xc
 [<c01063fb>] ? xen_restore_fl_direct_reloc+0x4/0x4
 [<c01775fe>] ? rcu_enter_nohz+0x2e/0xb0
 [<c0139921>] ? irq_exit+0x31/0xa0
 [<c0477bed>] ? xen_evtchn_do_upcall+0x1d/0x30
 [<c0101227>] ? hypercall_page+0x227/0x1000
 [<c0105c27>] ? xen_force_evtchn_callback+0x17/0x30
 [<c0106404>] ? check_events+0x8/0xc
 [<c06bf28d>] ? rwsem_down_failed_common+0x9d/0x110
 [<c06bf353>] ? call_rwsem_down_read_failed+0x7/0xc
 [<c06bea6a>] ? down_read+0xa/0x10
 [<c01683f5>] ? acct_collect+0x35/0x160
 [<c0137fbd>] ? do_exit+0x27d/0x350
 [<c011f170>] ? mm_fault_error+0x130/0x130
 [<c010b7e1>] ? oops_end+0x71/0xa0
 [<c011ef8f>] ? bad_area_nosemaphore+0xf/0x20
 [<c011f3bf>] ? do_page_fault+0x24f/0x3a0
 [<c0105c27>] ? xen_force_evtchn_callback+0x17/0x30
 [<c0106404>] ? check_events+0x8/0xc
 [<c01063fb>] ? xen_restore_fl_direct_reloc+0x4/0x4
 [<c011f170>] ? mm_fault_error+0x130/0x130
 [<c06bfc66>] ? error_code+0x5a/0x60
 [<c012007b>] ? try_preserve_large_page+0x7b/0x340
 [<c011f170>] ? mm_fault_error+0x130/0x130
 [<c01ab854>] ? swap_count_continued+0x104/0x180
 [<c01abe22>] ? __swap_duplicate+0xc2/0x160
 [<c01040fb>] ? pte_mfn_to_pfn+0x8b/0xe0
 [<c01abee4>] ? swap_duplicate+0x14/0x40
 [<c01a068b>] ? copy_pte_range+0x45b/0x500
 [<c01a08c5>] ? copy_page_range+0x195/0x200
 [<c0132756>] ? dup_mmap+0x1c6/0x2c0
 [<c0132b88>] ? dup_mm+0xa8/0x130
 [<c01335fa>] ? copy_process+0x98a/0xb30
 [<c01337ef>] ? do_fork+0x4f/0x280
 [<c010f780>] ? sys_clone+0x30/0x40
 [<c06c000d>] ? ptregs_clone+0x15/0x48
 [<c06bf6f1>] ? syscall_call+0x7/0xb
 [<c06b0000>] ? sctp_backlog_rcv+0xf0/0x100
INFO: rcu_sched_state detected stall on CPU 2 (t=60000 jiffies)

Regards,
Peter
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Christopher S. Aker

2011-Aug-29 14:39 UTC

[Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

And another related from 2.6.39:

------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2527!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/topology/core_id
Modules linked in:

Pid: 17680, comm: postgres Tainted: G    B       2.6.39-linode33 #3
EIP: 0061:[<c01b4b26>] EFLAGS: 00210246 CPU: 0
EIP is at swap_count_continued+0x176/0x180
EAX: f57bac57 EBX: eba2c200 ECX: f57ba000 EDX: 00000000
ESI: ebfd7c20 EDI: 00000080 EBP: 00000c57 ESP: c670fe0c
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process postgres (pid: 17680, ti=c670e000 task=e93415d0 task.ti=c670e000)
Stack:
  e9e3a340 00013c57 ee15fc57 00000000 c01b60b1 c0731000 c06982d5 401b4b73
  ceebc988 e9e3a340 00013c57 00000000 c01b60f7 ceebc988 b7731000 c670ff04
  c01a7183 4646e045 80000005 e62ce348 28999063 c0103fc5 7f662000 00278ae0
Call Trace:
  [<c01b60b1>] ? swap_entry_free+0x121/0x140
  [<c06982d5>] ? _raw_spin_lock+0x5/0x10
  [<c01b60f7>] ? free_swap_and_cache+0x27/0xd0
  [<c01a7183>] ? zap_pte_range+0x1b3/0x480
  [<c0103fc5>] ? pte_pfn_to_mfn+0xb5/0xd0
  [<c01a7568>] ? unmap_page_range+0x118/0x1a0
  [<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
  [<c01a771b>] ? unmap_vmas+0x12b/0x1e0
  [<c01aba01>] ? exit_mmap+0x91/0x140
  [<c0134b2b>] ? mmput+0x2b/0xc0
  [<c01386ba>] ? exit_mm+0xfa/0x130
  [<c0698330>] ? _raw_spin_lock_irq+0x10/0x20
  [<c013a2b5>] ? do_exit+0x125/0x360
  [<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
  [<c013a52c>] ? do_group_exit+0x3c/0xa0
  [<c013a5a1>] ? sys_exit_group+0x11/0x20
  [<c0698631>] ? syscall_call+0x7/0xb
Code: ff 89 d8 e8 7d ec f6 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 53 31 db 83 ec 0c 85 c0 74 39 89
EIP: [<c01b4b26>] swap_count_continued+0x176/0x180 SS:ESP 0069:c670fe0c
---[ end trace c2dcb41c89b0a9f7 ]---

Thanks,
-Chris



Konrad Rzeszutek Wilk

2011-Aug-29 15:07 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On Mon, Aug 29, 2011 at 10:39:25AM -0400, Christopher S. Aker wrote:
> And another related from 2.6.39:

I just don't get how you are the only person seeing this - and you have
been seeing this from 2.6.32... The dom0 you have - is it printing at least
something when this happens (or before)? Or the Xen hypervisor:
maybe a message about L1 pages not found?

And the dom0 is 2.6.18, right? - Did you update it (I know that the Red Hat guys
have been updating a couple of things on it).

Any chance I can get access to your setup and try to work with somebody
to reproduce this?

Ian Campbell

2011-Aug-30 11:45 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On Mon, 2011-08-29 at 16:07 +0100, Konrad Rzeszutek Wilk wrote:
> On Mon, Aug 29, 2011 at 10:39:25AM -0400, Christopher S. Aker wrote:
> > And another related from 2.6.39:
> 
> I just don't get how you are the only person seeing this - and you have
> been seeing this from 2.6.32... The dom0 you have - is it printing at least
> something when this happens (or before)? Or the Xen hypervisor:
> maybe a message about L1 pages not found?

It'd be worth ensuring that the required guest_loglvl and loglvl
parameters to allow this are in place on the hypervisor command line.

Are these reports against totally unpatched kernel.org domU kernels?

> And the dom0 is 2.6.18, right? - Did you update it (I know that the Red Hat guys
> have been updating a couple of things on it).
> 
> Any chance I can get access to your setup and try to work with somebody
> to reproduce this?
> 
> > 
> > ------------[ cut here ]------------
> > kernel BUG at mm/swapfile.c:2527!

This is "BUG_ON(*map == 0);" which is subtly different from the error in
the original post from Peter which was a "unable to handle kernel paging
request" at EIP c01ab854, with a pagetable walk showing PTE==0.

I'd bet the dereference corresponds to the "*map" in that same place but
Peter can you convert that address to a line of code please?

map came from a kmap_atomic() not far before this point so it appears
that it is mapping the wrong page (so *map != 0) and/or mapping a
non-existent page (leading to the fault).

Warning, wild speculation follows...

Is it possible that we are in lazy paravirt mode at this point such that
the mapping hasn't really occurred yet, leaving either nothing or the
previous mapping? (would the current paravirt lazy state make a useful
general addition to the panic message?)

The definition of kmap_atomic is a bit confusing: 
        /*
         * Make both: kmap_atomic(page, idx) and kmap_atomic(page) work.
         */
        #define kmap_atomic(page, args...) __kmap_atomic(page)
but it appears that the KM_USER0 at the callsite is ignored and instead
we end up using the __kmap_atomic_idx stuff (fine). I wondered if it is
possible we are overflowing the number of slots but there is an explicit
BUG_ON for that case in kmap_atomic_idx_push. Oh, wait, that's iff
CONFIG_DEBUG_HIGHMEM, which appears to not be enabled. I think it would
be worth trying, it doesn't look to have too much overhead.

Another possibility which springs to mind is the pfn->mfn laundering
going wrong. Perhaps as a skanky debug hack remembering the last pte
val, address, mfn, pfn etc and dumping them on error would give a hint?
I wouldn't expect that to result in a non-present mapping though, rather
I would expect either the wrong thing or the guest to be killed by the
hypervisor.

Would it be worth doing a __get_user(map) (or some other "safe" pointer
dereference) right after the mapping is established, catching a fault if
one occurs so we can dump some additional debug in that case? I'm not
entirely sure what to suggest dumping though.

Ian.

Christopher S. Aker

2011-Aug-31 20:43 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On 8/30/11 7:45 AM, Ian Campbell wrote:
> On Mon, 2011-08-29 at 16:07 +0100, Konrad Rzeszutek Wilk wrote:
>> I just don't get how you are the only person seeing this - and you have
>> been seeing this from 2.6.32... The dom0 you have - is it printing at least
>> something when this happens (or before)? Or the Xen hypervisor:
>> maybe a message about L1 pages not found?
>
> It'd be worth ensuring that the required guest_loglvl and loglvl
> parameters to allow this are in place on the hypervisor command line.

Nothing in Xen's output correlates at the time of the domUs crashing,
however we don't have guest log levels turned up.

> Are these reports against totally unpatched kernel.org domU kernels?

Yes - unpatched domUs.

>> And the dom0 is 2.6.18, right? - Did you update it (I know that the Red Hat guys
>> have been updating a couple of things on it).

2.6.18 from xenbits, all around changeset 931 vintage.

>> Any chance I can get access to your setup and try to work with somebody
>> to reproduce this?

Konrad, that's a fantastic offer and much appreciated.  To make this
happen I'll need to find a volunteer customer or two whose activity
reproduces this problem and who can deal with some downtime -- then
quarantine them off to an environment you can access.  I'll send out the
word...

>>> ------------[ cut here ]------------
>>> kernel BUG at mm/swapfile.c:2527!
>
> This is "BUG_ON(*map == 0);" which is subtly different from the error in
> the original post from Peter which was a "unable to handle kernel paging
> request" at EIP c01ab854, with a pagetable walk showing PTE==0.
>
> I'd bet the dereference corresponds to the "*map" in that same place but
> Peter can you convert that address to a line of code please?

root@build:/build/xen/domU/i386/3.0.0-linode35-debug# gdb vmlinux
GNU gdb (GDB) 7.1-ubuntu (...snip...)
Reading symbols from /build/xen/domU/i386/3.0.0-linode35-debug/vmlinux...done.
(gdb) list *0xc01ab854
0xc01ab854 is in swap_count_continued (mm/swapfile.c:2493).
2488
2489            if (count == (SWAP_MAP_MAX | COUNT_CONTINUED)) { /* incrementing */
2490                    /*
2491                     * Think of how you add 1 to 999
2492                     */
2493                    while (*map == (SWAP_CONT_MAX | COUNT_CONTINUED)) {
2494                            kunmap_atomic(map, KM_USER0);
2495                            page = list_entry(page->lru.next, struct page, lru);
2496                            BUG_ON(page == head);
2497                            map = kmap_atomic(page, KM_USER0) + offset;
(gdb)
> map came from a kmap_atomic() not far before this point so it appears
> that it is mapping the wrong page (so *map != 0) and/or mapping a
> non-existent page (leading to the fault).
>
> Warning, wild speculation follows...
>
> Is it possible that we are in lazy paravirt mode at this point such that
> the mapping hasn't really occurred yet, leaving either nothing or the
> previous mapping? (would the current paravirt lazy state make a useful
> general addition to the panic message?)
>
> The definition of kmap_atomic is a bit confusing:
>          /*
>           * Make both: kmap_atomic(page, idx) and kmap_atomic(page) work.
>           */
>          #define kmap_atomic(page, args...) __kmap_atomic(page)
> but it appears that the KM_USER0 at the callsite is ignored and instead
> we end up using the __kmap_atomic_idx stuff (fine). I wondered if it is
> possible we are overflowing the number of slots but there is an explicit
> BUG_ON for that case in kmap_atomic_idx_push. Oh, wait, that's iff
> CONFIG_DEBUG_HIGHMEM, which appears to not be enabled. I think it would
> be worth trying, it doesn't look to have too much overhead.

My next build will be sure to include CONFIG_DEBUG_HIGHMEM. Maybe
that'll lead us to a discovery.

Konrad Rzeszutek Wilk

2011-Sep-06 17:13 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On Wed, Aug 31, 2011 at 04:43:07PM -0400, Christopher S. Aker wrote:
> On 8/30/11 7:45 AM, Ian Campbell wrote:
> >On Mon, 2011-08-29 at 16:07 +0100, Konrad Rzeszutek Wilk wrote:
> >>I just don't get how you are the only person seeing this - and you have
> >>been seeing this from 2.6.32... The dom0 you have - is it printing at least
> >>something when this happens (or before)? Or the Xen hypervisor:
> >>maybe a message about L1 pages not found?

So .. just to confirm this b/c you have been seeing this for some time. Did you
see this with a 2.6.32 DomU? Asking b/c in 2.6.37 we removed some code:
ef691947d8a3d479e67652312783aedcf629320a


commit ef691947d8a3d479e67652312783aedcf629320a
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Wed Dec 1 15:45:48 2010 -0800

    vmalloc: remove vmalloc_sync_all() from alloc_vm_area()
    
    There's no need for it: it will get faulted into the current pagetable
    as needed.
    
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5d60302..fdf4b1e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2148,10 +2148,6 @@ struct vm_struct *alloc_vm_area(size_t size)
 		return NULL;
 	}
 
-	/* Make sure the pagetables are constructed in process kernel
-	   mappings */
-	vmalloc_sync_all();
-
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);

Which we found led to a couple of bugs:


"    Revert "vmalloc: remove vmalloc_sync_all() from alloc_vm_area()"
    
    This reverts commit ef691947d8a3d479e67652312783aedcf629320a.
    
    Xen backend drivers (e.g., blkback and netback) would sometimes fail
    to map grant pages into the vmalloc address space allocated with
    alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
    could not find the page (in the L2 table) containing the PTEs it
    needed to update.
    
    (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
    
    netback and blkback were making the hypercall from a kernel thread
    where task->active_mm != &init_mm and alloc_vm_area() was only
    updating the page tables for init_mm.  The usual method of deferring
    the update to the page tables of other processes (i.e., after taking a
    fault) doesn't work as a fault cannot occur during the hypercall.
    
    This would work on some systems depending on what else was using
    vmalloc.
"

It would be really neat if the issue you have been hitting was exactly this
and just having you revert the ef691947d8a3d479e67652312783aedcf629320a
would fix it.

I am grasping at straws here - since without being able to reproduce this it is
a bit hard to figure out what is going wrong.

BTW, the fix also affects the front-ends - especially the xen netfront -
even though the comment only mentions backends.

> >
> >It''d be worth ensuring that the requires guest_loglvl and
loglvl
> >parameters to allow this is in place on the hypervisor command line.
> 
> Nothing in Xen''s output correlates at the time of the domUs
> crashing, however we don''t have guest log levels turned up.
> 
> >Are these reports against totally unpatched kernel.org domU kernels?
> 
> Yes - unpatched domUs.
> 
> >>And the dom0 is 2.6.18, right? - Did you update it (I know that the
Red Hat guys
> >>have been updating a couple of things on it).
> 
> 2.6.18 from xenbits, all around changeset 931 vintage.
> 
> >>Any chance I can get access to your setup and try to work with
somebody
> >>to reproduce this?
> 
> Konrad, that''s a fantastic offer and much appreciated.  To make
this
> happen I''ll need to find a volunteer customer or two whose
activity
> reproduces this problem and who can deal with some downtime -- then
> quarantine them off to an environment you can access.  I''ll send
out
> the word...
> 
> >>>------------[ cut here ]------------
> >>>kernel BUG at mm/swapfile.c:2527!
> >
> >This is "BUG_ON(*map == 0);" which is subtly different from
the error in
> >the original post from Peter which was a "unable to handle kernel
paging
> >request" at EIP c01ab854, with a pagetable walk showing PTE==0.
> >
> >I''d bet the dereference corresponds to the "*map" in
that same place but
> >Peter can you convert that address to a line of code please?
> 
> root@build:/build/xen/domU/i386/3.0.0-linode35-debug# gdb vmlinux
> GNU gdb (GDB) 7.1-ubuntu (...snip...)
> Reading symbols from
> /build/xen/domU/i386/3.0.0-linode35-debug/vmlinux...done.
> (gdb) list *0xc01ab854
> 0xc01ab854 is in swap_count_continued (mm/swapfile.c:2493).
> 2488
> 2489            if (count == (SWAP_MAP_MAX | COUNT_CONTINUED)) { /*
> incrementing */
> 2490                    /*
> 2491                     * Think of how you add 1 to 999
> 2492                     */
> 2493                    while (*map == (SWAP_CONT_MAX | COUNT_CONTINUED)) {
> 2494                            kunmap_atomic(map, KM_USER0);
> 2495                            page = list_entry(page->lru.next,
> struct page, lru);
> 2496                            BUG_ON(page == head);
> 2497                            map = kmap_atomic(page, KM_USER0) + offset;
> (gdb)
> 
> >map came from a kmap_atomic() not far before this point so it appears
> >that it is mapping the wrong page (so *map != 0) and/or mapping a
> >non-existent page (leading to the fault).
> >
> >Warning, wild speculation follows...
> >
> >Is it possible that we are in lazy paravirt mode at this point such
that
> >the mapping hasn''t really occurred yet, leaving either nothing
or the
> >previous mapping? (would the current paravirt lazy state make a useful
> >general addition to the panic message?)
> >
> >The definition of kmap_atomic is a bit confusing:
> >         /*
> >          * Make both: kmap_atomic(page, idx) and kmap_atomic(page)
work.
> >          */
> >         #define kmap_atomic(page, args...) __kmap_atomic(page)
> >but it appears that the KM_USER0 at the callsite is ignored and instead
> >we end up using the __kmap_atomic_idx stuff (fine). I wondered if it is
> >possible we are overflowing the number of slots but there is an
explicit
> >BUG_ON for that case in kmap_atomic_idx_push. Oh, wait, that''s
iff
> >CONFIG_DEBUG_HIGHMEM, which appears to not be enabled. I think it would
> >be worth trying, it doesn''t look to have too much overhead.
> 
> My next build will be sure to include CONFIG_DEBUG_HIGHMEM. Maybe
> that''ll lead us to a discovery.
> 
> >Another possibility which springs to mind is the pfn->mfn laundering
> >going wrong. Perhaps as a skanky debug hack remembering the last pte
> >val, address, mfn, pfn etc and dumping them on error would give a hint?
> >I wouldn''t expect that to result in a non-present mapping
though, rather
> >I would expect either the wrong thing or the guest to be killed by the
> >hypervisor
> >
> >Would it be worth doing a __get_user(map) (or some other
"safe" pointer
> >dereference) right after the mapping is established, catching a fault
if
> >one occurs so we can dump some additional debug in that case?
I''m not
> >entirely sure what to suggest dumping though.
> >
> >Ian.
> >
> >>>invalid opcode: 0000 [#1] SMP
> >>>last sysfs file: /sys/devices/system/cpu/cpu3/topology/core_id
> >>>Modules linked in:
> >>>
> >>>Pid: 17680, comm: postgres Tainted: G    B      
2.6.39-linode33 #3
> >>>EIP: 0061:[<c01b4b26>] EFLAGS: 00210246 CPU: 0
> >>>EIP is at swap_count_continued+0x176/0x180
> >>>EAX: f57bac57 EBX: eba2c200 ECX: f57ba000 EDX: 00000000
> >>>ESI: ebfd7c20 EDI: 00000080 EBP: 00000c57 ESP: c670fe0c
> >>>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> >>>Process postgres (pid: 17680, ti=c670e000 task=e93415d0
task.ti=c670e000)
> >>>Stack:
> >>>  e9e3a340 00013c57 ee15fc57 00000000 c01b60b1 c0731000
c06982d5 401b4b73
> >>>  ceebc988 e9e3a340 00013c57 00000000 c01b60f7 ceebc988
b7731000 c670ff04
> >>>  c01a7183 4646e045 80000005 e62ce348 28999063 c0103fc5
7f662000 00278ae0
> >>>Call Trace:
> >>>  [<c01b60b1>] ? swap_entry_free+0x121/0x140
> >>>  [<c06982d5>] ? _raw_spin_lock+0x5/0x10
> >>>  [<c01b60f7>] ? free_swap_and_cache+0x27/0xd0
> >>>  [<c01a7183>] ? zap_pte_range+0x1b3/0x480
> >>>  [<c0103fc5>] ? pte_pfn_to_mfn+0xb5/0xd0
> >>>  [<c01a7568>] ? unmap_page_range+0x118/0x1a0
> >>>  [<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
> >>>  [<c01a771b>] ? unmap_vmas+0x12b/0x1e0
> >>>  [<c01aba01>] ? exit_mmap+0x91/0x140
> >>>  [<c0134b2b>] ? mmput+0x2b/0xc0
> >>>  [<c01386ba>] ? exit_mm+0xfa/0x130
> >>>  [<c0698330>] ? _raw_spin_lock_irq+0x10/0x20
> >>>  [<c013a2b5>] ? do_exit+0x125/0x360
> >>>  [<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
> >>>  [<c013a52c>] ? do_group_exit+0x3c/0xa0
> >>>  [<c013a5a1>] ? sys_exit_group+0x11/0x20
> >>>  [<c0698631>] ? syscall_call+0x7/0xb
> >>>Code: ff 89 d8 e8 7d ec f6 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 53 31 db 83 ec 0c 85 c0 74 39 89
> >>>EIP: [<c01b4b26>] swap_count_continued+0x176/0x180 SS:ESP 0069:c670fe0c
> >>>---[ end trace c2dcb41c89b0a9f7 ]---
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Christopher S. Aker

2011-Sep-12 16:06 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On 9/6/11 1:13 PM, Konrad Rzeszutek Wilk wrote:
> So .. just to confirm this b/c you have been seeing this for some time. Did you
> see this with a 2.6.32 DomU? Asking b/c in 2.6.37 we removed some code:

2.6.32 was NOT affected and our problems began right around 2.6.37.
This looks promising!

> It would be really neat if the issue you have been hitting was exactly this
> and just having you revert the ef691947d8a3d479e67652312783aedcf629320a
> would fix it.

Reverted, built, deployed, and set as default.  We shall see!

Thanks,
-Chris

Konrad Rzeszutek Wilk

2011-Sep-12 16:11 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On Mon, Sep 12, 2011 at 12:06:41PM -0400, Christopher S. Aker wrote:
> On 9/6/11 1:13 PM, Konrad Rzeszutek Wilk wrote:
> >So .. just to confirm this b/c you have been seeing this for some time. Did you
> >see this with a 2.6.32 DomU? Asking b/c in 2.6.37 we removed some code:
> 
> 2.6.32 was NOT affected and our problems began right around 2.6.37.
> This looks promising!
> 
> >It would be really neat if the issue you have been hitting was exactly this
> >and just having you revert the ef691947d8a3d479e67652312783aedcf629320a
> >would fix it.
> 
> Reverted, built, deployed, and set as default.  We shall see!

<holds his fingers crossed>

Christopher S. Aker

2011-Sep-15 18:58 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On 9/12/11 12:11 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Sep 12, 2011 at 12:06:41PM -0400, Christopher S. Aker wrote:
>>> It would be really neat if the issue you have been hitting was exactly this
>>> and just having you revert the ef691947d8a3d479e67652312783aedcf629320a
>>> would fix it.
>>
>> Reverted, built, deployed, and set as default.  We shall see!
>
> <holds his fingers crossed>

No joy.  Still getting reports even with the patched kernel.  I was so
confident that this was the problem -- I've triple-checked that the
patch was applied and that this is indeed the correct kernel.  It was
built with DEBUG_HIGHMEM too, without any difference in the dump.

BUG: unable to handle kernel paging request at f5768598
IP: [<c01abbd4>] swap_count_continued+0x84/0x180
*pdpt = 0000000000939027 *pde = 00000000017ef067 *pte = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in:

Pid: 1619, comm: apache2 Not tainted 3.0.4-linode37 #1
EIP: 0061:[<c01abbd4>] EFLAGS: 00010246 CPU: 2
EIP is at swap_count_continued+0x84/0x180
EAX: f5768598 EBX: ed13af80 ECX: ec9cf0a0 EDX: 00000080
ESI: ed1d35a0 EDI: 00000080 EBP: 00000598 ESP: e73d3dd4
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process apache2 (pid: 1619, ti=e73d2000 task=ebd0e410 task.ti=e73d2000)
Stack:
  ebd3f240 0000e598 00000040 00000000 c01abdc1 ec540e30 ebd3f240 0000e598
  00000000 c01ae027 ec540e30 b8fc6000 e73d3e68 c01a00e3 44846045 80000008
  00000000 00000020 c0105c27 2bbca063 001cb300 eb424200 ecf7780c eaaade38
Call Trace:
  [<c01abdc1>] ? swap_entry_free+0xf1/0x120
  [<c01ae027>] ? free_swap_and_cache+0x27/0xd0
  [<c01a00e3>] ? zap_pte_range+0x173/0x460
  [<c0105c27>] ? xen_force_evtchn_callback+0x17/0x30
  [<c01a04d0>] ? unmap_page_range+0x100/0x180
  [<c01a05da>] ? unmap_vmas+0x8a/0xc0
  [<c01a2a93>] ? exit_mmap+0x73/0x100
  [<c0132bbb>] ? mmput+0x2b/0xc0
  [<c013642f>] ? exit_mm+0xef/0x120
  [<c06bf870>] ? _raw_spin_lock_irq+0x10/0x20
  [<c0137fd5>] ? do_exit+0x125/0x350
  [<c01a2a07>] ? remove_vma+0x37/0x50
  [<c013823c>] ? do_group_exit+0x3c/0xa0
  [<c01382b1>] ? sys_exit_group+0x11/0x20
  [<c06bfb71>] ? syscall_call+0x7/0xb
  [<c06b0000>] ? sctp_err_lookup+0xb0/0x110
Code: 00 00 89 fa 80 fa 80 74 22 e9 0b 01 00 00 90 e8 63 7a f7 ff 8b 5b 18 83 eb 18 39 de 0f 84 f3 00 00 00 89 d8 e8 de 7c f7 ff 01 e8 <0f> b6 10 80 fa 80 74 dc 84 d2 0f 84 e2 00 00 00 83 ea 01 80 fa
EIP: [<c01abbd4>] swap_count_continued+0x84/0x180 SS:ESP 0069:e73d3dd4
CR2: 00000000f5768598
---[ end trace 06805b7648b253a0 ]---

So today I built a new stack and enabled loglvl=warning and
guest_loglvl=warning/info; however, it's probably going to take a while
before we have enough of these running and hit this problem.

I'm going to play around with it some more and see if I can find a
recipe that reproduces it.

-Chris

Christopher S. Aker

2011-Sep-15 19:17 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

Another report, different user, and slightly different:

BUG: unable to handle kernel paging request at f573fc8c
IP: [<c01abc54>] swap_count_continued+0x104/0x180
*pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in:

Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1
EIP: 0061:[<c01abc54>] EFLAGS: 00210246 CPU: 3
EIP is at swap_count_continued+0x104/0x180
EAX: f573fc8c EBX: ed1d4840 ECX: 00000000 EDX: 000000be
ESI: ed1d4a20 EDI: 000000be EBP: 00000c8c ESP: e9e5fdb0
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
Process apache2 (pid: 1638, ti=e9e5e000 task=ead37410 task.ti=e9e5e000)
Stack:
  ea746dc0 0000fc8c 000000be ffffffea c01ac222 358a1067 c01040f7 0002a75e
  405d54c0 0000fc8c ea75e5e8 001f9180 b74bd000 c01ac2e4 00000000 c01a0a6b
  a9fdf045 80000002 00000000 00000000 0000fc8c 00000000 dc5d55e8 08100073
Call Trace:
  [<c01ac222>] ? __swap_duplicate+0xc2/0x160
  [<c01040f7>] ? pte_mfn_to_pfn+0x87/0xe0
  [<c01ac2e4>] ? swap_duplicate+0x14/0x40
  [<c01a0a6b>] ? copy_pte_range+0x45b/0x500
  [<c01a0ca5>] ? copy_page_range+0x195/0x200
  [<c01328c6>] ? dup_mmap+0x1c6/0x2c0
  [<c0132cf8>] ? dup_mm+0xa8/0x130
  [<c013376a>] ? copy_process+0x98a/0xb30
  [<c013395f>] ? do_fork+0x4f/0x280
  [<c01573b3>] ? getnstimeofday+0x43/0x100
  [<c010f770>] ? sys_clone+0x30/0x40
  [<c06c048d>] ? ptregs_clone+0x15/0x48
  [<c06bfb71>] ? syscall_call+0x7/0xb
  [<c06b0000>] ? sctp_err_lookup+0xb0/0x110
Code: de 75 dc b8 01 00 00 00 5b 5e 5f 5d c3 66 90 e8 e3 79 f7 ff 8b 5b 18 83 eb 18 39 de 0f 84 7f 00 00 00 89 d8 e8 5e 7c f7 ff 01 e8 <0f> b6 10 80 fa ff 74 dc 80 fa 7f 74 28 83 c2 01 88 10 eb 0c 89
EIP: [<c01abc54>] swap_count_continued+0x104/0x180 SS:ESP 0069:e9e5fdb0
CR2: 00000000f573fc8c
---[ end trace 18843f6443e730a1 ]---

Christopher S. Aker

2011-Sep-18 15:05 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

We've discovered a way to reproduce this quite easily!  Fire up a
guest, run eatmem with 1M and enough threads to not OOM.

http://www.theshore.net/~caker/uml/patches/utils/eatmem.c

On a 1G guest: ./eatmem 1M 900
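
For anyone who cannot fetch the file above, the following is only a rough sketch of what a memory eater of this kind does. It is not the original eatmem.c. It assumes the first argument is a per-worker chunk size and the second a worker count, and that workers are forked processes; the names parse_size, touch_pages and run_eaters are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parse "4096", "64K", "1M" or "1G" into bytes; returns 0 on bad input. */
static size_t parse_size(const char *s)
{
    char *end;
    unsigned long long n = strtoull(s, &end, 10);

    switch (*end) {
    case 'K': case 'k': n <<= 10; end++; break;
    case 'M': case 'm': n <<= 20; end++; break;
    case 'G': case 'g': n <<= 30; end++; break;
    }
    return (*end == '\0') ? (size_t)n : 0;
}

/* Write one byte per page so every page of buf is really backed by memory.
 * Returns the number of pages touched. */
static size_t touch_pages(unsigned char *buf, size_t len, size_t page)
{
    size_t touched = 0;

    assert(page > 0);
    for (size_t off = 0; off < len; off += page) {
        buf[off]++;
        touched++;
    }
    return touched;
}

/* Fork `workers` children; each allocates `chunk` bytes and sweeps over it
 * `passes` times. Returns how many children were started. */
static int run_eaters(size_t chunk, int workers, int passes)
{
    long page = sysconf(_SC_PAGESIZE);
    int started = 0;

    for (int i = 0; i < workers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* could not fork any more workers */
        if (pid == 0) {
            unsigned char *buf = malloc(chunk);
            if (!buf)
                _exit(1);
            for (int p = 0; p < passes; p++)
                touch_pages(buf, chunk, (size_t)page);
            _exit(0);
        }
        started++;
    }
    while (wait(NULL) > 0)
        ;                               /* reap all children */
    return started;
}
```

A main() that calls run_eaters(parse_size(argv[1]), atoi(argv[2]), passes) with a large pass count would roughly mirror the command line above. Whether it exercises exactly the same paths as the real tool is an assumption.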

-Chris


Konrad Rzeszutek Wilk

2011-Sep-21 18:04 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On Sun, Sep 18, 2011 at 11:05:33AM -0400, Christopher S. Aker wrote:
> We've discovered a way to reproduce this quite easily!  Fire up a
> guest, run eatmem with 1M and enough threads to not OOM.

Excellent. Can you also send me your .config to make sure I've the right
knobs set?

> 
> http://www.theshore.net/~caker/uml/patches/utils/eatmem.c
> 
> On a 1G guest: ./eatmem 1M 900
> 
> -Chris

Christopher S. Aker

2011-Sep-21 22:09 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On 9/21/11 2:04 PM, Konrad Rzeszutek Wilk wrote:
> Excellent. Can you also send me your .config to make sure I've the right
> knobs set?

Certainly.  I included my kernel binary along with its config:

http://www.theshore.net/~caker/xen/BUGS/swapfile/

Thanks!
-Chris

Konrad Rzeszutek Wilk

2011-Sep-22 18:32 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

> >I'd bet the dereference corresponds to the "*map" in that same place but
> >Peter can you convert that address to a line of code please?
> 
> root@build:/build/xen/domU/i386/3.0.0-linode35-debug# gdb vmlinux
> GNU gdb (GDB) 7.1-ubuntu (...snip...)
> Reading symbols from /build/xen/domU/i386/3.0.0-linode35-debug/vmlinux...done.
> (gdb) list *0xc01ab854
> 0xc01ab854 is in swap_count_continued (mm/swapfile.c:2493).
> 2488
> 2489            if (count == (SWAP_MAP_MAX | COUNT_CONTINUED)) { /* incrementing */
> 2490                    /*
> 2491                     * Think of how you add 1 to 999
> 2492                     */
> 2493                    while (*map == (SWAP_CONT_MAX | COUNT_CONTINUED)) {
> 2494                            kunmap_atomic(map, KM_USER0);
> 2495                            page = list_entry(page->lru.next, struct page, lru);
> 2496                            BUG_ON(page == head);
> 2497                            map = kmap_atomic(page, KM_USER0) + offset;
> (gdb)
> 
> >map came from a kmap_atomic() not far before this point so it appears
> >that it is mapping the wrong page (so *map != 0) and/or mapping a
> >non-existent page (leading to the fault).

First off, thanks to Jeremy for help on this one, and to Shaun R for lending
me one of his boxes with an environment to easily test it.

The problem looks to be that in copy_page_range we turn lazy mode on, and then
in swap_entry_free we call swap_count_continued, which ends up in:

         map = kmap_atomic(page, KM_USER0) + offset;

and then later on touches *map.

Basically we are forking a process and copying the pages that are also
"swap" pages. We don't need to access the user pages immediately, but we
do for the swap pages as we need proper reference counting.

Well, since we are running in batched mode we don't actually set up the
PTE mappings: the kmap_atomic is not done synchronously, and we end up
trying to dereference a page whose mapping has not been set yet.

Looking at kmap_atomic_prot_pfn, it uses 'arch_flush_lazy_mmu_mode', and
sprinkling that in kmap_atomic_prot and __kunmap_atomic seems to make
the problem go away.
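
To make the batching problem concrete, here is a toy user-space model in C. It is not kernel code: it assumes a single page-table slot and a small queue of deferred updates, and every name in it (toy_mmu, toy_set_pte, toy_kmap and so on) is made up for illustration. It only mirrors the idea of flushing queued PTE writes before the mapping is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Toy model (hypothetical, not kernel code): one PTE slot plus a queue of
 * updates that are deferred while "lazy MMU mode" is active. */
struct toy_mmu {
    int    pte;                       /* 0 means "nothing mapped" */
    bool   lazy;                      /* true while batching updates */
    int    pending[TOY_MAX_PENDING];  /* queued PTE values */
    size_t npending;
};

/* Apply every queued update, oldest first. */
static void toy_flush_lazy(struct toy_mmu *m)
{
    for (size_t i = 0; i < m->npending; i++)
        m->pte = m->pending[i];
    m->npending = 0;
}

static void toy_enter_lazy(struct toy_mmu *m)
{
    m->lazy = true;
}

static void toy_leave_lazy(struct toy_mmu *m)
{
    toy_flush_lazy(m);
    m->lazy = false;
}

/* In lazy mode the write is only queued; otherwise it lands at once. */
static void toy_set_pte(struct toy_mmu *m, int val)
{
    if (m->lazy) {
        assert(m->npending < TOY_MAX_PENDING);
        m->pending[m->npending++] = val;
    } else {
        m->pte = val;
    }
}

/* Stand-in for kmap_atomic: map a page, optionally flushing the queue.
 * Returns true if an immediate dereference would find the mapping. */
static bool toy_kmap(struct toy_mmu *m, int page, bool flush)
{
    toy_set_pte(m, page);
    if (flush)
        toy_flush_lazy(m);
    return m->pte == page;
}
```

With lazy mode on, toy_kmap(&m, 42, false) reports that the mapping is not visible yet, which is the situation that faults on *map. Passing flush = true, the analogue of an arch_flush_lazy_mmu_mode call right after set_pte, makes the mapping visible before the caller touches it.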

This is the patch that looks to be doing the trick. Please double-check
whether it fixes the problem in your setups.


diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index b499626..f4f29b1 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -45,6 +45,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	BUG_ON(!pte_none(*(kmap_pte-idx)));
 	set_pte(kmap_pte-idx, mk_pte(page, prot));
+	arch_flush_lazy_mmu_mode();
 
 	return (void *)vaddr;
 }
@@ -88,6 +89,7 @@ void __kunmap_atomic(void *kvaddr)
 		 */
 		kpte_clear_flush(kmap_pte-idx, vaddr);
 		kmap_atomic_idx_pop();
+		arch_flush_lazy_mmu_mode();
 	}
 #ifdef CONFIG_DEBUG_HIGHMEM
 	else {

Christopher S. Aker

2011-Sep-22 20:02 UTC

Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest - BUG: Unable to handle]

On 9/22/11 2:32 PM, Konrad Rzeszutek Wilk wrote:
> This is the patch that looks to be doing the trick. Please double check
> if it fixes in your guys setup.

So far it's looking good.  Before I was able to BUG it within 2 or 3
minutes.  This patched kernel has held up for the past hour and counting.

Nice work, and many thanks!

-Chris

