thr3ads.net - Xen devel - [Xen-devel] kernel BUG at mm/swapfile.c:2524 [Apr 2011]

Peter Sandin

2011-Apr-06 21:59 UTC

[Xen-devel] kernel BUG at mm/swapfile.c:2524

Hello,

We've got some 2.6.38 domUs that are hitting a bug in mm/swapfile.c.
The issue has only cropped up since we moved to 2.6.38, and it has
happened on multiple separate physical machines. I've attached the
trace from one instance here; additional instances can be found along with a
copy of the domU kernel image and configuration at:

http://thesandins.net/xen/2.6.38/

------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2524!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51728/block/xvdb/stat
Modules linked in:

Pid: 539, comm: apache2 Not tainted 2.6.38-linode31 #1
EIP: 0061:[<c01a36b6>] EFLAGS: 00010246 CPU: 0
EIP is at swap_count_continued+0x176/0x180
EAX: f57ba5f8 EBX: ed3c8f00 ECX: f57ba000 EDX: 00000000
ESI: ed3c5320 EDI: 00000080 EBP: 000005f8 ESP: eb80be80
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
Process apache2 (pid: 539, ti=eb80a000 task=cc472be0 task.ti=eb80a000)
Stack:
 ec8e68c0 000195f8 00000040 00000000 c01a37b1 000195f8 ec96ae20 ed14b000
 00000000 c01a3a18 00000000 c01955cc 00000000 80000007 00000000 c0106314
 00000023 ecf7ac4c 000195f8 ca6e0290 00000000 ffffffe8 b883d664 00000000
Call Trace:
 [<c01a37b1>] ? swap_entry_free+0xf1/0x120
 [<c01a3a18>] ? swap_free+0x18/0x30
 [<c01955cc>] ? handle_pte_fault+0x49c/0xac0
 [<c0106314>] ? check_events+0x8/0xc
 [<c0196e91>] ? handle_mm_fault+0x101/0x1a0
 [<c011e81b>] ? do_page_fault+0xfb/0x3e0
 [<c063f390>] ? _raw_spin_lock_irq+0x10/0x20
 [<c063fcd6>] ? error_code+0x5a/0x60
 [<c013f397>] ? sys_rt_sigaction+0x77/0xa0
 [<c011e720>] ? do_page_fault+0x0/0x3e0
 [<c063fcd6>] ? error_code+0x5a/0x60
 [<c0630000>] ? sctp_sockaddr_af+0x20/0x90
 [<c011e720>] ? do_page_fault+0x0/0x3e0
Code: ff 89 d8 e8 cd f7 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
EIP: [<c01a36b6>] swap_count_continued+0x176/0x180 SS:ESP 0069:eb80be80
---[ end trace 41e4a2572fe1ada6 ]---
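
[Editor's sketch] The "Code:" line above marks the faulting byte in angle brackets. On x86 the byte pair 0f 0b is the UD2 instruction, which is what the kernel's BUG() is normally compiled to, and that is consistent with the "invalid opcode" line in the oops. The helper below is a minimal userspace sketch for checking this mechanically; the function name is hypothetical and not part of any kernel tool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Inspect an oops "Code:" line and look at the byte marked <xx>.
 * Returns  1 if the marked byte and the one after it are 0f 0b (x86 UD2),
 *          0 if a marked byte exists but the pair is something else,
 *         -1 if the line has no well-formed <xx> marker.
 * Hypothetical helper, for illustration only.
 */
int faulting_insn_is_ud2(const char *code_line)
{
    assert(code_line != NULL);

    const char *mark = strchr(code_line, '<');
    if (!mark)
        return -1;

    char *end;
    unsigned long first = strtoul(mark + 1, &end, 16);
    if (end == mark + 1 || *end != '>')
        return -1;                      /* no hex digits, or no closing '>' */

    /* strtoul skips the blank that separates the next byte. */
    unsigned long second = strtoul(end + 1, &end, 16);

    return first == 0x0f && second == 0x0b;
}
```

Passing the Code: line from the trace above should classify the trap as UD2, which only confirms that a BUG()/BUG_ON() fired; it does not say which condition failed.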

I've looked at the section of code that is generating the fault,
but I'm in a bit over my head. Does this look like a Xen-specific
issue, or something that would be better addressed on the LKML? Any insight you
can provide on the source of, or a fix for, this issue would be appreciated.

Thanks,
Peter
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Apr-07 13:50 UTC

Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524

On Wed, Apr 06, 2011 at 05:59:03PM -0400, Peter Sandin wrote:
> Hello,
>
> We've got some 2.6.38 domUs that are hitting a bug in mm/swapfile.c.
> The issue has only cropped up since we moved to 2.6.38, and it has
> happened on multiple separate physical machines. I've attached the
> trace from one instance here; additional instances can be found along
> with a copy of the domU kernel image and configuration at:
>
> http://thesandins.net/xen/2.6.38/

What exactly happened to cause this?

Peter Sandin

2011-Apr-12 14:39 UTC

Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524

The traces included here are from customer instances, so we don't know
exactly what they were doing at the time they hit this. Judging by their IO
usage and the other information included with their reports, it sounds like they
were swapping heavily in most cases. Unfortunately we haven't been able
to reproduce this in a controlled environment. If you have any suggested tests
that I could run to help reproduce or narrow down the source of this bug, I can
certainly give those a try. Another data point that may be helpful is that we
have only seen this issue with 32-bit kernels.

--Peter

On Apr 7, 2011, at 9:50 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 06, 2011 at 05:59:03PM -0400, Peter Sandin wrote:
>> Hello,
>>
>> We've got some 2.6.38 domUs that are hitting a bug in mm/swapfile.c.
>> The issue has only cropped up since we moved to 2.6.38, and it has
>> happened on multiple separate physical machines. I've attached the
>> trace from one instance here; additional instances can be found along
>> with a copy of the domU kernel image and configuration at:
>>
>> http://thesandins.net/xen/2.6.38/
>
> What exactly happened to cause this?

Christopher S. Aker

2011-Apr-21 14:57 UTC

Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524

We're still getting reports of this occurring with 2.6.38 domU.
Here's a fresh one:

------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2524!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in:

Pid: 12589, comm: apache2 Not tainted 2.6.38-linode31 #1
EIP: 0061:[<c01a36b6>] EFLAGS: 00210246 CPU: 0
EIP is at swap_count_continued+0x176/0x180
EAX: f57ba6c3 EBX: ed3af500 ECX: f57ba000 EDX: 00000000
ESI: ed3c5280 EDI: 00000080 EBP: 000006c3 ESP: d4bcde30
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process apache2 (pid: 12589, ti=d4bcc000 task=ec9c17d0 task.ti=d4bcc000)
Stack:
  ec03c4c0 000166c3 00000000 00000000 c01a37b1 b7d63000 ec03c4c0 000166c3
  00000000 c01a5b67 b7d63000 b7e00000 eae67b18 c0196296 6bb30067 80000006
  c01039c6 0000000c 00000000 00000000 002cd860 0006c92f d0374df0 d17b3040
Call Trace:
  [<c01a37b1>] ? swap_entry_free+0xf1/0x120
  [<c01a5b67>] ? free_swap_and_cache+0x27/0xd0
  [<c0196296>] ? unmap_vmas+0x3d6/0x820
  [<c01039c6>] ? __raw_callee_save_xen_make_pte+0x6/0x8
  [<c019a7d1>] ? exit_mmap+0x91/0x140
  [<c0130ceb>] ? mmput+0x2b/0xc0
  [<c0134697>] ? exit_mm+0xe7/0x120
  [<c063f390>] ? _raw_spin_lock_irq+0x10/0x20
  [<c013614a>] ? do_exit+0x10a/0x6e0
  [<c013675c>] ? do_group_exit+0x3c/0xa0
  [<c01367d1>] ? sys_exit_group+0x11/0x20
  [<c063f761>] ? syscall_call+0x7/0xb
  [<c0630000>] ? sctp_sockaddr_af+0x20/0x90
Code: ff 89 d8 e8 cd f7 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
EIP: [<c01a36b6>] swap_count_continued+0x176/0x180 SS:ESP 0069:d4bcde30
---[ end trace f3fdefcfb4d8b5dc ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: apache2/12589/0x00000001
Modules linked in:
Pid: 12589, comm: apache2 Tainted: G      D     2.6.38-linode31 #1
Call Trace:

  [<c063d979>] ? schedule+0x6a9/0x8b0
  [<c010630b>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c063f3b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
  [<c0133323>] ? console_unlock+0x1c3/0x200
  [<c013374f>] ? vprintk+0x17f/0x3c0
  [<c01365ff>] ? do_exit+0x5bf/0x6e0
  [<c063f367>] ? _raw_spin_lock_irqsave+0x27/0x40
  [<c063f3b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
  [<c0132e96>] ? kmsg_dump+0x36/0xf0
  [<c0109a90>] ? do_invalid_op+0x0/0x90
  [<c0109a90>] ? do_invalid_op+0x0/0x90
  [<c010bea1>] ? oops_end+0x71/0xa0
  [<c0109b0f>] ? do_invalid_op+0x7f/0x90
  [<c01a36b6>] ? swap_count_continued+0x176/0x180
  [<c016d805>] ? handle_IRQ_event+0x55/0xc0
  [<c0105b37>] ? xen_force_evtchn_callback+0x17/0x30
  [<c0138090>] ? __do_softirq+0x0/0x130
  [<c0106314>] ? check_events+0x8/0xc
  [<c010630b>] ? xen_restore_fl_direct_end+0x0/0x1
  [<c010af92>] ? do_softirq+0x42/0xb0
  [<c063fcd6>] ? error_code+0x5a/0x60
  [<c012007b>] ? __change_page_attr_set_clr+0xa2b/0xb40
  [<c0109a90>] ? do_invalid_op+0x0/0x90
  [<c01a36b6>] ? swap_count_continued+0x176/0x180
  [<c01a37b1>] ? swap_entry_free+0xf1/0x120
  [<c01a5b67>] ? free_swap_and_cache+0x27/0xd0
  [<c0196296>] ? unmap_vmas+0x3d6/0x820
  [<c01039c6>] ? __raw_callee_save_xen_make_pte+0x6/0x8
  [<c019a7d1>] ? exit_mmap+0x91/0x140
  [<c0130ceb>] ? mmput+0x2b/0xc0
  [<c0134697>] ? exit_mm+0xe7/0x120
  [<c063f390>] ? _raw_spin_lock_irq+0x10/0x20
  [<c013614a>] ? do_exit+0x10a/0x6e0
  [<c013675c>] ? do_group_exit+0x3c/0xa0
  [<c01367d1>] ? sys_exit_group+0x11/0x20
  [<c063f761>] ? syscall_call+0x7/0xb
  [<c0630000>] ? sctp_sockaddr_af+0x20/0x90

-Chris


Konrad Rzeszutek Wilk

2011-Apr-21 15:14 UTC

Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524

On Thu, Apr 21, 2011 at 10:57:58AM -0400, Christopher S. Aker wrote:
> We're still getting reports of this occurring with 2.6.38 domU.

Darn. I was hoping it would have just disappeared on its own :-)

> Here's a fresh one:
> 
> ------------[ cut here ]------------
> kernel BUG at mm/swapfile.c:2524!
> invalid opcode: 0000 [#1] SMP
> last sysfs file: /sys/kernel/uevent_seqnum
> Modules linked in:
> 
> Pid: 12589, comm: apache2 Not tainted 2.6.38-linode31 #1
> EIP: 0061:[<c01a36b6>] EFLAGS: 00210246 CPU: 0
> EIP is at swap_count_continued+0x176/0x180
> EAX: f57ba6c3 EBX: ed3af500 ECX: f57ba000 EDX: 00000000
> ESI: ed3c5280 EDI: 00000080 EBP: 000006c3 ESP: d4bcde30
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process apache2 (pid: 12589, ti=d4bcc000 task=ec9c17d0 task.ti=d4bcc000)
> Stack:
>  ec03c4c0 000166c3 00000000 00000000 c01a37b1 b7d63000 ec03c4c0 000166c3
>  00000000 c01a5b67 b7d63000 b7e00000 eae67b18 c0196296 6bb30067 80000006
>  c01039c6 0000000c 00000000 00000000 002cd860 0006c92f d0374df0 d17b3040
> Call Trace:
>  [<c01a37b1>] ? swap_entry_free+0xf1/0x120
>  [<c01a5b67>] ? free_swap_and_cache+0x27/0xd0
>  [<c0196296>] ? unmap_vmas+0x3d6/0x820
>  [<c01039c6>] ? __raw_callee_save_xen_make_pte+0x6/0x8
>  [<c019a7d1>] ? exit_mmap+0x91/0x140
>  [<c0130ceb>] ? mmput+0x2b/0xc0
>  [<c0134697>] ? exit_mm+0xe7/0x120
>  [<c063f390>] ? _raw_spin_lock_irq+0x10/0x20
>  [<c013614a>] ? do_exit+0x10a/0x6e0
>  [<c013675c>] ? do_group_exit+0x3c/0xa0
>  [<c01367d1>] ? sys_exit_group+0x11/0x20
>  [<c063f761>] ? syscall_call+0x7/0xb
>  [<c0630000>] ? sctp_sockaddr_af+0x20/0x90
> Code: ff 89 d8 e8 cd f7 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
> EIP: [<c01a36b6>] swap_count_continued+0x176/0x180 SS:ESP 0069:d4bcde30
> ---[ end trace f3fdefcfb4d8b5dc ]---
> Fixing recursive fault but reboot is needed!
> BUG: scheduling while atomic: apache2/12589/0x00000001
> Modules linked in:
> Pid: 12589, comm: apache2 Tainted: G      D     2.6.38-linode31 #1

Aaah, so another HTTP server.

How can I reproduce this?

> Call Trace:
> 
>  [<c063d979>] ? schedule+0x6a9/0x8b0
>  [<c010630b>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<c063f3b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>  [<c0133323>] ? console_unlock+0x1c3/0x200
>  [<c013374f>] ? vprintk+0x17f/0x3c0
>  [<c01365ff>] ? do_exit+0x5bf/0x6e0
>  [<c063f367>] ? _raw_spin_lock_irqsave+0x27/0x40
>  [<c063f3b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>  [<c0132e96>] ? kmsg_dump+0x36/0xf0
>  [<c0109a90>] ? do_invalid_op+0x0/0x90
>  [<c0109a90>] ? do_invalid_op+0x0/0x90
>  [<c010bea1>] ? oops_end+0x71/0xa0
>  [<c0109b0f>] ? do_invalid_op+0x7f/0x90
>  [<c01a36b6>] ? swap_count_continued+0x176/0x180
>  [<c016d805>] ? handle_IRQ_event+0x55/0xc0
>  [<c0105b37>] ? xen_force_evtchn_callback+0x17/0x30
>  [<c0138090>] ? __do_softirq+0x0/0x130
>  [<c0106314>] ? check_events+0x8/0xc
>  [<c010630b>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<c010af92>] ? do_softirq+0x42/0xb0
>  [<c063fcd6>] ? error_code+0x5a/0x60
>  [<c012007b>] ? __change_page_attr_set_clr+0xa2b/0xb40
>  [<c0109a90>] ? do_invalid_op+0x0/0x90
>  [<c01a36b6>] ? swap_count_continued+0x176/0x180
>  [<c01a37b1>] ? swap_entry_free+0xf1/0x120
>  [<c01a5b67>] ? free_swap_and_cache+0x27/0xd0

So back to unmap_vmas. I wonder if we are missing the lazy_unmap flag
in 2.6.38.

>  [<c0196296>] ? unmap_vmas+0x3d6/0x820
>  [<c01039c6>] ? __raw_callee_save_xen_make_pte+0x6/0x8
>  [<c019a7d1>] ? exit_mmap+0x91/0x140
>  [<c0130ceb>] ? mmput+0x2b/0xc0
>  [<c0134697>] ? exit_mm+0xe7/0x120
>  [<c063f390>] ? _raw_spin_lock_irq+0x10/0x20
>  [<c013614a>] ? do_exit+0x10a/0x6e0
>  [<c013675c>] ? do_group_exit+0x3c/0xa0
>  [<c01367d1>] ? sys_exit_group+0x11/0x20
>  [<c063f761>] ? syscall_call+0x7/0xb
>  [<c0630000>] ? sctp_sockaddr_af+0x20/0x90
> 
> -Chris

Dan Magenheimer

2011-Apr-21 15:23 UTC

RE: [Xen-devel] kernel BUG at mm/swapfile.c:2524

> -----Original Message-----
> From: Christopher S. Aker [mailto:caker@theshore.net]
> Sent: Thursday, April 21, 2011 8:58 AM
> To: xen-devel@lists.xensource.com
> Cc: Konrad Rzeszutek Wilk; Peter Sandin
> Subject: Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
> 
> We're still getting reports of this occurring with 2.6.38 domU.
> Here's a fresh one:

IMHO, you might get more traction on this problem by asking
questions on linux-mm or lkml. With limited study, it's not obvious
to me that this problem is Xen-related; more likely it is
a Linux swap subsystem race that gets provoked because
Xen schedules differently than bare metal.


Konrad Rzeszutek Wilk

2011-Apr-21 15:24 UTC

Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524

> So back to unmap_vmas. I wonder if we are missing the lazy_unmap flag
> in 2.6.38.
> 
> >  [<c0196296>] ? unmap_vmas+0x3d6/0x820

Try the attached patch:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index f608942..19444e6 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2049,6 +2049,8 @@ void __init xen_init_mmu_ops(void)
 	x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
 	pv_mmu_ops = xen_mmu_ops;
 
+	vmap_lazy_unmap = false;
+
 	memset(dummy_mapping, 0xff, PAGE_SIZE);
 }
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 4ed6fcd..65c3d39 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -7,6 +7,8 @@
 
 struct vm_area_struct;		/* vma defining user mapping in mm_types.h */
 
+extern bool vmap_lazy_unmap;
+
 /* bits in flags of vmalloc's vm_struct below */
 #define VM_IOREMAP	0x00000001	/* ioremap() and friends */
 #define VM_ALLOC	0x00000002	/* vmalloc() */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f9b1667..6ab12de 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
+bool vmap_lazy_unmap __read_mostly = true;
+
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -501,6 +503,9 @@ static unsigned long lazy_max_pages(void)
 {
 	unsigned int log;
 
+	if (!vmap_lazy_unmap)
+		return 0;
+
 	log = fls(num_online_cpus());
 
 	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
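
[Editor's sketch] The last hunk of the patch makes lazy_max_pages() return 0 whenever vmap_lazy_unmap is false, which appears intended to stop vmap areas from being unmapped lazily under Xen. The userspace sketch below mirrors only the arithmetic shown in that hunk so the effect of the flag is easy to see. The sketch_ names, the 4 KiB page size, and the replacement for the kernel's fls() are assumptions made for illustration; none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Stand-in for the kernel's fls(): 1-based index of the highest set bit,
 * or 0 when no bit is set. */
static int sketch_fls(unsigned int x)
{
    int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Mirrors the patched lazy_max_pages(): the lazy-unmap threshold in pages,
 * or 0 when lazy unmapping is switched off. */
unsigned long sketch_lazy_max_pages(bool lazy_unmap, unsigned int online_cpus)
{
    assert(online_cpus > 0);

    if (!lazy_unmap)
        return 0;

    /* 32 MiB worth of pages, scaled by fls(number of online CPUs). */
    return sketch_fls(online_cpus) * (32UL * 1024 * 1024 / SKETCH_PAGE_SIZE);
}
```

Under these assumptions a single-CPU guest gets a threshold of 8192 pages with the flag left at its default of true, and 0 pages once xen_init_mmu_ops() sets it to false.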



Christopher S. Aker

2011-Apr-27 17:49 UTC

Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524

On 4/21/11 11:24 AM, Konrad Rzeszutek Wilk wrote:
>> So back to unmap_vmas. I wonder if we are missing the lazy_unmap flag
>> in 2.6.38.
>
> Try the attached patch

Patched and deployed last week, got this report last night:

------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2529!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/removable
Modules linked in:

Pid: 17319, comm: apache2 Not tainted 2.6.38.3-linode32 #1
EIP: 0061:[<c01a3796>] EFLAGS: 00010246 CPU: 3
EIP is at swap_count_continued+0x176/0x180
EAX: f577ed93 EBX: ecf60600 ECX: f577e000 EDX: 00000000
ESI: ed3d82e0 EDI: 00000080 EBP: 00000d93 ESP: c3bc5e80
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
Process apache2 (pid: 17319, ti=c3bc4000 task=eb5af7d0 task.ti=c3bc4000)
Stack:
  eb732cc0 0000ed93 00000040 00000000 c01a3891 0000ed93 ea3d9e28 ec521b00
  00000000 c01a3af8 00000000 c019569c 00000000 80000004 00000000 c0106314
  000000b1 ec60114c 0000ed93 d46d5d58 00000000 ffffffe8 b8b32018 00000000
Call Trace:
  [<c01a3891>] ? swap_entry_free+0xf1/0x120
  [<c01a3af8>] ? swap_free+0x18/0x30
  [<c019569c>] ? handle_pte_fault+0x49c/0xac0
  [<c0106314>] ? check_events+0x8/0xc
  [<c0196f61>] ? handle_mm_fault+0x101/0x1a0
  [<c011e81b>] ? do_page_fault+0xfb/0x3e0
  [<c01b13ee>] ? sys_stat64+0x1e/0x30
  [<c01581a4>] ? tick_resume_broadcast+0x4/0x90
  [<c011e720>] ? do_page_fault+0x0/0x3e0
  [<c063fea6>] ? error_code+0x5a/0x60
  [<c0630000>] ? sctp_get_port_local+0x270/0x340
  [<c011e720>] ? do_page_fault+0x0/0x3e0
Code: ff 89 d8 e8 cd f6 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
EIP: [<c01a3796>] swap_count_continued+0x176/0x180 SS:ESP 0069:c3bc5e80
---[ end trace 6a2c16b916f3591f ]---
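
[Editor's sketch] The source line in this report is 2529 rather than 2524, but the EIP is still swap_count_continued+0x176/0x180, the same offset and function size as in the earlier traces. That suggests, without proving it, that the same check is firing and only the line numbering moved between kernel builds. The hypothetical helper below splits the symbol+offset/size notation and recovers the function's start address so traces from different builds can be compared; the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* One "symbol+0xOFF/0xSIZE" location as printed in an oops. */
struct oops_loc {
    char symbol[64];
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long size;     /* total size of the function in bytes */
};

/* Parse text such as "swap_count_continued+0x176/0x180".
 * Returns 1 on success, 0 if the text does not match the notation. */
int parse_oops_loc(const char *text, struct oops_loc *out)
{
    assert(text != NULL && out != NULL);

    /* %lx accepts the optional "0x" prefix, like strtoul with base 16. */
    return sscanf(text, "%63[^+]+%lx/%lx",
                  out->symbol, &out->offset, &out->size) == 3;
}

/* The function's start address is the faulting address minus the offset. */
unsigned long function_start(unsigned long eip, const struct oops_loc *loc)
{
    return eip - loc->offset;
}
```

Applied to the two builds in this thread, the start address differs while offset and size are identical, which is the pattern one would expect if unrelated code earlier in the kernel image changed.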

:(

-Chris


Konrad Rzeszutek Wilk

2011-Apr-27 18:16 UTC

Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524

On Wed, Apr 27, 2011 at 01:49:29PM -0400, Christopher S. Aker wrote:
> On 4/21/11 11:24 AM, Konrad Rzeszutek Wilk wrote:
> >>So back to unmap_vmas. I wonder if we are missing the lazy_unmap flag
> >>in 2.6.38.
> >Try the attached patch
>
> Patched and deployed last week, got this report last night:

Ok, I need to be able to reproduce this. How do I do that?


Christopher S. Aker

2011-May-31 14:43 UTC

Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524

On 4/27/11 2:16 PM, Konrad Rzeszutek Wilk wrote:
> Ok, I need to be able to reproduce this. How do I do that?

Yeah - if only we knew. We haven't been able to identify any patterns in
the reports we've been getting.

The problem still exists in 2.6.39:

------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2527!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/removable
Modules linked in:

Pid: 3701, comm: apache2 Not tainted 2.6.39-linode33 #5
EIP: 0061:[<c01a8fb6>] EFLAGS: 00010246 CPU: 0
EIP is at swap_count_continued+0x176/0x180
EAX: f57ba2e3 EBX: ebacd820 ECX: f57ba000 EDX: 00000000
ESI: ebfd81a0 EDI: 00000080 EBP: 000002e3 ESP: cb4d1e1c
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process apache2 (pid: 3701, ti=cb4d0000 task=c2407030 task.ti=cb4d0000)
Stack:
  e9c27ec0 000042e3 00000040 00000000 c01a90b1 ce9762d8 e9c27ec0 000042e3
  00000000 c01ab367 ce9762d8 b825b000 cb4d1f04 c019b6b3 72d4e045 80000003
  0229c063 c0103fc5 93563000 00000006 00085c60 ce0cfe40 eb3a9ba0 72d4e045
Call Trace:
  [<c01a90b1>] ? swap_entry_free+0xf1/0x120
  [<c01ab367>] ? free_swap_and_cache+0x27/0xd0
  [<c019b6b3>] ? zap_pte_range+0x1b3/0x480
  [<c0103fc5>] ? pte_pfn_to_mfn+0xb5/0xd0
  [<c019ba91>] ? unmap_page_range+0x111/0x190
  [<c019bc3b>] ? unmap_vmas+0x12b/0x1e0
  [<c019ff01>] ? exit_mmap+0x91/0x140
  [<c013247b>] ? mmput+0x2b/0xc0
  [<c0135d7f>] ? exit_mm+0xef/0x120
  [<c068f4b0>] ? _raw_spin_lock_irq+0x10/0x20
  [<c0137975>] ? do_exit+0x125/0x350
  [<c019fe57>] ? remove_vma+0x37/0x50
  [<c0137bdc>] ? do_group_exit+0x3c/0xa0
  [<c0137c51>] ? sys_exit_group+0x11/0x20
  [<c068f7b1>] ? syscall_call+0x7/0xb
  [<c0680000>] ? sctp_icmp_proto_unreachable+0x20/0xc0
Code: ff 89 d8 e8 ed a7 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb 
b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b 
eb fe 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
EIP: [<c01a8fb6>] swap_count_continued+0x176/0x180 SS:ESP 0069:cb4d1e1c
---[ end trace 39b6b8ea9add1a97 ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: apache2/3701/0x00000001
Modules linked in:
Pid: 3701, comm: apache2 Tainted: G      D     2.6.39-linode33 #5
Call Trace:
  [<c068dade>] ? schedule+0x50e/0x6d0
  [<c013484f>] ? vprintk+0x1af/0x3e0
  [<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
  [<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
  [<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
  [<c0137b37>] ? do_exit+0x2e7/0x350
  [<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
  [<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
  [<c010b7d1>] ? oops_end+0x71/0xa0
  [<c01093ef>] ? do_invalid_op+0x7f/0x90
  [<c01a8fb6>] ? swap_count_continued+0x176/0x180
  [<c0173821>] ? handle_percpu_irq+0x31/0x50
  [<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
  [<c0139450>] ? __local_bh_enable+0x70/0x70
  [<c01062f4>] ? check_events+0x8/0xc
  [<c01062eb>] ? xen_restore_fl_direct_reloc+0x4/0x4
  [<c010a882>] ? do_softirq+0x42/0xb0
  [<c0139371>] ? irq_exit+0x31/0x90
  [<c044c94d>] ? xen_evtchn_do_upcall+0x1d/0x30
  [<c0103fc5>] ? pte_pfn_to_mfn+0xb5/0xd0
  [<c068fd26>] ? error_code+0x5a/0x60
  [<c012007b>] ? try_preserve_large_page+0x7b/0x340
  [<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
  [<c01a8fb6>] ? swap_count_continued+0x176/0x180
  [<c01a90b1>] ? swap_entry_free+0xf1/0x120
  [<c01ab367>] ? free_swap_and_cache+0x27/0xd0
  [<c019b6b3>] ? zap_pte_range+0x1b3/0x480
  [<c0103fc5>] ? pte_pfn_to_mfn+0xb5/0xd0
  [<c019ba91>] ? unmap_page_range+0x111/0x190
  [<c019bc3b>] ? unmap_vmas+0x12b/0x1e0
  [<c019ff01>] ? exit_mmap+0x91/0x140
  [<c013247b>] ? mmput+0x2b/0xc0
  [<c0135d7f>] ? exit_mm+0xef/0x120
  [<c068f4b0>] ? _raw_spin_lock_irq+0x10/0x20
  [<c0137975>] ? do_exit+0x125/0x350
  [<c019fe57>] ? remove_vma+0x37/0x50
  [<c0137bdc>] ? do_group_exit+0x3c/0xa0
  [<c0137c51>] ? sys_exit_group+0x11/0x20
  [<c068f7b1>] ? syscall_call+0x7/0xb
  [<c0680000>] ? sctp_icmp_proto_unreachable+0x20/0xc0

-Chris
