Hello,

We've got some 2.6.38 domUs that are hitting a bug in mm/swapfile.c. The issue has only cropped up since we have moved to 2.6.38. This issue has happened on multiple separate physical machines. I've attached the trace from one instance here, additional instances can be found along with a copy of the domU kernel image and configuration at:

http://thesandins.net/xen/2.6.38/

------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2524!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51728/block/xvdb/stat
Modules linked in:

Pid: 539, comm: apache2 Not tainted 2.6.38-linode31 #1
EIP: 0061:[<c01a36b6>] EFLAGS: 00010246 CPU: 0
EIP is at swap_count_continued+0x176/0x180
EAX: f57ba5f8 EBX: ed3c8f00 ECX: f57ba000 EDX: 00000000
ESI: ed3c5320 EDI: 00000080 EBP: 000005f8 ESP: eb80be80
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
Process apache2 (pid: 539, ti=eb80a000 task=cc472be0 task.ti=eb80a000)
Stack:
ec8e68c0 000195f8 00000040 00000000 c01a37b1 000195f8 ec96ae20 ed14b000
00000000 c01a3a18 00000000 c01955cc 00000000 80000007 00000000 c0106314
00000023 ecf7ac4c 000195f8 ca6e0290 00000000 ffffffe8 b883d664 00000000
Call Trace:
[<c01a37b1>] ? swap_entry_free+0xf1/0x120
[<c01a3a18>] ? swap_free+0x18/0x30
[<c01955cc>] ? handle_pte_fault+0x49c/0xac0
[<c0106314>] ? check_events+0x8/0xc
[<c0196e91>] ? handle_mm_fault+0x101/0x1a0
[<c011e81b>] ? do_page_fault+0xfb/0x3e0
[<c063f390>] ? _raw_spin_lock_irq+0x10/0x20
[<c063fcd6>] ? error_code+0x5a/0x60
[<c013f397>] ? sys_rt_sigaction+0x77/0xa0
[<c011e720>] ? do_page_fault+0x0/0x3e0
[<c063fcd6>] ? error_code+0x5a/0x60
[<c0630000>] ? sctp_sockaddr_af+0x20/0x90
[<c011e720>] ? do_page_fault+0x0/0x3e0
Code: ff 89 d8 e8 cd f7 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
EIP: [<c01a36b6>] swap_count_continued+0x176/0x180 SS:ESP 0069:eb80be80
---[ end trace 41e4a2572fe1ada6 ]---

I've looked at the section of code that is generating the fault, but I'm a bit over my head. Does this look like it is a Xen-specific issue, or something that would be better addressed on the LKML? Any insight you can provide on the source or a fix for this issue would be appreciated.

Thanks,
Peter

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Apr-07 13:50 UTC
Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
On Wed, Apr 06, 2011 at 05:59:03PM -0400, Peter Sandin wrote:
> We've got some 2.6.38 domUs that are hitting a bug in mm/swapfile.c. The issue has only cropped up since we have moved to 2.6.38. This issue has happened on multiple separate physical machines. I've attached the trace from one instance here, additional instances can be found along with a copy of the domU kernel image and configuration at:
>
> http://thesandins.net/xen/2.6.38/

What exactly happened to cause this?
The traces included here are from customer instances, so we don't know exactly what they were doing at the time they hit this. Looking at their IO usage and the other information included with their reports, it sounds like they were swapping heavily in most cases. Unfortunately we haven't been able to reproduce this in a controlled environment. If you have any suggested tests that I could run to help reproduce, or narrow down the source of this bug, I can certainly give those a try.

Another data point that may be helpful is that we have only seen this issue with 32-bit kernels.

--Peter

On Apr 7, 2011, at 9:50 AM, Konrad Rzeszutek Wilk wrote:
> What exactly happened to cause this?
Christopher S. Aker
2011-Apr-21 14:57 UTC
Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
We're still getting reports of this occurring with 2.6.38 domU. Here's a fresh one:

------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2524!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in:

Pid: 12589, comm: apache2 Not tainted 2.6.38-linode31 #1
EIP: 0061:[<c01a36b6>] EFLAGS: 00210246 CPU: 0
EIP is at swap_count_continued+0x176/0x180
EAX: f57ba6c3 EBX: ed3af500 ECX: f57ba000 EDX: 00000000
ESI: ed3c5280 EDI: 00000080 EBP: 000006c3 ESP: d4bcde30
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process apache2 (pid: 12589, ti=d4bcc000 task=ec9c17d0 task.ti=d4bcc000)
Stack:
ec03c4c0 000166c3 00000000 00000000 c01a37b1 b7d63000 ec03c4c0 000166c3
00000000 c01a5b67 b7d63000 b7e00000 eae67b18 c0196296 6bb30067 80000006
c01039c6 0000000c 00000000 00000000 002cd860 0006c92f d0374df0 d17b3040
Call Trace:
[<c01a37b1>] ? swap_entry_free+0xf1/0x120
[<c01a5b67>] ? free_swap_and_cache+0x27/0xd0
[<c0196296>] ? unmap_vmas+0x3d6/0x820
[<c01039c6>] ? __raw_callee_save_xen_make_pte+0x6/0x8
[<c019a7d1>] ? exit_mmap+0x91/0x140
[<c0130ceb>] ? mmput+0x2b/0xc0
[<c0134697>] ? exit_mm+0xe7/0x120
[<c063f390>] ? _raw_spin_lock_irq+0x10/0x20
[<c013614a>] ? do_exit+0x10a/0x6e0
[<c013675c>] ? do_group_exit+0x3c/0xa0
[<c01367d1>] ? sys_exit_group+0x11/0x20
[<c063f761>] ? syscall_call+0x7/0xb
[<c0630000>] ? sctp_sockaddr_af+0x20/0x90
Code: ff 89 d8 e8 cd f7 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
EIP: [<c01a36b6>] swap_count_continued+0x176/0x180 SS:ESP 0069:d4bcde30
---[ end trace f3fdefcfb4d8b5dc ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: apache2/12589/0x00000001
Modules linked in:
Pid: 12589, comm: apache2 Tainted: G D 2.6.38-linode31 #1
Call Trace:
[<c063d979>] ? schedule+0x6a9/0x8b0
[<c010630b>] ? xen_restore_fl_direct_end+0x0/0x1
[<c063f3b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[<c0133323>] ? console_unlock+0x1c3/0x200
[<c013374f>] ? vprintk+0x17f/0x3c0
[<c01365ff>] ? do_exit+0x5bf/0x6e0
[<c063f367>] ? _raw_spin_lock_irqsave+0x27/0x40
[<c063f3b1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[<c0132e96>] ? kmsg_dump+0x36/0xf0
[<c0109a90>] ? do_invalid_op+0x0/0x90
[<c0109a90>] ? do_invalid_op+0x0/0x90
[<c010bea1>] ? oops_end+0x71/0xa0
[<c0109b0f>] ? do_invalid_op+0x7f/0x90
[<c01a36b6>] ? swap_count_continued+0x176/0x180
[<c016d805>] ? handle_IRQ_event+0x55/0xc0
[<c0105b37>] ? xen_force_evtchn_callback+0x17/0x30
[<c0138090>] ? __do_softirq+0x0/0x130
[<c0106314>] ? check_events+0x8/0xc
[<c010630b>] ? xen_restore_fl_direct_end+0x0/0x1
[<c010af92>] ? do_softirq+0x42/0xb0
[<c063fcd6>] ? error_code+0x5a/0x60
[<c012007b>] ? __change_page_attr_set_clr+0xa2b/0xb40
[<c0109a90>] ? do_invalid_op+0x0/0x90
[<c01a36b6>] ? swap_count_continued+0x176/0x180
[<c01a37b1>] ? swap_entry_free+0xf1/0x120
[<c01a5b67>] ? free_swap_and_cache+0x27/0xd0
[<c0196296>] ? unmap_vmas+0x3d6/0x820
[<c01039c6>] ? __raw_callee_save_xen_make_pte+0x6/0x8
[<c019a7d1>] ? exit_mmap+0x91/0x140
[<c0130ceb>] ? mmput+0x2b/0xc0
[<c0134697>] ? exit_mm+0xe7/0x120
[<c063f390>] ? _raw_spin_lock_irq+0x10/0x20
[<c013614a>] ? do_exit+0x10a/0x6e0
[<c013675c>] ? do_group_exit+0x3c/0xa0
[<c01367d1>] ? sys_exit_group+0x11/0x20
[<c063f761>] ? syscall_call+0x7/0xb
[<c0630000>] ? sctp_sockaddr_af+0x20/0x90

-Chris
Konrad Rzeszutek Wilk
2011-Apr-21 15:14 UTC
Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
On Thu, Apr 21, 2011 at 10:57:58AM -0400, Christopher S. Aker wrote:
> We're still getting reports of this occurring with 2.6.38 domU.

Darn. I was hoping it would have just disappeared on its own :-)

> Pid: 12589, comm: apache2 Tainted: G D 2.6.38-linode31 #1

Aaah, so another HTTP server. How can I reproduce this?

> [<c01a36b6>] ? swap_count_continued+0x176/0x180
> [<c01a37b1>] ? swap_entry_free+0xf1/0x120
> [<c01a5b67>] ? free_swap_and_cache+0x27/0xd0

So back to unmap_vmas. I wonder if we are missing the lazy_unmap flag in 2.6.38.

> [<c0196296>] ? unmap_vmas+0x3d6/0x820
> -----Original Message-----
> From: Christopher S. Aker [mailto:caker@theshore.net]
> Sent: Thursday, April 21, 2011 8:58 AM
> To: xen-devel@lists.xensource.com
> Cc: Konrad Rzeszutek Wilk; Peter Sandin
> Subject: Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
>
> We're still getting reports of this occurring with 2.6.38 domU. Here's
> a fresh one:

IMHO, you might get more traction on this problem by asking questions in linux-mm or lkml. With limited study, it's not obvious to me that this problem could be Xen-related; more likely a Linux swap subsystem race that gets provoked because Xen is scheduling differently than bare metal.
Konrad Rzeszutek Wilk
2011-Apr-21 15:24 UTC
Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
> So back to unmap_vmas. I wonder if we are missing the lazy_unmap flag
> in 2.6.38.
>
> > [<c0196296>] ? unmap_vmas+0x3d6/0x820

Try the attached patch

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index f608942..19444e6 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2049,6 +2049,8 @@ void __init xen_init_mmu_ops(void)
 	x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
 	pv_mmu_ops = xen_mmu_ops;

+	vmap_lazy_unmap = false;
+
 	memset(dummy_mapping, 0xff, PAGE_SIZE);
 }

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 4ed6fcd..65c3d39 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -7,6 +7,8 @@

 struct vm_area_struct;		/* vma defining user mapping in mm_types.h */

+extern bool vmap_lazy_unmap;
+
 /* bits in flags of vmalloc's vm_struct below */
 #define VM_IOREMAP	0x00000001	/* ioremap() and friends */
 #define VM_ALLOC	0x00000002	/* vmalloc() */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f9b1667..6ab12de 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -31,6 +31,8 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>

+bool vmap_lazy_unmap __read_mostly = true;
+
 /*** Page table manipulation functions ***/

 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -501,6 +503,9 @@ static unsigned long lazy_max_pages(void)
 {
 	unsigned int log;

+	if (!vmap_lazy_unmap)
+		return 0;
+
 	log = fls(num_online_cpus());

 	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
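For anyone reading along, the effect of the mm/vmalloc.c hunk can be sketched in user space roughly as follows. This is only a model under stated assumptions, not kernel code: fls_model(), lazy_max_pages_model(), the online_cpus parameter and the fixed 4 KiB PAGE_SIZE_MODEL are hypothetical stand-ins for fls(), lazy_max_pages(), num_online_cpus() and PAGE_SIZE.

```c
#include <stdbool.h>

/* Assumed page size for this model; the kernel uses PAGE_SIZE. */
#define PAGE_SIZE_MODEL 4096UL

/* Mirrors the flag added by the patch; Xen would clear it in xen_init_mmu_ops(). */
static bool vmap_lazy_unmap = true;

/* Stand-in for the kernel's fls(): 1-based index of the highest set bit, 0 for x == 0. */
static unsigned int fls_model(unsigned int x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * Model of the patched lazy_max_pages(): the number of lazily freed
 * pages tolerated before a purge. With the flag cleared the threshold
 * is 0; otherwise it is 32 MiB worth of pages scaled by fls(cpus).
 */
static unsigned long lazy_max_pages_model(unsigned int online_cpus)
{
	if (!vmap_lazy_unmap)
		return 0;

	return fls_model(online_cpus) * (32UL * 1024 * 1024 / PAGE_SIZE_MODEL);
}
```

With 4 online CPUs and 4 KiB pages the unpatched threshold in this model works out to 3 * 8192 = 24576 pages. With vmap_lazy_unmap cleared it is 0, so presumably any lazily freed vmap area would be purged right away instead of being left mapped.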
Christopher S. Aker
2011-Apr-27 17:49 UTC
Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
On 4/21/11 11:24 AM, Konrad Rzeszutek Wilk wrote:
>> So back to unmap_vmas. I wonder if we are missing the lazy_unmap flag
>> in 2.6.38.
> Try the attached patch

Patched and deployed last week, got this report last night:

------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2529!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/removable
Modules linked in:

Pid: 17319, comm: apache2 Not tainted 2.6.38.3-linode32 #1
EIP: 0061:[<c01a3796>] EFLAGS: 00010246 CPU: 3
EIP is at swap_count_continued+0x176/0x180
EAX: f577ed93 EBX: ecf60600 ECX: f577e000 EDX: 00000000
ESI: ed3d82e0 EDI: 00000080 EBP: 00000d93 ESP: c3bc5e80
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
Process apache2 (pid: 17319, ti=c3bc4000 task=eb5af7d0 task.ti=c3bc4000)
Stack:
eb732cc0 0000ed93 00000040 00000000 c01a3891 0000ed93 ea3d9e28 ec521b00
00000000 c01a3af8 00000000 c019569c 00000000 80000004 00000000 c0106314
000000b1 ec60114c 0000ed93 d46d5d58 00000000 ffffffe8 b8b32018 00000000
Call Trace:
[<c01a3891>] ? swap_entry_free+0xf1/0x120
[<c01a3af8>] ? swap_free+0x18/0x30
[<c019569c>] ? handle_pte_fault+0x49c/0xac0
[<c0106314>] ? check_events+0x8/0xc
[<c0196f61>] ? handle_mm_fault+0x101/0x1a0
[<c011e81b>] ? do_page_fault+0xfb/0x3e0
[<c01b13ee>] ? sys_stat64+0x1e/0x30
[<c01581a4>] ? tick_resume_broadcast+0x4/0x90
[<c011e720>] ? do_page_fault+0x0/0x3e0
[<c063fea6>] ? error_code+0x5a/0x60
[<c0630000>] ? sctp_get_port_local+0x270/0x340
[<c011e720>] ? do_page_fault+0x0/0x3e0
Code: ff 89 d8 e8 cd f6 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
EIP: [<c01a3796>] swap_count_continued+0x176/0x180 SS:ESP 0069:c3bc5e80
---[ end trace 6a2c16b916f3591f ]---

:(

-Chris
Konrad Rzeszutek Wilk
2011-Apr-27 18:16 UTC
Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
On Wed, Apr 27, 2011 at 01:49:29PM -0400, Christopher S. Aker wrote:
> On 4/21/11 11:24 AM, Konrad Rzeszutek Wilk wrote:
> >>So back to unmap_vmas. I wonder if we are missing the lazy_unmap flag
> >>in 2.6.38.
> >Try the attached patch
>
> Patched and deployed last week, got this report last night:

Ok, I need to be able to reproduce this. How do I do that?
Christopher S. Aker
2011-May-31 14:43 UTC
Re: [Xen-devel] kernel BUG at mm/swapfile.c:2524
On 4/27/11 2:16 PM, Konrad Rzeszutek Wilk wrote:
> Ok, I need to be able to reproduce this. How do I do that?

Yeah - if only we knew. We haven't been able to identify any patterns in the reports we've been getting.

The problem still exists in 2.6.39:

------------[ cut here ]------------
kernel BUG at mm/swapfile.c:2527!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/removable
Modules linked in:

Pid: 3701, comm: apache2 Not tainted 2.6.39-linode33 #5
EIP: 0061:[<c01a8fb6>] EFLAGS: 00010246 CPU: 0
EIP is at swap_count_continued+0x176/0x180
EAX: f57ba2e3 EBX: ebacd820 ECX: f57ba000 EDX: 00000000
ESI: ebfd81a0 EDI: 00000080 EBP: 000002e3 ESP: cb4d1e1c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process apache2 (pid: 3701, ti=cb4d0000 task=c2407030 task.ti=cb4d0000)
Stack:
e9c27ec0 000042e3 00000040 00000000 c01a90b1 ce9762d8 e9c27ec0 000042e3
00000000 c01ab367 ce9762d8 b825b000 cb4d1f04 c019b6b3 72d4e045 80000003
0229c063 c0103fc5 93563000 00000006 00085c60 ce0cfe40 eb3a9ba0 72d4e045
Call Trace:
[<c01a90b1>] ? swap_entry_free+0xf1/0x120
[<c01ab367>] ? free_swap_and_cache+0x27/0xd0
[<c019b6b3>] ? zap_pte_range+0x1b3/0x480
[<c0103fc5>] ? pte_pfn_to_mfn+0xb5/0xd0
[<c019ba91>] ? unmap_page_range+0x111/0x190
[<c019bc3b>] ? unmap_vmas+0x12b/0x1e0
[<c019ff01>] ? exit_mmap+0x91/0x140
[<c013247b>] ? mmput+0x2b/0xc0
[<c0135d7f>] ? exit_mm+0xef/0x120
[<c068f4b0>] ? _raw_spin_lock_irq+0x10/0x20
[<c0137975>] ? do_exit+0x125/0x350
[<c019fe57>] ? remove_vma+0x37/0x50
[<c0137bdc>] ? do_group_exit+0x3c/0xa0
[<c0137c51>] ? sys_exit_group+0x11/0x20
[<c068f7b1>] ? syscall_call+0x7/0xb
[<c0680000>] ? sctp_icmp_proto_unreachable+0x20/0xc0
Code: ff 89 d8 e8 ed a7 f7 ff 01 e8 8d 76 00 c6 00 00 ba 01 00 00 00 eb b2 89 f8 3c 80 0f 94 c0 e9 b9 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 66 90 83 ec 10 89 1c 24 89 c3 89 74 24
EIP: [<c01a8fb6>] swap_count_continued+0x176/0x180 SS:ESP 0069:cb4d1e1c
---[ end trace 39b6b8ea9add1a97 ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: apache2/3701/0x00000001
Modules linked in:
Pid: 3701, comm: apache2 Tainted: G D 2.6.39-linode33 #5
Call Trace:
[<c068dade>] ? schedule+0x50e/0x6d0
[<c013484f>] ? vprintk+0x1af/0x3e0
[<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
[<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
[<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
[<c0137b37>] ? do_exit+0x2e7/0x350
[<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
[<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
[<c010b7d1>] ? oops_end+0x71/0xa0
[<c01093ef>] ? do_invalid_op+0x7f/0x90
[<c01a8fb6>] ? swap_count_continued+0x176/0x180
[<c0173821>] ? handle_percpu_irq+0x31/0x50
[<c0105b17>] ? xen_force_evtchn_callback+0x17/0x30
[<c0139450>] ? __local_bh_enable+0x70/0x70
[<c01062f4>] ? check_events+0x8/0xc
[<c01062eb>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<c010a882>] ? do_softirq+0x42/0xb0
[<c0139371>] ? irq_exit+0x31/0x90
[<c044c94d>] ? xen_evtchn_do_upcall+0x1d/0x30
[<c0103fc5>] ? pte_pfn_to_mfn+0xb5/0xd0
[<c068fd26>] ? error_code+0x5a/0x60
[<c012007b>] ? try_preserve_large_page+0x7b/0x340
[<c0109370>] ? do_coprocessor_segment_overrun+0x80/0x80
[<c01a8fb6>] ? swap_count_continued+0x176/0x180
[<c01a90b1>] ? swap_entry_free+0xf1/0x120
[<c01ab367>] ? free_swap_and_cache+0x27/0xd0
[<c019b6b3>] ? zap_pte_range+0x1b3/0x480
[<c0103fc5>] ? pte_pfn_to_mfn+0xb5/0xd0
[<c019ba91>] ? unmap_page_range+0x111/0x190
[<c019bc3b>] ? unmap_vmas+0x12b/0x1e0
[<c019ff01>] ? exit_mmap+0x91/0x140
[<c013247b>] ? mmput+0x2b/0xc0
[<c0135d7f>] ? exit_mm+0xef/0x120
[<c068f4b0>] ? _raw_spin_lock_irq+0x10/0x20
[<c0137975>] ? do_exit+0x125/0x350
[<c019fe57>] ? remove_vma+0x37/0x50
[<c0137bdc>] ? do_group_exit+0x3c/0xa0
[<c0137c51>] ? sys_exit_group+0x11/0x20
[<c068f7b1>] ? syscall_call+0x7/0xb
[<c0680000>] ? sctp_icmp_proto_unreachable+0x20/0xc0

-Chris