Christopher S. Aker
2010-Dec-30 22:57 UTC
[Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x @ 75cc13f5aa29b4f3227d269ca165dfa8937c94fe)

We've been running our xen-thrash testsuite on a bunch of hosts against a very recent build, and we've just hit this on one box:

BUG: unable to handle kernel paging request at 15555d60
IP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0
*pdpt = 000000001d8ee027 *pde = 0000000000000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
Modules linked in: dm_snapshot iTCO_wdt usbhid
Pid: 44, comm: xenwatch Not tainted (2.6.32.27-1 #1) X8DTU
EIP: 0061:[<c1022781>] EFLAGS: 00010007 CPU: 0
EIP is at vmalloc_sync_all+0xd1/0x1f0
EAX: 15555d60 EBX: c1a50c00 ECX: 55555001 EDX: 06855067
ESI: c173ad60 EDI: dd8f85c4 EBP: 00000009 ESP: dfd7fe64
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process xenwatch (pid: 44, ti=dfd7e000 task=dfce90f0 task.ti=dfd7e000)
Stack:
 00000018 0001fc55 00000000 c0000d60 f5800000 1fc55067 00000000 1fc55067
<0> 00000000 c170025c dd2abe00 c898f180 dfd7ff20 c8a1171c c10829d0 c1081370
<0> 00000000 00000000 c12bbbd6 c1006bf4 0000000d dfd7fee4 dd2a0200 00000008
Call Trace:
 [<c10829d0>] ? alloc_vm_area+0x40/0x60
 [<c1081370>] ? f+0x0/0x10
 [<c12bbbd6>] ? blkif_map+0x36/0x1c0
 [<c1006bf4>] ? check_events+0x8/0xc
 [<c12b2b0f>] ? xenbus_gather+0x5f/0x90
 [<c12bb36c>] ? frontend_changed+0x25c/0x2d0
 [<c12b36c5>] ? xenbus_otherend_changed+0x95/0xa0
 [<c12b38bf>] ? frontend_changed+0xf/0x20
 [<c12b1f57>] ? xenwatch_thread+0x87/0x130
 [<c1048700>] ? autoremove_wake_function+0x0/0x40
 [<c12b1ed0>] ? xenwatch_thread+0x0/0x130
 [<c10484e4>] ? kthread+0x74/0x80
 [<c1048470>] ? kthread+0x0/0x80
 [<c1009e67>] ? kernel_thread_helper+0x7/0x10
Code: 04 8b 45 00 ff 15 14 2e 65 c1 8b 54 24 0c 25 00 f0 ff ff 8d 34 10 8b 16 8b 6e 04 f6 c2 01 74 7d 89 c8 25 00 f0 ff ff 03 44 24 0c <8b> 08 89 4c 24 04 8b 48 04 f6 44 24 04 01 75 67 89 e9 e8 18 32
EIP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0 SS:ESP 0069:dfd7fe64
CR2: 0000000015555d60
---[ end trace 7a29128cd8a0e564 ]---

And then a whole load of soft lockup traces. Full output, hypervisor, and dom0 kernel are in here:

http://theshore.net/~caker/xen/BUGS/2.6.32.27/

-Chris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Dec-31 01:29 UTC
[Xen-devel] Re: 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 12/31/2010 09:57 AM, Christopher S. Aker wrote:
> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x @
> 75cc13f5aa29b4f3227d269ca165dfa8937c94fe)
>
> We've been running our xen-thrash testsuite on a bunch of hosts
> against a very recent build, and we've just hit this on one box:

Ah, interesting. This looks like something that Ian Jackson found on one of his test machines.

What was going on at the time?

J

> BUG: unable to handle kernel paging request at 15555d60
> IP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0
> *pdpt = 000000001d8ee027 *pde = 0000000000000000
> Oops: 0000 [#1] SMP
> last sysfs file:
> /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> Modules linked in: dm_snapshot iTCO_wdt usbhid
> Pid: 44, comm: xenwatch Not tainted (2.6.32.27-1 #1) X8DTU
> EIP: 0061:[<c1022781>] EFLAGS: 00010007 CPU: 0
> EIP is at vmalloc_sync_all+0xd1/0x1f0
> EAX: 15555d60 EBX: c1a50c00 ECX: 55555001 EDX: 06855067
> ESI: c173ad60 EDI: dd8f85c4 EBP: 00000009 ESP: dfd7fe64
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process xenwatch (pid: 44, ti=dfd7e000 task=dfce90f0 task.ti=dfd7e000)
> Stack:
> 00000018 0001fc55 00000000 c0000d60 f5800000 1fc55067 00000000 1fc55067
> <0> 00000000 c170025c dd2abe00 c898f180 dfd7ff20 c8a1171c c10829d0 c1081370
> <0> 00000000 00000000 c12bbbd6 c1006bf4 0000000d dfd7fee4 dd2a0200 00000008
> Call Trace:
> [<c10829d0>] ? alloc_vm_area+0x40/0x60
> [<c1081370>] ? f+0x0/0x10
> [<c12bbbd6>] ? blkif_map+0x36/0x1c0
> [<c1006bf4>] ? check_events+0x8/0xc
> [<c12b2b0f>] ? xenbus_gather+0x5f/0x90
> [<c12bb36c>] ? frontend_changed+0x25c/0x2d0
> [<c12b36c5>] ? xenbus_otherend_changed+0x95/0xa0
> [<c12b38bf>] ? frontend_changed+0xf/0x20
> [<c12b1f57>] ? xenwatch_thread+0x87/0x130
> [<c1048700>] ? autoremove_wake_function+0x0/0x40
> [<c12b1ed0>] ? xenwatch_thread+0x0/0x130
> [<c10484e4>] ? kthread+0x74/0x80
> [<c1048470>] ? kthread+0x0/0x80
> [<c1009e67>] ? kernel_thread_helper+0x7/0x10
> Code: 04 8b 45 00 ff 15 14 2e 65 c1 8b 54 24 0c 25 00 f0 ff ff 8d 34
> 10 8b 16 8b 6e 04 f6 c2 01 74 7d 89 c8 25 00 f0 ff ff 03 44 24 0c <8b>
> 08 89 4c 24 04 8b 48 04 f6 44 24 04 01 75 67 89 e9 e8 18 32
> EIP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0 SS:ESP 0069:dfd7fe64
> CR2: 0000000015555d60
> ---[ end trace 7a29128cd8a0e564 ]---
>
> And then a whole load of soft lockup traces. Full output, hypervisor,
> and dom0 kernel are in here:
>
> http://theshore.net/~caker/xen/BUGS/2.6.32.27/
>
> -Chris
Christopher S. Aker
2010-Dec-31 17:19 UTC
[Xen-devel] Re: 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Dec 30, 2010, at 8:29 PM, Jeremy Fitzhardinge wrote:
> On 12/31/2010 09:57 AM, Christopher S. Aker wrote:
>> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
>> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x @
>> 75cc13f5aa29b4f3227d269ca165dfa8937c94fe)
>>
>> We've been running our xen-thrash testsuite on a bunch of hosts
>> against a very recent build, and we've just hit this on one box:
>
> Ah, interesting. This looks like something that Ian Jackson found on
> one of his test machines.
>
> What was going on at the time?

Our xen-thrash testsuite was running. It was configured to boot:

* 5 domUs swap thrashing (eatmem.c)
* 5 domUs that busy-loop CPU
* 5 domUs running crashme w/ 2.6.18 kernel
* 5 domUs running crashme w/ pv_ops kernel
* 5 domUs in a boot up -> sleep 60 -> shut down loop
* 5 domUs in a boot up -> sleep 60 -> xm destroy loop

We kicked off this identical run on 10 hosts. On another host I cranked these numbers up to total about 80 domUs. They're all still running fine.

The one that hit this BUG is still up, if any Hypervisor output would be helpful.

-Chris
Christopher S. Aker
2011-Jan-02 20:08 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Dec 31, 2010, at 12:19 PM, Christopher S. Aker wrote:
> We kicked off this identical run on 10 hosts. On another host I cranked these numbers up to total about 80 domUs. They're all still running fine.

Scratch that - another box just hit the identical trace:

BUG: unable to handle kernel paging request at 15555d60
IP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0
*pdpt = 000000000c0ea027 *pde = 0000000000000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
Modules linked in: dm_snapshot iTCO_wdt usbhid
Pid: 44, comm: xenwatch Not tainted (2.6.32.27-1 #1) X8DTU
EIP: 0061:[<c1022781>] EFLAGS: 00010007 CPU: 2
EIP is at vmalloc_sync_all+0xd1/0x1f0
EAX: 15555d60 EBX: c1aa8f00 ECX: 55555001 EDX: 06855067
ESI: c173ad60 EDI: ddbd7944 EBP: 00000009 ESP: dfd7fe64
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process xenwatch (pid: 44, ti=dfd7e000 task=dfce90f0 task.ti=dfd7e000)
Stack:
 00000018 0001fc55 00000000 c0000d60 f5800000 1fc55067 00000000 1fc55067
<0> 00000000 c170025c cc974c20 d422a180 dfd7ff20 c8d44118 c10829d0 c1081370
<0> 00000000 00000000 c12bbbd6 c1006bf4 00000025 dfd7fee4 cc970220 00000009
Call Trace:
 [<c10829d0>] ? alloc_vm_area+0x40/0x60
 [<c1081370>] ? f+0x0/0x10
 [<c12bbbd6>] ? blkif_map+0x36/0x1c0
 [<c1006bf4>] ? check_events+0x8/0xc
 [<c12b2b0f>] ? xenbus_gather+0x5f/0x90
 [<c12bb36c>] ? frontend_changed+0x25c/0x2d0
 [<c12b36c5>] ? xenbus_otherend_changed+0x95/0xa0
 [<c12b38bf>] ? frontend_changed+0xf/0x20
 [<c12b1f57>] ? xenwatch_thread+0x87/0x130
 [<c1048700>] ? autoremove_wake_function+0x0/0x40
 [<c12b1ed0>] ? xenwatch_thread+0x0/0x130
 [<c10484e4>] ? kthread+0x74/0x80
 [<c1048470>] ? kthread+0x0/0x80
 [<c1009e67>] ? kernel_thread_helper+0x7/0x10
Code: 04 8b 45 00 ff 15 14 2e 65 c1 8b 54 24 0c 25 00 f0 ff ff 8d 34 10 8b 16 8b 6e 04 f6 c2 01 74 7d 89 c8 25 00 f0 ff ff 03 44 24 0c <8b> 08 89 4c 24 04 8b 48 04 f6 44 24 04 01 75 67 89 e9 e8 18 32
EIP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0 SS:ESP 0069:dfd7fe64
CR2: 0000000015555d60
---[ end trace 48bbd5284e47e665 ]---

Thoughts or suggestions? I'd be happy to provide any additional information or perform more tests.

Thanks!

-Chris
Ian Campbell
2011-Jan-04 09:16 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Thu, 2010-12-30 at 22:57 +0000, Christopher S. Aker wrote:
> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x @
> 75cc13f5aa29b4f3227d269ca165dfa8937c94fe)
>
> We've been running our xen-thrash testsuite on a bunch of hosts against
> a very recent build, and we've just hit this on one box:
>
> BUG: unable to handle kernel paging request at 15555d60
> IP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0

This looks similar to the issue which we thought was resolved via b2464c422fb44275deeb5770b668351860f68e0e.

Can you convert 0xc1022781 to an exact line number? If you have a vmlinux with symbols then:

$ gdb vmlinux
(gdb) list *0xc1022781

should tell you the file and line.

(gdb) disas 0xc1022781

might tell us something too.

Ian.

> *pdpt = 000000001d8ee027 *pde = 0000000000000000
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> Modules linked in: dm_snapshot iTCO_wdt usbhid
> Pid: 44, comm: xenwatch Not tainted (2.6.32.27-1 #1) X8DTU
> EIP: 0061:[<c1022781>] EFLAGS: 00010007 CPU: 0
> EIP is at vmalloc_sync_all+0xd1/0x1f0
> EAX: 15555d60 EBX: c1a50c00 ECX: 55555001 EDX: 06855067
> ESI: c173ad60 EDI: dd8f85c4 EBP: 00000009 ESP: dfd7fe64
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process xenwatch (pid: 44, ti=dfd7e000 task=dfce90f0 task.ti=dfd7e000)
> Stack:
> 00000018 0001fc55 00000000 c0000d60 f5800000 1fc55067 00000000 1fc55067
> <0> 00000000 c170025c dd2abe00 c898f180 dfd7ff20 c8a1171c c10829d0 c1081370
> <0> 00000000 00000000 c12bbbd6 c1006bf4 0000000d dfd7fee4 dd2a0200 00000008
> Call Trace:
> [<c10829d0>] ? alloc_vm_area+0x40/0x60
> [<c1081370>] ? f+0x0/0x10
> [<c12bbbd6>] ? blkif_map+0x36/0x1c0
> [<c1006bf4>] ? check_events+0x8/0xc
> [<c12b2b0f>] ? xenbus_gather+0x5f/0x90
> [<c12bb36c>] ? frontend_changed+0x25c/0x2d0
> [<c12b36c5>] ? xenbus_otherend_changed+0x95/0xa0
> [<c12b38bf>] ? frontend_changed+0xf/0x20
> [<c12b1f57>] ? xenwatch_thread+0x87/0x130
> [<c1048700>] ? autoremove_wake_function+0x0/0x40
> [<c12b1ed0>] ? xenwatch_thread+0x0/0x130
> [<c10484e4>] ? kthread+0x74/0x80
> [<c1048470>] ? kthread+0x0/0x80
> [<c1009e67>] ? kernel_thread_helper+0x7/0x10
> Code: 04 8b 45 00 ff 15 14 2e 65 c1 8b 54 24 0c 25 00 f0 ff ff 8d 34 10
> 8b 16 8b 6e 04 f6 c2 01 74 7d 89 c8 25 00 f0 ff ff 03 44 24 0c <8b> 08
> 89 4c 24 04 8b 48 04 f6 44 24 04 01 75 67 89 e9 e8 18 32
> EIP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0 SS:ESP 0069:dfd7fe64
> CR2: 0000000015555d60
> ---[ end trace 7a29128cd8a0e564 ]---
>
> And then a whole load of soft lockup traces. Full output, hypervisor,
> and dom0 kernel are in here:
>
> http://theshore.net/~caker/xen/BUGS/2.6.32.27/
>
> -Chris
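For anyone following along without a symbolic vmlinux, the same symbol+offset lookup can be done by hand against System.map. A minimal POSIX-shell sketch: the two-line map below is an illustrative excerpt only (the second symbol name is made up); on the affected host the real file would be /boot/System.map-2.6.32.27-1.

```shell
# Build a tiny sample System.map (address, type, symbol; sorted ascending).
cat > System.map.sample <<'EOF'
c10226b0 T vmalloc_sync_all
c10228a0 T next_function
EOF

# Walk the map and keep the last symbol at or below the faulting EIP.
addr=$((0xc1022781))
sym=""
base=0
while read a type name; do
  v=$((0x$a))
  [ "$v" -le "$addr" ] || break
  sym=$name
  base=$v
done < System.map.sample

printf '%s+0x%x\n' "$sym" $((addr - base))   # prints: vmalloc_sync_all+0xd1
```

The +0xd1 matches the `vmalloc_sync_all+0xd1/0x1f0` in the oops header, which is a quick sanity check that the System.map and the running kernel agree.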
Christopher S. Aker
2011-Jan-04 20:30 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Jan 4, 2011, at 4:16 AM, Ian Campbell wrote:
> This looks similar to the issue which we thought was resolved via
> b2464c422fb44275deeb5770b668351860f68e0e.

Verified my tree has this changeset...

> Can you convert 0xc1022781 to an exact line number? If you have a
> vmlinux with symbols then:
> $ gdb vmlinux
> (gdb) list *0xc1022781
> should tell you the file and line.
>
> (gdb) disas 0xc1022781
> might tell us something too.
>
> Ian.

~# uname -rv
2.6.32.27-1 #1 SMP Wed Dec 29 17:47:30 UTC 2010
~# gdb vmlinux /proc/kcore -s /boot/System.map-2.6.32.27-1
GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

#0 0x00000000 in ?? ()
(gdb) list *0xc1022781
No symbol table is loaded. Use the "file" command.
(gdb) disas 0xc1022781
Dump of assembler code for function vmalloc_sync_all:
0xc10226b0 <vmalloc_sync_all+0>: push %ebp
0xc10226b1 <vmalloc_sync_all+1>: push %edi
0xc10226b2 <vmalloc_sync_all+2>: push %esi
0xc10226b3 <vmalloc_sync_all+3>: push %ebx
0xc10226b4 <vmalloc_sync_all+4>: sub $0x28,%esp
0xc10226b7 <vmalloc_sync_all+7>: mov 0xc1652ca4,%eax
0xc10226bc <vmalloc_sync_all+12>: test %eax,%eax
0xc10226be <vmalloc_sync_all+14>: jne 0xc1022898 <vmalloc_sync_all+488>
0xc10226c4 <vmalloc_sync_all+20>: mov 0xc171e18c,%eax
0xc10226c9 <vmalloc_sync_all+25>: add $0x800000,%eax
0xc10226ce <vmalloc_sync_all+30>: and $0xffe00000,%eax
0xc10226d3 <vmalloc_sync_all+35>: cmp $0xbfffffff,%eax
0xc10226d8 <vmalloc_sync_all+40>: mov %eax,0x10(%esp)
0xc10226dc <vmalloc_sync_all+44>: jbe 0xc1022898 <vmalloc_sync_all+488>
0xc10226e2 <vmalloc_sync_all+50>: cmp 0xc1652ed8,%eax
0xc10226e8 <vmalloc_sync_all+56>: jae 0xc1022898 <vmalloc_sync_all+488>
0xc10226ee <vmalloc_sync_all+62>: data16
0xc10226ef <vmalloc_sync_all+63>: nop
0xc10226f0 <vmalloc_sync_all+64>: mov $0xc170785c,%eax
0xc10226f5 <vmalloc_sync_all+69>: call 0xc14ed140 <_spin_lock_irqsave>
0xc10226fa <vmalloc_sync_all+74>: mov %eax,0x24(%esp)
0xc10226fe <vmalloc_sync_all+78>: mov 0xc1652ec4,%eax
0xc1022703 <vmalloc_sync_all+83>: lea 0xffffffe8(%eax),%ebx
0xc1022706 <vmalloc_sync_all+86>: mov 0x18(%ebx),%edx
0xc1022709 <vmalloc_sync_all+89>: prefetchnta (%edx)
0xc102270c <vmalloc_sync_all+92>: nop
0xc102270d <vmalloc_sync_all+93>: cmp $0xc1652ec4,%eax
0xc1022712 <vmalloc_sync_all+98>: je 0xc1022868 <vmalloc_sync_all+440>
0xc1022718 <vmalloc_sync_all+104>: mov 0x10(%esp),%eax
0xc102271c <vmalloc_sync_all+108>: shr $0x1e,%eax
0xc102271f <vmalloc_sync_all+111>: shl $0x3,%eax
0xc1022722 <vmalloc_sync_all+114>: mov %eax,(%esp)
0xc1022725 <vmalloc_sync_all+117>: mov 0x10(%esp),%eax
0xc1022729 <vmalloc_sync_all+121>: shr $0x12,%eax
0xc102272c <vmalloc_sync_all+124>: and $0xff8,%eax
0xc1022731 <vmalloc_sync_all+129>: sub $0x40000000,%eax
0xc1022736 <vmalloc_sync_all+134>: mov %eax,0xc(%esp)
0xc102273a <vmalloc_sync_all+138>: jmp 0xc10227c8 <vmalloc_sync_all+280>
0xc102273f <vmalloc_sync_all+143>: nop
0xc1022740 <vmalloc_sync_all+144>: mov (%esp),%edx
0xc1022743 <vmalloc_sync_all+147>: mov (%eax,%edx,1),%ecx
0xc1022746 <vmalloc_sync_all+150>: mov 0x4(%eax,%edx,1),%edx
0xc102274a <vmalloc_sync_all+154>: mov %ecx,%eax
0xc102274c <vmalloc_sync_all+156>: call *0xc1652e14
0xc1022752 <vmalloc_sync_all+162>: mov %eax,%ecx
0xc1022754 <vmalloc_sync_all+164>: mov 0x4(%ebp),%edx
0xc1022757 <vmalloc_sync_all+167>: mov 0x0(%ebp),%eax
0xc102275a <vmalloc_sync_all+170>: call *0xc1652e14
0xc1022760 <vmalloc_sync_all+176>: mov 0xc(%esp),%edx
0xc1022764 <vmalloc_sync_all+180>: and $0xfffff000,%eax
0xc1022769 <vmalloc_sync_all+185>: lea (%eax,%edx,1),%esi
0xc102276c <vmalloc_sync_all+188>: mov (%esi),%edx
0xc102276e <vmalloc_sync_all+190>: mov 0x4(%esi),%ebp
0xc1022771 <vmalloc_sync_all+193>: test $0x1,%dl
0xc1022774 <vmalloc_sync_all+196>: je 0xc10227f3 <vmalloc_sync_all+323>
0xc1022776 <vmalloc_sync_all+198>: mov %ecx,%eax
0xc1022778 <vmalloc_sync_all+200>: and $0xfffff000,%eax
0xc102277d <vmalloc_sync_all+205>: add 0xc(%esp),%eax
0xc1022781 <vmalloc_sync_all+209>: mov (%eax),%ecx
0xc1022783 <vmalloc_sync_all+211>: mov %ecx,0x4(%esp)
0xc1022787 <vmalloc_sync_all+215>: mov 0x4(%eax),%ecx
0xc102278a <vmalloc_sync_all+218>: testb $0x1,0x4(%esp)
0xc102278f <vmalloc_sync_all+223>: jne 0xc10227f8 <vmalloc_sync_all+328>
0xc1022791 <vmalloc_sync_all+225>: mov %ebp,%ecx
0xc1022793 <vmalloc_sync_all+227>: call 0xc10059b0 <xen_set_pmd>
0xc1022798 <vmalloc_sync_all+232>: nop
0xc1022799 <vmalloc_sync_all+233>: lea 0x0(%esi),%esi
0xc10227a0 <vmalloc_sync_all+240>: mov %edi,%eax
0xc10227a2 <vmalloc_sync_all+242>: call 0xc10074b0 <xen_spin_unlock>
0xc10227a7 <vmalloc_sync_all+247>: nop
0xc10227a8 <vmalloc_sync_all+248>: test %esi,%esi
0xc10227aa <vmalloc_sync_all+250>: je 0xc1022868 <vmalloc_sync_all+440>
0xc10227b0 <vmalloc_sync_all+256>: mov 0x18(%ebx),%eax
0xc10227b3 <vmalloc_sync_all+259>: lea 0xffffffe8(%eax),%ebx
0xc10227b6 <vmalloc_sync_all+262>: mov 0x18(%ebx),%edx
0xc10227b9 <vmalloc_sync_all+265>: prefetchnta (%edx)
0xc10227bc <vmalloc_sync_all+268>: nop
0xc10227bd <vmalloc_sync_all+269>: cmp $0xc1652ec4,%eax
0xc10227c2 <vmalloc_sync_all+274>: je 0xc1022868 <vmalloc_sync_all+440>
0xc10227c8 <vmalloc_sync_all+280>: mov %ebx,%eax
0xc10227ca <vmalloc_sync_all+282>: call 0xc1025370 <pgd_page_get_mm>
0xc10227cf <vmalloc_sync_all+287>: lea 0x44(%eax),%edi
0xc10227d2 <vmalloc_sync_all+290>: mov %edi,%eax
0xc10227d4 <vmalloc_sync_all+292>: call 0xc14ed240 <_spin_lock>
0xc10227d9 <vmalloc_sync_all+297>: mov %ebx,%eax
0xc10227db <vmalloc_sync_all+299>: call 0xc10757a0 <page_address>
0xc10227e0 <vmalloc_sync_all+304>: mov (%esp),%ebp
0xc10227e3 <vmalloc_sync_all+307>: add 0xc1655f64,%ebp
0xc10227e9 <vmalloc_sync_all+313>: testb $0x1,0x0(%ebp)
0xc10227ed <vmalloc_sync_all+317>: jne 0xc1022740 <vmalloc_sync_all+144>
0xc10227f3 <vmalloc_sync_all+323>: xor %esi,%esi
0xc10227f5 <vmalloc_sync_all+325>: jmp 0xc10227a0 <vmalloc_sync_all+240>
0xc10227f7 <vmalloc_sync_all+327>: nop
0xc10227f8 <vmalloc_sync_all+328>: mov 0x4(%esp),%eax
0xc10227fc <vmalloc_sync_all+332>: mov %ecx,%edx
0xc10227fe <vmalloc_sync_all+334>: call *0xc1652e2c
0xc1022804 <vmalloc_sync_all+340>: mov %edx,%ebp
0xc1022806 <vmalloc_sync_all+342>: mov %eax,%ecx
0xc1022808 <vmalloc_sync_all+344>: mov 0x4(%esi),%edx
0xc102280b <vmalloc_sync_all+347>: mov (%esi),%eax
0xc102280d <vmalloc_sync_all+349>: call *0xc1652e2c
0xc1022813 <vmalloc_sync_all+355>: mov %edx,0x4(%esp)
0xc1022817 <vmalloc_sync_all+359>: mov %ecx,0x14(%esp)
0xc102281b <vmalloc_sync_all+363>: mov 0x14(%esp),%edx
0xc102281f <vmalloc_sync_all+367>: mov %ebp,0x18(%esp)
0xc1022823 <vmalloc_sync_all+371>: mov 0x18(%esp),%ecx
0xc1022827 <vmalloc_sync_all+375>: mov %eax,0x1c(%esp)
0xc102282b <vmalloc_sync_all+379>: mov 0x4(%esp),%eax
0xc102282f <vmalloc_sync_all+383>: shrd $0xc,%ecx,%edx
0xc1022833 <vmalloc_sync_all+387>: mov %edx,%ecx
0xc1022835 <vmalloc_sync_all+389>: mov %eax,0x20(%esp)
0xc1022839 <vmalloc_sync_all+393>: mov 0x1c(%esp),%eax
0xc102283d <vmalloc_sync_all+397>: shl $0x5,%ecx
0xc1022840 <vmalloc_sync_all+400>: mov 0x20(%esp),%edx
0xc1022844 <vmalloc_sync_all+404>: shrd $0xc,%edx,%eax
0xc1022848 <vmalloc_sync_all+408>: mov %eax,0x4(%esp)
0xc102284c <vmalloc_sync_all+412>: mov 0x4(%esp),%eax
0xc1022850 <vmalloc_sync_all+416>: shr $0xc,%edx
0xc1022853 <vmalloc_sync_all+419>: mov %edx,0x8(%esp)
0xc1022857 <vmalloc_sync_all+423>: shl $0x5,%eax
0xc102285a <vmalloc_sync_all+426>: cmp %eax,%ecx
0xc102285c <vmalloc_sync_all+428>: je 0xc10227a0 <vmalloc_sync_all+240>
0xc1022862 <vmalloc_sync_all+434>: ud2a
0xc1022864 <vmalloc_sync_all+436>: jmp 0xc1022864 <vmalloc_sync_all+436>
0xc1022866 <vmalloc_sync_all+438>: data16
0xc1022867 <vmalloc_sync_all+439>: nop
0xc1022868 <vmalloc_sync_all+440>: mov 0x24(%esp),%edx
0xc102286c <vmalloc_sync_all+444>: mov $0xc170785c,%eax
0xc1022871 <vmalloc_sync_all+449>: call 0xc14ed260 <_spin_unlock_irqrestore>
0xc1022876 <vmalloc_sync_all+454>: addl $0x200000,0x10(%esp)
0xc102287e <vmalloc_sync_all+462>: cmpl $0xbfffffff,0x10(%esp)
0xc1022886 <vmalloc_sync_all+470>: jbe 0xc1022898 <vmalloc_sync_all+488>
0xc1022888 <vmalloc_sync_all+472>: mov 0x10(%esp),%edx
0xc102288c <vmalloc_sync_all+476>: cmp %edx,0xc1652ed8
0xc1022892 <vmalloc_sync_all+482>: ja 0xc10226f0 <vmalloc_sync_all+64>
0xc1022898 <vmalloc_sync_all+488>: add $0x28,%esp
0xc102289b <vmalloc_sync_all+491>: pop %ebx
0xc102289c <vmalloc_sync_all+492>: pop %esi
0xc102289d <vmalloc_sync_all+493>: pop %edi
0xc102289e <vmalloc_sync_all+494>: pop %ebp
0xc102289f <vmalloc_sync_all+495>: ret
End of assembler dump.
(gdb) quit

-Chris
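One arithmetic cross-check of the numbers in the dumps (an observation from the register values, not something anyone in the thread has confirmed): the faulting instruction at vmalloc_sync_all+0xd1 (0xc1022781) is `mov (%eax),%ecx`, and EAX was computed a few instructions earlier as `(ECX & 0xfffff000) + <offset saved at 0xc(%esp)>`. Plugging in ECX=55555001 from the register dump and the c0000d60 word visible on the stack reproduces CR2 exactly, which hints that a page-table entry containing the 0x55555... pattern was dereferenced as if it were a page address:

```shell
# Re-derive EAX/CR2 from the oops registers at vmalloc_sync_all+0xd1.
ecx=$((0x55555001))        # entry value: present bit set, 0x55555... pattern
off=$((0xc0000d60))        # offset visible in the stack dump
eax=$(( ((ecx & 0xfffff000) + off) & 0xffffffff ))   # 32-bit wraparound
printf '0x%x\n' "$eax"     # prints 0x15555d60, matching EAX and CR2
```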
Ian Campbell
2011-Jan-04 20:34 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Tue, 2011-01-04 at 20:30 +0000, Christopher S. Aker wrote:
>
> #0 0x00000000 in ?? ()
> (gdb) list *0xc1022781
> No symbol table is loaded. Use the "file" command.

I think you need to enable CONFIG_DEBUG_INFO for this to work.

I'll see what I can figure out from the rest tomorrow.

Ian.
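For reference, one way to switch that option on: in a kernel source tree, `./scripts/config --enable DEBUG_INFO` or the "Kernel hacking" menu in `make menuconfig` does it; the sketch below shows the equivalent sed edit against a stub .config (the stub file is illustrative, not a real config):

```shell
# Write a minimal stand-in .config just to demonstrate the edit.
cat > .config.sample <<'EOF'
CONFIG_SMP=y
# CONFIG_DEBUG_INFO is not set
EOF

# Flip the option on, then rebuild the kernel for a vmlinux with DWARF info.
sed -i 's/^# CONFIG_DEBUG_INFO is not set$/CONFIG_DEBUG_INFO=y/' .config.sample
grep '^CONFIG_DEBUG_INFO' .config.sample   # prints: CONFIG_DEBUG_INFO=y
```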
Christopher S. Aker
2011-Jan-04 21:59 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Jan 4, 2011, at 3:34 PM, Ian Campbell wrote:
> On Tue, 2011-01-04 at 20:30 +0000, Christopher S. Aker wrote:
>>
>> No symbol table is loaded. Use the "file" command.
>
> I think you need to enable CONFIG_DEBUG_INFO for this to work.

I rebuilt with CONFIG_DEBUG_INFO, and surprisingly it appears valid at the same address:

# gdb vmlinux
GNU gdb (GDB) 7.0-ubuntu
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /build/xen/dom0/pv_ops/2.6.32.27-1-debug/vmlinux...done.
(gdb) list *0xc1022781
0xc1022781 is in vmalloc_sync_all (/build/xen/dom0/pv_ops/2.6.32.27-1-debug/arch/x86/include/asm/pgtable.h:434).
429  #define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
430
431  /* Find an entry in the second-level page table.. */
432  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
433  {
434          return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
435  }
436
437  static inline int pud_large(pud_t pud)
438  {
(gdb) quit

-Chris
Christopher S. Aker
2011-Jan-09 18:07 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Jan 4, 2011, at 4:59 PM, Christopher S. Aker wrote:
>
> I rebuilt with CONFIG_DEBUG_INFO, and surprisingly it appears valid at the same address:
>
> # gdb vmlinux
> (gdb) list *0xc1022781
> 0xc1022781 is in vmalloc_sync_all (/build/xen/dom0/pv_ops/2.6.32.27-1-debug/arch/x86/include/asm/pgtable.h:434).
> 429  #define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
> 430
> 431  /* Find an entry in the second-level page table.. */
> 432  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
> 433  {
> 434          return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
> 435  }
> 436
> 437  static inline int pud_large(pud_t pud)
> 438  {

We hit the BUG again on a third test box -- at least it's fairly easy to reproduce. Has anyone had a chance to poke at this, or have a suggestion for something for me to try/test?

Thanks,
-Chris
Konrad Rzeszutek Wilk
2011-Jan-10 18:56 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Sun, Jan 09, 2011 at 01:07:26PM -0500, Christopher S. Aker wrote:
> On Jan 4, 2011, at 4:59 PM, Christopher S. Aker wrote:
> >
> > I rebuilt with CONFIG_DEBUG_INFO, and surprisingly it appears valid at the same address:
> >
> > # gdb vmlinux
> > (gdb) list *0xc1022781
> > 0xc1022781 is in vmalloc_sync_all (/build/xen/dom0/pv_ops/2.6.32.27-1-debug/arch/x86/include/asm/pgtable.h:434).
> > 429  #define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
> > 430
> > 431  /* Find an entry in the second-level page table.. */
> > 432  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
> > 433  {
> > 434          return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
> > 435  }
> > 436
> > 437  static inline int pud_large(pud_t pud)
> > 438  {
>
> We hit the BUG again on a third test box -- at least it's fairly easy to reproduce. Has anyone had a chance to poke at this, or have a suggestion for something for me to try/test?

Which test makes it easy to reproduce? Oh wait, you have a whole bunch of guests pounding. Is it possible to narrow down which type of test is causing this?

Or can you put up the domU guests along with the xm config files to try to reproduce this?
John Weekes
2011-Jan-10 21:49 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
> We hit the BUG again on a third test box -- at least it's fairly easy to reproduce. Has anyone had a chance to poke at this, or have a suggestion for something for me to try/test?

Have you tried raising /proc/sys/vm/min_free_kbytes ? When I was seeing a similar "unable to handle kernel paging request" error on some machines, I bumped mine up to 32768, and it seemed to eliminate the problem.

-John
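For anyone wanting to try the same tweak, a sketch of the usual knobs (32768 matches the value John mentions; the unit is kilobytes, so this is 32 MB; both commands need root):

```shell
# Raise the free-memory watermark on the running kernel:
echo 32768 > /proc/sys/vm/min_free_kbytes
# or equivalently:
sysctl -w vm.min_free_kbytes=32768

# Persist across reboots:
echo 'vm.min_free_kbytes = 32768' >> /etc/sysctl.conf
```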
Konrad Rzeszutek Wilk
2011-Jan-13 14:43 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Sun, Jan 09, 2011 at 01:07:26PM -0500, Christopher S. Aker wrote:
> We hit the BUG again on a third test box -- at least it's fairly easy to reproduce. Has anyone had a chance to poke at this, or have a suggestion for something for me to try/test?

I don't have as much memory as you, so I've only been running a smaller subset of those guests. So far, nothing yet. How long did it take you to hit it?
Christopher S. Aker
2011-Jan-15 15:57 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Jan 10, 2011, at 4:49 PM, John Weekes wrote:
> Have you tried raising /proc/sys/vm/min_free_kbytes ? When I was seeing a similar "unable to handle kernel paging request" error on some machines, I bumped mine up to 32768, and it seemed to eliminate the problem.

Thanks for your suggestion, John. I reset all 16 test machines and set min_free_kbytes to 32768 (32M). Unfortunately one machine hit the identical BUG within 48 hours, so this hasn't fixed it.

I'm willing to help in any way I can.

-Chris
Christopher S. Aker
2011-Jan-31 21:07 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
>
> We've been running our xen-thrash testsuite on a bunch of hosts
> against a very recent build, and we've just hit this on one box:
>
> BUG: unable to handle kernel paging request at 15555d60

Two additional boxes out of my last test round have also hit this. About one a week.

Ian / Jeremy: Where do I go from here?

-Chris
Konrad Rzeszutek Wilk
2011-Jan-31 21:17 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Mon, Jan 31, 2011 at 04:07:18PM -0500, Christopher S. Aker wrote:
> > Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
> > Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
> >
> > We've been running our xen-thrash testsuite on a bunch of hosts
> > against a very recent build, and we've just hit this on one box:
> >
> > BUG: unable to handle kernel paging request at 15555d60

Oh, I hit that if I do cat /sys/kernel/debug/kernel_page_tables.

On 64-bit:

sh-4.1# cd /sys/kernel/debug
sh-4.1# ls
acpi  boot_params  kernel_page_tables  mce      usb             x86
bdi   dri          kprobes             tracing  wakeup_sources  xen
sh-4.1# cat kernel_page_tables
[  108.263615] BUG: unable to handle kernel paging request at ffff9d5555555000
[  108.270480] IP: [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
[  108.276122] PGD 0
[  108.278205] Oops: 0000 [#1] SMP
[  108.281504] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:06:03.0/class
[  108.289316] CPU 3
[  108.291137] Modules linked in: xen_evtchn video sg sd_mod radeon ahci libahci libata fbcon scsi_mod tileblit e1000e font bitblit ttm softcursor drm_kms_helper xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs [last unloaded: dump_dma]
[  108.314658]
[  108.316221] Pid: 3025, comm: cat Not tainted 2.6.38-rc2-00038-g7c92066 #1 DX58SO/
[  108.324466] RIP: e030:[<ffffffff81036bf0>] [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
[  108.332538] RSP: e02b:ffff8800868f5dd8 EFLAGS: 00010286
[  108.337919] RAX: ffff800000000000 RBX: ffff880085117700 RCX: 0000000000000000
[  108.345126] RDX: ffff9d555555Killed5ff8 RSI: 000000
0000000000 RDI: sh-4.1# 0000000152460067
[  108.345128] RBP: ffff8800868f5e78 R08: 0000000000000006 R09: 0000000000000000
[  108.345130] R10: 00007fffec97cc30 R11: 0000000000000246 R12: ffff9d5555555000
[  108.345132] R13: ffff880085117700 R14: ffffffff81803800 R15: ffff880000000000
[  108.345137] FS: 00007f6c5a3d7700(0000) GS:ffff88009ce83000(0000) knlGS:0000000000000000
[  108.345139] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  108.345140] CR2: ffff9d5555555000 CR3: 000000008b2c6000 CR4: 0000000000002660
[  108.345142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  108.345144] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  108.345146] Process cat (pid: 3025, threadinfo ffff8800868f4000, task ffff88009dbe8000)
[  108.345148] Stack:
[  108.345149] ffffffff81006ca2 0000000000000246 00007fffec97cc30 ffffffff8110efe7
[  108.345152] ffff9d5555555ff8 0000800000000000 ffff88009f0029c0 0000800000000000
[  108.345155] 0000000000000001 0000000000000000 0000000000000000 ffff800000000000
[  108.345158] Call Trace:
[  108.345163] [<ffffffff81006ca2>] ? check_events+0x12/0x20
[  108.345168] [<ffffffff8110efe7>] ? seq_read+0xbf/0x34a
[  108.345170] [<ffffffff8110efe7>] ? seq_read+0xbf/0x34a
[  108.345173] [<ffffffff8110f0a1>] seq_read+0x179/0x34a
[  108.345176] [<ffffffff810f5c32>] vfs_read+0xa6/0x102
[  108.345178] [<ffffffff810f5d47>] sys_read+0x45/0x6c
[  108.345181] [<ffffffff8100a992>] system_call_fastpath+0x16/0x1b
[  108.345182] Code: 0f 00 00 00 88 ff ff 48 8d 14 10 4e 8d 24 38 48 8b 45 98 48 89 55 80 48 89 45 88 48 8b 45 88 48 c1 e0 10 48 c1 f8 10 48 89 45 b8 <49> 8b 3c 24 48 85 ff 0f 84 96 01 00 00 ff 14 25 b0 18 81 81 48
[  108.345208] RIP [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
[  108.345211] RSP <ffff8800868f5dd8>
[  108.345212] CR2: ffff9d5555555000
[  108.345214] ---[ end trace 9134d308b82bc832 ]---

The 32-bit hits this too, with the 15555d60 address.

Does the xen-thrash suite hit this file too? Do you know what file it touches when this happens?

> Two additional boxes out of my last test round have also hit this.
> About one a week.
>
> Ian / Jeremy: Where do I go from here?
>
> -Chris
Jeremy Fitzhardinge
2011-Jan-31 22:19 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 01/31/2011 01:17 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 31, 2011 at 04:07:18PM -0500, Christopher S. Aker wrote:
>>> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
>>> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
>>>
>>> We've been running our xen-thrash testsuite on a bunch of hosts
>>> against a very recent build, and we've just hit this on one box:
>>>
>>> BUG: unable to handle kernel paging request at 15555d60
>
> Oh, I hit that if I do cat /sys/kernel/debug/kernel_page_tables.
>
> On 64-bit:
> sh-4.1# cd /sys/kernel/debug
> sh-4.1# ls
> acpi  boot_params  kernel_page_tables  mce      usb             x86
> bdi   dri          kprobes             tracing  wakeup_sources  xen
> sh-4.1# cat kernel_page_tables
> [  108.263615] BUG: unable to handle kernel paging request at ffff9d5555555000
> [  108.270480] IP: [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
> [  108.276122] PGD 0
> [  108.278205] Oops: 0000 [#1] SMP
> [  108.281504] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:06:03.0/class
> [  108.289316] CPU 3
> [  108.291137] Modules linked in: xen_evtchn video sg sd_mod radeon ahci libahci libata fbcon scsi_mod tileblit e1000e font bitblit ttm softcursor drm_kms_helper xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs [last unloaded: dump_dma]
> [  108.314658]
> [  108.316221] Pid: 3025, comm: cat Not tainted 2.6.38-rc2-00038-g7c92066 #1 DX58SO/
> [  108.324466] RIP: e030:[<ffffffff81036bf0>]  [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
> [  108.332538] RSP: e02b:ffff8800868f5dd8  EFLAGS: 00010286
> [  108.337919] RAX: ffff800000000000 RBX: ffff880085117700 RCX: 0000000000000000
> [  108.345126] RDX: ffff9d5555555ff8 RSI: 0000000000000000 RDI: 0000000152460067
> [  108.345128] RBP: ffff8800868f5e78 R08: 0000000000000006 R09: 0000000000000000
> [  108.345130] R10: 00007fffec97cc30 R11: 0000000000000246 R12: ffff9d5555555000
> [  108.345132] R13: ffff880085117700 R14: ffffffff81803800 R15: ffff880000000000
> [  108.345137] FS:  00007f6c5a3d7700(0000) GS:ffff88009ce83000(0000) knlGS:0000000000000000
> [  108.345139] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  108.345140] CR2: ffff9d5555555000 CR3: 000000008b2c6000 CR4: 0000000000002660
> [  108.345142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  108.345144] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  108.345146] Process cat (pid: 3025, threadinfo ffff8800868f4000, task ffff88009dbe8000)
> [  108.345148] Stack:
> [  108.345149]  ffffffff81006ca2 0000000000000246 00007fffec97cc30 ffffffff8110efe7
> [  108.345152]  ffff9d5555555ff8 0000800000000000 ffff88009f0029c0 0000800000000000
> [  108.345155]  0000000000000001 0000000000000000 0000000000000000 ffff800000000000
> [  108.345158] Call Trace:
> [  108.345163]  [<ffffffff81006ca2>] ? check_events+0x12/0x20
> [  108.345168]  [<ffffffff8110efe7>] ? seq_read+0xbf/0x34a
> [  108.345170]  [<ffffffff8110efe7>] ? seq_read+0xbf/0x34a
> [  108.345173]  [<ffffffff8110f0a1>] seq_read+0x179/0x34a
> [  108.345176]  [<ffffffff810f5c32>] vfs_read+0xa6/0x102
> [  108.345178]  [<ffffffff810f5d47>] sys_read+0x45/0x6c
> [  108.345181]  [<ffffffff8100a992>] system_call_fastpath+0x16/0x1b
> [  108.345182] Code: 0f 00 00 00 88 ff ff 48 8d 14 10 4e 8d 24 38 48 8b 45 98 48 89 55 80 48 89 45 88 48 8b 45 88 48 c1 e0 10 48 c1 f8 10 48 89 45 b8 <49> 8b 3c 24 48 85 ff 0f 84 96 01 00 00 ff 14 25 b0 18 81 81 48
> [  108.345208] RIP  [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
> [  108.345211]  RSP <ffff8800868f5dd8>
> [  108.345212] CR2: ffff9d5555555000
> [  108.345214] ---[ end trace 9134d308b82bc832 ]---
>
> The 32-bit kernel hits this too, with the 15555d60 address.
>
> Does the xen-thrash suite hit this file too? Do you know what file it touches
> when this happens?

I think you're seeing the same address because many bogus m2p lookups
return 0x55555555. I don't think there's any more similarity between what
you're seeing and Christopher's report than that.

    J
Jeremy Fitzhardinge
2011-Jan-31 22:22 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 01/31/2011 01:07 PM, Christopher S. Aker wrote:
>> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
>> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
>>
>> We've been running our xen-thrash testsuite on a bunch of hosts
>> against a very recent build, and we've just hit this on one box:
>>
>> BUG: unable to handle kernel paging request at 15555d60
>
> Two additional boxes out of my last test round have also hit this.
> About one a week.
>
> Ian / Jeremy: Where do I go from here?

There seems to be a moderately difficult-to-hit (but still pretty large)
race in pagetable teardown. It *should* be protected by the pgd lock, so
we need to work out where a teardown (or access) is happening without
that lock. I think that's going to be a matter of close code review
rather than any more testing.

The interesting thing is that this problem seems to have come to the fore
since the patch that was explicitly intended to avoid it went in :/...
Before that, the race was theoretical, but AFAIK it had never been
observed in a pvops kernel (though it was seen in the Citrix product in
non-pvops kernels, which is why we fixed it).

I'll try to stare at it in the next couple of days.

    J
Jeremy Fitzhardinge
2011-Jan-31 22:25 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 01/31/2011 01:07 PM, Christopher S. Aker wrote:
>> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
>> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
>>
>> We've been running our xen-thrash testsuite on a bunch of hosts
>> against a very recent build, and we've just hit this on one box:
>>
>> BUG: unable to handle kernel paging request at 15555d60
>
> Two additional boxes out of my last test round have also hit this.
> About one a week.
>
> Ian / Jeremy: Where do I go from here?

It's also not impossible that this bug is related to the "get_user_pages"
bug that has been discussed over the last few days. I need to think about
that too.

    J
Christopher S. Aker
2011-Feb-14 23:52 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 1/31/11 5:25 PM, Jeremy Fitzhardinge wrote:
> On 01/31/2011 01:07 PM, Christopher S. Aker wrote:
>> Ian / Jeremy: Where do I go from here?
>
> It's also not impossible that this bug is related to the "get_user_pages"
> bug that has been discussed over the last few days. I need to think about
> that too.

How's that going? Any epiphanies? I've been trying to follow the list
(and changesets) but wasn't sure whether a potential fix snuck in, or
whether there's something else I should be pressure-cooking (an unrelated
patch that may have the side effect of fixing my issue, for example).

Thanks,
-Chris
Jeremy Fitzhardinge
2011-Feb-15 00:19 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 02/14/2011 03:52 PM, Christopher S. Aker wrote:
> On 1/31/11 5:25 PM, Jeremy Fitzhardinge wrote:
>> On 01/31/2011 01:07 PM, Christopher S. Aker wrote:
>>> Ian / Jeremy: Where do I go from here?
>>
>> It's also not impossible that this bug is related to the "get_user_pages"
>> bug that has been discussed over the last few days. I need to think about
>> that too.
>
> How's that going? Any epiphanies? I've been trying to follow the list
> (and changesets) but wasn't sure whether a potential fix snuck in, or
> whether there's something else I should be pressure-cooking (an unrelated
> patch that may have the side effect of fixing my issue, for example).

No, I had a close look at it the other day and remained stumped. It looks
like the pgd is being freed while still in use, but I couldn't see where
that could happen without the lock held. This is really bugging me -
there's something strange going on, which worries me.

    J
Christopher S. Aker
2011-Feb-15 01:15 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Feb 14, 2011, at 7:19 PM, Jeremy Fitzhardinge wrote:
> No, I had a close look at it the other day and remained stumped. It looks
> like the pgd is being freed while still in use, but I couldn't see where
> that could happen without the lock held. This is really bugging me -
> there's something strange going on, which worries me.

Hmmm. I doubt I can make any useful suggestions towards a solution at my
current kernel-hacking skill level, but perhaps some verbose debugging
output sprinkled throughout would help? I'd be happy to reset my test
suite for another round.

-Chris