Christopher S. Aker
2010-Dec-30 22:57 UTC
[Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x @ 75cc13f5aa29b4f3227d269ca165dfa8937c94fe)

We've been running our xen-thrash testsuite on a bunch of hosts against a very recent build, and we've just hit this on one box:

BUG: unable to handle kernel paging request at 15555d60
IP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0
*pdpt = 000000001d8ee027 *pde = 0000000000000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
Modules linked in: dm_snapshot iTCO_wdt usbhid
Pid: 44, comm: xenwatch Not tainted (2.6.32.27-1 #1) X8DTU
EIP: 0061:[<c1022781>] EFLAGS: 00010007 CPU: 0
EIP is at vmalloc_sync_all+0xd1/0x1f0
EAX: 15555d60 EBX: c1a50c00 ECX: 55555001 EDX: 06855067
ESI: c173ad60 EDI: dd8f85c4 EBP: 00000009 ESP: dfd7fe64
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process xenwatch (pid: 44, ti=dfd7e000 task=dfce90f0 task.ti=dfd7e000)
Stack:
 00000018 0001fc55 00000000 c0000d60 f5800000 1fc55067 00000000 1fc55067
<0> 00000000 c170025c dd2abe00 c898f180 dfd7ff20 c8a1171c c10829d0 c1081370
<0> 00000000 00000000 c12bbbd6 c1006bf4 0000000d dfd7fee4 dd2a0200 00000008
Call Trace:
 [<c10829d0>] ? alloc_vm_area+0x40/0x60
 [<c1081370>] ? f+0x0/0x10
 [<c12bbbd6>] ? blkif_map+0x36/0x1c0
 [<c1006bf4>] ? check_events+0x8/0xc
 [<c12b2b0f>] ? xenbus_gather+0x5f/0x90
 [<c12bb36c>] ? frontend_changed+0x25c/0x2d0
 [<c12b36c5>] ? xenbus_otherend_changed+0x95/0xa0
 [<c12b38bf>] ? frontend_changed+0xf/0x20
 [<c12b1f57>] ? xenwatch_thread+0x87/0x130
 [<c1048700>] ? autoremove_wake_function+0x0/0x40
 [<c12b1ed0>] ? xenwatch_thread+0x0/0x130
 [<c10484e4>] ? kthread+0x74/0x80
 [<c1048470>] ? kthread+0x0/0x80
 [<c1009e67>] ? kernel_thread_helper+0x7/0x10
Code: 04 8b 45 00 ff 15 14 2e 65 c1 8b 54 24 0c 25 00 f0 ff ff 8d 34 10 8b 16 8b 6e 04 f6 c2 01 74 7d 89 c8 25 00 f0 ff ff 03 44 24 0c <8b> 08 89 4c 24 04 8b 48 04 f6 44 24 04 01 75 67 89 e9 e8 18 32
EIP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0 SS:ESP 0069:dfd7fe64
CR2: 0000000015555d60
---[ end trace 7a29128cd8a0e564 ]---

And then a whole load of soft lockup traces. Full output, hypervisor, and dom0 kernel are in here:

http://theshore.net/~caker/xen/BUGS/2.6.32.27/

-Chris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Dec-31 01:29 UTC
[Xen-devel] Re: 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 12/31/2010 09:57 AM, Christopher S. Aker wrote:
> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x @
> 75cc13f5aa29b4f3227d269ca165dfa8937c94fe)
>
> We've been running our xen-thrash testsuite on a bunch of hosts
> against a very recent build, and we've just hit this on one box:

Ah, interesting. This looks like something that Ian Jackson found on one of his test machines.

What was going on at the time?

J

> BUG: unable to handle kernel paging request at 15555d60
> IP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0
> *pdpt = 000000001d8ee027 *pde = 0000000000000000
> Oops: 0000 [#1] SMP
> last sysfs file:
> /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> Modules linked in: dm_snapshot iTCO_wdt usbhid
> Pid: 44, comm: xenwatch Not tainted (2.6.32.27-1 #1) X8DTU
> EIP: 0061:[<c1022781>] EFLAGS: 00010007 CPU: 0
> EIP is at vmalloc_sync_all+0xd1/0x1f0
> EAX: 15555d60 EBX: c1a50c00 ECX: 55555001 EDX: 06855067
> ESI: c173ad60 EDI: dd8f85c4 EBP: 00000009 ESP: dfd7fe64
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process xenwatch (pid: 44, ti=dfd7e000 task=dfce90f0 task.ti=dfd7e000)
> Stack:
> 00000018 0001fc55 00000000 c0000d60 f5800000 1fc55067 00000000 1fc55067
> <0> 00000000 c170025c dd2abe00 c898f180 dfd7ff20 c8a1171c c10829d0 c1081370
> <0> 00000000 00000000 c12bbbd6 c1006bf4 0000000d dfd7fee4 dd2a0200 00000008
> Call Trace:
> [<c10829d0>] ? alloc_vm_area+0x40/0x60
> [<c1081370>] ? f+0x0/0x10
> [<c12bbbd6>] ? blkif_map+0x36/0x1c0
> [<c1006bf4>] ? check_events+0x8/0xc
> [<c12b2b0f>] ? xenbus_gather+0x5f/0x90
> [<c12bb36c>] ? frontend_changed+0x25c/0x2d0
> [<c12b36c5>] ? xenbus_otherend_changed+0x95/0xa0
> [<c12b38bf>] ? frontend_changed+0xf/0x20
> [<c12b1f57>] ? xenwatch_thread+0x87/0x130
> [<c1048700>] ? autoremove_wake_function+0x0/0x40
> [<c12b1ed0>] ? xenwatch_thread+0x0/0x130
> [<c10484e4>] ? kthread+0x74/0x80
> [<c1048470>] ? kthread+0x0/0x80
> [<c1009e67>] ? kernel_thread_helper+0x7/0x10
> Code: 04 8b 45 00 ff 15 14 2e 65 c1 8b 54 24 0c 25 00 f0 ff ff 8d 34
> 10 8b 16 8b 6e 04 f6 c2 01 74 7d 89 c8 25 00 f0 ff ff 03 44 24 0c <8b>
> 08 89 4c 24 04 8b 48 04 f6 44 24 04 01 75 67 89 e9 e8 18 32
> EIP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0 SS:ESP 0069:dfd7fe64
> CR2: 0000000015555d60
> ---[ end trace 7a29128cd8a0e564 ]---
>
> And then a whole load of soft lockup traces. Full output, hypervisor,
> and dom0 kernel are in here:
>
> http://theshore.net/~caker/xen/BUGS/2.6.32.27/
>
> -Chris
Christopher S. Aker
2010-Dec-31 17:19 UTC
[Xen-devel] Re: 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Dec 30, 2010, at 8:29 PM, Jeremy Fitzhardinge wrote:
> On 12/31/2010 09:57 AM, Christopher S. Aker wrote:
>> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
>> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x @
>> 75cc13f5aa29b4f3227d269ca165dfa8937c94fe)
>>
>> We've been running our xen-thrash testsuite on a bunch of hosts
>> against a very recent build, and we've just hit this on one box:
>
> Ah, interesting. This looks like something that Ian Jackson found on
> one of his test machines.
>
> What was going on at the time?

Our xen-thrash testsuite was running. It was configured to boot:

* 5 domUs swap thrashing (eatmem.c)
* 5 domUs that busy-loop CPU
* 5 domUs running crashme w/ 2.6.18 kernel
* 5 domUs running crashme w/ pv_ops kernel
* 5 domUs in a boot up -> sleep 60 -> shut down loop
* 5 domUs in a boot up -> sleep 60 -> xm destroy loop

We kicked off this identical run on 10 hosts. On another host I cranked these numbers up to total about 80 domUs. They're all still running fine.

The one that hit this BUG is still up, if any Hypervisor output would be helpful.

-Chris
Christopher S. Aker
2011-Jan-02 20:08 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Dec 31, 2010, at 12:19 PM, Christopher S. Aker wrote:
> We kicked off this identical run on 10 hosts. On another host I cranked these numbers up to total about 80 domUs. They're all still running fine.

Scratch that - another box just hit the identical trace:

BUG: unable to handle kernel paging request at 15555d60
IP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0
*pdpt = 000000000c0ea027 *pde = 0000000000000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
Modules linked in: dm_snapshot iTCO_wdt usbhid
Pid: 44, comm: xenwatch Not tainted (2.6.32.27-1 #1) X8DTU
EIP: 0061:[<c1022781>] EFLAGS: 00010007 CPU: 2
EIP is at vmalloc_sync_all+0xd1/0x1f0
EAX: 15555d60 EBX: c1aa8f00 ECX: 55555001 EDX: 06855067
ESI: c173ad60 EDI: ddbd7944 EBP: 00000009 ESP: dfd7fe64
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process xenwatch (pid: 44, ti=dfd7e000 task=dfce90f0 task.ti=dfd7e000)
Stack:
 00000018 0001fc55 00000000 c0000d60 f5800000 1fc55067 00000000 1fc55067
<0> 00000000 c170025c cc974c20 d422a180 dfd7ff20 c8d44118 c10829d0 c1081370
<0> 00000000 00000000 c12bbbd6 c1006bf4 00000025 dfd7fee4 cc970220 00000009
Call Trace:
 [<c10829d0>] ? alloc_vm_area+0x40/0x60
 [<c1081370>] ? f+0x0/0x10
 [<c12bbbd6>] ? blkif_map+0x36/0x1c0
 [<c1006bf4>] ? check_events+0x8/0xc
 [<c12b2b0f>] ? xenbus_gather+0x5f/0x90
 [<c12bb36c>] ? frontend_changed+0x25c/0x2d0
 [<c12b36c5>] ? xenbus_otherend_changed+0x95/0xa0
 [<c12b38bf>] ? frontend_changed+0xf/0x20
 [<c12b1f57>] ? xenwatch_thread+0x87/0x130
 [<c1048700>] ? autoremove_wake_function+0x0/0x40
 [<c12b1ed0>] ? xenwatch_thread+0x0/0x130
 [<c10484e4>] ? kthread+0x74/0x80
 [<c1048470>] ? kthread+0x0/0x80
 [<c1009e67>] ? kernel_thread_helper+0x7/0x10
Code: 04 8b 45 00 ff 15 14 2e 65 c1 8b 54 24 0c 25 00 f0 ff ff 8d 34 10 8b 16 8b 6e 04 f6 c2 01 74 7d 89 c8 25 00 f0 ff ff 03 44 24 0c <8b> 08 89 4c 24 04 8b 48 04 f6 44 24 04 01 75 67 89 e9 e8 18 32
EIP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0 SS:ESP 0069:dfd7fe64
CR2: 0000000015555d60
---[ end trace 48bbd5284e47e665 ]---

Thoughts or suggestions? I'd be happy to provide any additional information or perform more tests.

Thanks!

-Chris
Ian Campbell
2011-Jan-04 09:16 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Thu, 2010-12-30 at 22:57 +0000, Christopher S. Aker wrote:
> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x @
> 75cc13f5aa29b4f3227d269ca165dfa8937c94fe)
>
> We've been running our xen-thrash testsuite on a bunch of hosts against
> a very recent build, and we've just hit this on one box:
>
> BUG: unable to handle kernel paging request at 15555d60
> IP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0

This looks similar to the issue which we thought was resolved via b2464c422fb44275deeb5770b668351860f68e0e.

Can you convert 0xc1022781 to an exact line number? If you have a vmlinux with symbols then:

$ gdb vmlinux
(gdb) list *0xc1022781

should tell you the file and line.

(gdb) disas 0xc1022781

might tell us something too.

Ian.

> *pdpt = 000000001d8ee027 *pde = 0000000000000000
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> Modules linked in: dm_snapshot iTCO_wdt usbhid
> Pid: 44, comm: xenwatch Not tainted (2.6.32.27-1 #1) X8DTU
> EIP: 0061:[<c1022781>] EFLAGS: 00010007 CPU: 0
> EIP is at vmalloc_sync_all+0xd1/0x1f0
> EAX: 15555d60 EBX: c1a50c00 ECX: 55555001 EDX: 06855067
> ESI: c173ad60 EDI: dd8f85c4 EBP: 00000009 ESP: dfd7fe64
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process xenwatch (pid: 44, ti=dfd7e000 task=dfce90f0 task.ti=dfd7e000)
> Stack:
> 00000018 0001fc55 00000000 c0000d60 f5800000 1fc55067 00000000 1fc55067
> <0> 00000000 c170025c dd2abe00 c898f180 dfd7ff20 c8a1171c c10829d0 c1081370
> <0> 00000000 00000000 c12bbbd6 c1006bf4 0000000d dfd7fee4 dd2a0200 00000008
> Call Trace:
> [<c10829d0>] ? alloc_vm_area+0x40/0x60
> [<c1081370>] ? f+0x0/0x10
> [<c12bbbd6>] ? blkif_map+0x36/0x1c0
> [<c1006bf4>] ? check_events+0x8/0xc
> [<c12b2b0f>] ? xenbus_gather+0x5f/0x90
> [<c12bb36c>] ? frontend_changed+0x25c/0x2d0
> [<c12b36c5>] ? xenbus_otherend_changed+0x95/0xa0
> [<c12b38bf>] ? frontend_changed+0xf/0x20
> [<c12b1f57>] ? xenwatch_thread+0x87/0x130
> [<c1048700>] ? autoremove_wake_function+0x0/0x40
> [<c12b1ed0>] ? xenwatch_thread+0x0/0x130
> [<c10484e4>] ? kthread+0x74/0x80
> [<c1048470>] ? kthread+0x0/0x80
> [<c1009e67>] ? kernel_thread_helper+0x7/0x10
> Code: 04 8b 45 00 ff 15 14 2e 65 c1 8b 54 24 0c 25 00 f0 ff ff 8d 34 10
> 8b 16 8b 6e 04 f6 c2 01 74 7d 89 c8 25 00 f0 ff ff 03 44 24 0c <8b> 08
> 89 4c 24 04 8b 48 04 f6 44 24 04 01 75 67 89 e9 e8 18 32
> EIP: [<c1022781>] vmalloc_sync_all+0xd1/0x1f0 SS:ESP 0069:dfd7fe64
> CR2: 0000000015555d60
> ---[ end trace 7a29128cd8a0e564 ]---
>
> And then a whole load of soft lockup traces. Full output, hypervisor,
> and dom0 kernel are in here:
>
> http://theshore.net/~caker/xen/BUGS/2.6.32.27/
>
> -Chris
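For anyone following along without a symbolic vmlinux, the same symbol+offset lookup can be done by hand against System.map. A minimal POSIX-shell sketch: the two-line map below is an illustrative excerpt only (the second symbol name is made up); on the affected host the real file would be /boot/System.map-2.6.32.27-1.

```shell
# Build a tiny sample System.map (address, type, symbol; sorted ascending).
cat > System.map.sample <<'EOF'
c10226b0 T vmalloc_sync_all
c10228a0 T next_function
EOF

# Walk the map and keep the last symbol at or below the faulting EIP.
addr=$((0xc1022781))
sym=""
base=0
while read a type name; do
  v=$((0x$a))
  [ "$v" -le "$addr" ] || break
  sym=$name
  base=$v
done < System.map.sample

printf '%s+0x%x\n' "$sym" $((addr - base))   # prints: vmalloc_sync_all+0xd1
```

The +0xd1 matches the `vmalloc_sync_all+0xd1/0x1f0` in the oops header, which is a quick sanity check that the System.map and the running kernel agree.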
Christopher S. Aker
2011-Jan-04 20:30 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Jan 4, 2011, at 4:16 AM, Ian Campbell wrote:
> This looks similar to the issue which we thought was resolved via
> b2464c422fb44275deeb5770b668351860f68e0e.

Verified my tree has this changeset...

> Can you convert 0xc1022781 to an exact line number? If you have a
> vmlinux with symbols then:
> $ gdb vmlinux
> (gdb) list *0xc1022781
> should tell you the file and line.
>
> (gdb) disas 0xc1022781
> might tell us something too.
>
> Ian.

~# uname -rv
2.6.32.27-1 #1 SMP Wed Dec 29 17:47:30 UTC 2010
~# gdb vmlinux /proc/kcore -s /boot/System.map-2.6.32.27-1
GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

#0 0x00000000 in ?? ()
(gdb) list *0xc1022781
No symbol table is loaded. Use the "file" command.
(gdb) disas 0xc1022781
Dump of assembler code for function vmalloc_sync_all:
0xc10226b0 <vmalloc_sync_all+0>: push %ebp
0xc10226b1 <vmalloc_sync_all+1>: push %edi
0xc10226b2 <vmalloc_sync_all+2>: push %esi
0xc10226b3 <vmalloc_sync_all+3>: push %ebx
0xc10226b4 <vmalloc_sync_all+4>: sub $0x28,%esp
0xc10226b7 <vmalloc_sync_all+7>: mov 0xc1652ca4,%eax
0xc10226bc <vmalloc_sync_all+12>: test %eax,%eax
0xc10226be <vmalloc_sync_all+14>: jne 0xc1022898 <vmalloc_sync_all+488>
0xc10226c4 <vmalloc_sync_all+20>: mov 0xc171e18c,%eax
0xc10226c9 <vmalloc_sync_all+25>: add $0x800000,%eax
0xc10226ce <vmalloc_sync_all+30>: and $0xffe00000,%eax
0xc10226d3 <vmalloc_sync_all+35>: cmp $0xbfffffff,%eax
0xc10226d8 <vmalloc_sync_all+40>: mov %eax,0x10(%esp)
0xc10226dc <vmalloc_sync_all+44>: jbe 0xc1022898 <vmalloc_sync_all+488>
0xc10226e2 <vmalloc_sync_all+50>: cmp 0xc1652ed8,%eax
0xc10226e8 <vmalloc_sync_all+56>: jae 0xc1022898 <vmalloc_sync_all+488>
0xc10226ee <vmalloc_sync_all+62>: data16
0xc10226ef <vmalloc_sync_all+63>: nop
0xc10226f0 <vmalloc_sync_all+64>: mov $0xc170785c,%eax
0xc10226f5 <vmalloc_sync_all+69>: call 0xc14ed140 <_spin_lock_irqsave>
0xc10226fa <vmalloc_sync_all+74>: mov %eax,0x24(%esp)
0xc10226fe <vmalloc_sync_all+78>: mov 0xc1652ec4,%eax
0xc1022703 <vmalloc_sync_all+83>: lea 0xffffffe8(%eax),%ebx
0xc1022706 <vmalloc_sync_all+86>: mov 0x18(%ebx),%edx
0xc1022709 <vmalloc_sync_all+89>: prefetchnta (%edx)
0xc102270c <vmalloc_sync_all+92>: nop
0xc102270d <vmalloc_sync_all+93>: cmp $0xc1652ec4,%eax
0xc1022712 <vmalloc_sync_all+98>: je 0xc1022868 <vmalloc_sync_all+440>
0xc1022718 <vmalloc_sync_all+104>: mov 0x10(%esp),%eax
0xc102271c <vmalloc_sync_all+108>: shr $0x1e,%eax
0xc102271f <vmalloc_sync_all+111>: shl $0x3,%eax
0xc1022722 <vmalloc_sync_all+114>: mov %eax,(%esp)
0xc1022725 <vmalloc_sync_all+117>: mov 0x10(%esp),%eax
0xc1022729 <vmalloc_sync_all+121>: shr $0x12,%eax
0xc102272c <vmalloc_sync_all+124>: and $0xff8,%eax
0xc1022731 <vmalloc_sync_all+129>: sub $0x40000000,%eax
0xc1022736 <vmalloc_sync_all+134>: mov %eax,0xc(%esp)
0xc102273a <vmalloc_sync_all+138>: jmp 0xc10227c8 <vmalloc_sync_all+280>
0xc102273f <vmalloc_sync_all+143>: nop
0xc1022740 <vmalloc_sync_all+144>: mov (%esp),%edx
0xc1022743 <vmalloc_sync_all+147>: mov (%eax,%edx,1),%ecx
0xc1022746 <vmalloc_sync_all+150>: mov 0x4(%eax,%edx,1),%edx
0xc102274a <vmalloc_sync_all+154>: mov %ecx,%eax
0xc102274c <vmalloc_sync_all+156>: call *0xc1652e14
0xc1022752 <vmalloc_sync_all+162>: mov %eax,%ecx
0xc1022754 <vmalloc_sync_all+164>: mov 0x4(%ebp),%edx
0xc1022757 <vmalloc_sync_all+167>: mov 0x0(%ebp),%eax
0xc102275a <vmalloc_sync_all+170>: call *0xc1652e14
0xc1022760 <vmalloc_sync_all+176>: mov 0xc(%esp),%edx
0xc1022764 <vmalloc_sync_all+180>: and $0xfffff000,%eax
0xc1022769 <vmalloc_sync_all+185>: lea (%eax,%edx,1),%esi
0xc102276c <vmalloc_sync_all+188>: mov (%esi),%edx
0xc102276e <vmalloc_sync_all+190>: mov 0x4(%esi),%ebp
0xc1022771 <vmalloc_sync_all+193>: test $0x1,%dl
0xc1022774 <vmalloc_sync_all+196>: je 0xc10227f3 <vmalloc_sync_all+323>
0xc1022776 <vmalloc_sync_all+198>: mov %ecx,%eax
0xc1022778 <vmalloc_sync_all+200>: and $0xfffff000,%eax
0xc102277d <vmalloc_sync_all+205>: add 0xc(%esp),%eax
0xc1022781 <vmalloc_sync_all+209>: mov (%eax),%ecx
0xc1022783 <vmalloc_sync_all+211>: mov %ecx,0x4(%esp)
0xc1022787 <vmalloc_sync_all+215>: mov 0x4(%eax),%ecx
0xc102278a <vmalloc_sync_all+218>: testb $0x1,0x4(%esp)
0xc102278f <vmalloc_sync_all+223>: jne 0xc10227f8 <vmalloc_sync_all+328>
0xc1022791 <vmalloc_sync_all+225>: mov %ebp,%ecx
0xc1022793 <vmalloc_sync_all+227>: call 0xc10059b0 <xen_set_pmd>
0xc1022798 <vmalloc_sync_all+232>: nop
0xc1022799 <vmalloc_sync_all+233>: lea 0x0(%esi),%esi
0xc10227a0 <vmalloc_sync_all+240>: mov %edi,%eax
0xc10227a2 <vmalloc_sync_all+242>: call 0xc10074b0 <xen_spin_unlock>
0xc10227a7 <vmalloc_sync_all+247>: nop
0xc10227a8 <vmalloc_sync_all+248>: test %esi,%esi
0xc10227aa <vmalloc_sync_all+250>: je 0xc1022868 <vmalloc_sync_all+440>
0xc10227b0 <vmalloc_sync_all+256>: mov 0x18(%ebx),%eax
0xc10227b3 <vmalloc_sync_all+259>: lea 0xffffffe8(%eax),%ebx
0xc10227b6 <vmalloc_sync_all+262>: mov 0x18(%ebx),%edx
0xc10227b9 <vmalloc_sync_all+265>: prefetchnta (%edx)
0xc10227bc <vmalloc_sync_all+268>: nop
0xc10227bd <vmalloc_sync_all+269>: cmp $0xc1652ec4,%eax
0xc10227c2 <vmalloc_sync_all+274>: je 0xc1022868 <vmalloc_sync_all+440>
0xc10227c8 <vmalloc_sync_all+280>: mov %ebx,%eax
0xc10227ca <vmalloc_sync_all+282>: call 0xc1025370 <pgd_page_get_mm>
0xc10227cf <vmalloc_sync_all+287>: lea 0x44(%eax),%edi
0xc10227d2 <vmalloc_sync_all+290>: mov %edi,%eax
0xc10227d4 <vmalloc_sync_all+292>: call 0xc14ed240 <_spin_lock>
0xc10227d9 <vmalloc_sync_all+297>: mov %ebx,%eax
0xc10227db <vmalloc_sync_all+299>: call 0xc10757a0 <page_address>
0xc10227e0 <vmalloc_sync_all+304>: mov (%esp),%ebp
0xc10227e3 <vmalloc_sync_all+307>: add 0xc1655f64,%ebp
0xc10227e9 <vmalloc_sync_all+313>: testb $0x1,0x0(%ebp)
0xc10227ed <vmalloc_sync_all+317>: jne 0xc1022740 <vmalloc_sync_all+144>
0xc10227f3 <vmalloc_sync_all+323>: xor %esi,%esi
0xc10227f5 <vmalloc_sync_all+325>: jmp 0xc10227a0 <vmalloc_sync_all+240>
0xc10227f7 <vmalloc_sync_all+327>: nop
0xc10227f8 <vmalloc_sync_all+328>: mov 0x4(%esp),%eax
0xc10227fc <vmalloc_sync_all+332>: mov %ecx,%edx
0xc10227fe <vmalloc_sync_all+334>: call *0xc1652e2c
0xc1022804 <vmalloc_sync_all+340>: mov %edx,%ebp
0xc1022806 <vmalloc_sync_all+342>: mov %eax,%ecx
0xc1022808 <vmalloc_sync_all+344>: mov 0x4(%esi),%edx
0xc102280b <vmalloc_sync_all+347>: mov (%esi),%eax
0xc102280d <vmalloc_sync_all+349>: call *0xc1652e2c
0xc1022813 <vmalloc_sync_all+355>: mov %edx,0x4(%esp)
0xc1022817 <vmalloc_sync_all+359>: mov %ecx,0x14(%esp)
0xc102281b <vmalloc_sync_all+363>: mov 0x14(%esp),%edx
0xc102281f <vmalloc_sync_all+367>: mov %ebp,0x18(%esp)
0xc1022823 <vmalloc_sync_all+371>: mov 0x18(%esp),%ecx
0xc1022827 <vmalloc_sync_all+375>: mov %eax,0x1c(%esp)
0xc102282b <vmalloc_sync_all+379>: mov 0x4(%esp),%eax
0xc102282f <vmalloc_sync_all+383>: shrd $0xc,%ecx,%edx
0xc1022833 <vmalloc_sync_all+387>: mov %edx,%ecx
0xc1022835 <vmalloc_sync_all+389>: mov %eax,0x20(%esp)
0xc1022839 <vmalloc_sync_all+393>: mov 0x1c(%esp),%eax
0xc102283d <vmalloc_sync_all+397>: shl $0x5,%ecx
0xc1022840 <vmalloc_sync_all+400>: mov 0x20(%esp),%edx
0xc1022844 <vmalloc_sync_all+404>: shrd $0xc,%edx,%eax
0xc1022848 <vmalloc_sync_all+408>: mov %eax,0x4(%esp)
0xc102284c <vmalloc_sync_all+412>: mov 0x4(%esp),%eax
0xc1022850 <vmalloc_sync_all+416>: shr $0xc,%edx
0xc1022853 <vmalloc_sync_all+419>: mov %edx,0x8(%esp)
0xc1022857 <vmalloc_sync_all+423>: shl $0x5,%eax
0xc102285a <vmalloc_sync_all+426>: cmp %eax,%ecx
0xc102285c <vmalloc_sync_all+428>: je 0xc10227a0 <vmalloc_sync_all+240>
0xc1022862 <vmalloc_sync_all+434>: ud2a
0xc1022864 <vmalloc_sync_all+436>: jmp 0xc1022864 <vmalloc_sync_all+436>
0xc1022866 <vmalloc_sync_all+438>: data16
0xc1022867 <vmalloc_sync_all+439>: nop
0xc1022868 <vmalloc_sync_all+440>: mov 0x24(%esp),%edx
0xc102286c <vmalloc_sync_all+444>: mov $0xc170785c,%eax
0xc1022871 <vmalloc_sync_all+449>: call 0xc14ed260 <_spin_unlock_irqrestore>
0xc1022876 <vmalloc_sync_all+454>: addl $0x200000,0x10(%esp)
0xc102287e <vmalloc_sync_all+462>: cmpl $0xbfffffff,0x10(%esp)
0xc1022886 <vmalloc_sync_all+470>: jbe 0xc1022898 <vmalloc_sync_all+488>
0xc1022888 <vmalloc_sync_all+472>: mov 0x10(%esp),%edx
0xc102288c <vmalloc_sync_all+476>: cmp %edx,0xc1652ed8
0xc1022892 <vmalloc_sync_all+482>: ja 0xc10226f0 <vmalloc_sync_all+64>
0xc1022898 <vmalloc_sync_all+488>: add $0x28,%esp
0xc102289b <vmalloc_sync_all+491>: pop %ebx
0xc102289c <vmalloc_sync_all+492>: pop %esi
0xc102289d <vmalloc_sync_all+493>: pop %edi
0xc102289e <vmalloc_sync_all+494>: pop %ebp
0xc102289f <vmalloc_sync_all+495>: ret
End of assembler dump.
(gdb) quit

-Chris
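One arithmetic cross-check of the numbers in the dumps (an observation from the register values, not something anyone in the thread has confirmed): the faulting instruction at vmalloc_sync_all+0xd1 (0xc1022781) is `mov (%eax),%ecx`, and EAX was computed a few instructions earlier as `(ECX & 0xfffff000) + <offset saved at 0xc(%esp)>`. Plugging in ECX=55555001 from the register dump and the c0000d60 word visible on the stack reproduces CR2 exactly, which hints that a page-table entry containing the 0x55555... pattern was dereferenced as if it were a page address:

```shell
# Re-derive EAX/CR2 from the oops registers at vmalloc_sync_all+0xd1.
ecx=$((0x55555001))        # entry value: present bit set, 0x55555... pattern
off=$((0xc0000d60))        # offset visible in the stack dump
eax=$(( ((ecx & 0xfffff000) + off) & 0xffffffff ))   # 32-bit wraparound
printf '0x%x\n' "$eax"     # prints 0x15555d60, matching EAX and CR2
```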
Ian Campbell
2011-Jan-04 20:34 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Tue, 2011-01-04 at 20:30 +0000, Christopher S. Aker wrote:
>
> #0 0x00000000 in ?? ()
> (gdb) list *0xc1022781
> No symbol table is loaded. Use the "file" command.

I think you need to enable CONFIG_DEBUG_INFO for this to work.

I'll see what I can figure out from the rest tomorrow.

Ian.
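For reference, one way to switch that option on: in a kernel source tree, `./scripts/config --enable DEBUG_INFO` or the "Kernel hacking" menu in `make menuconfig` does it; the sketch below shows the equivalent sed edit against a stub .config (the stub file is illustrative, not a real config):

```shell
# Write a minimal stand-in .config just to demonstrate the edit.
cat > .config.sample <<'EOF'
CONFIG_SMP=y
# CONFIG_DEBUG_INFO is not set
EOF

# Flip the option on, then rebuild the kernel for a vmlinux with DWARF info.
sed -i 's/^# CONFIG_DEBUG_INFO is not set$/CONFIG_DEBUG_INFO=y/' .config.sample
grep '^CONFIG_DEBUG_INFO' .config.sample   # prints: CONFIG_DEBUG_INFO=y
```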
Christopher S. Aker
2011-Jan-04 21:59 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Jan 4, 2011, at 3:34 PM, Ian Campbell wrote:
> On Tue, 2011-01-04 at 20:30 +0000, Christopher S. Aker wrote:
>>
>> No symbol table is loaded. Use the "file" command.
>
> I think you need to enable CONFIG_DEBUG_INFO for this to work.

I rebuilt with CONFIG_DEBUG_INFO, and surprisingly it appears valid at the same address:

# gdb vmlinux
GNU gdb (GDB) 7.0-ubuntu
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /build/xen/dom0/pv_ops/2.6.32.27-1-debug/vmlinux...done.
(gdb) list *0xc1022781
0xc1022781 is in vmalloc_sync_all (/build/xen/dom0/pv_ops/2.6.32.27-1-debug/arch/x86/include/asm/pgtable.h:434).
429  #define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
430
431  /* Find an entry in the second-level page table.. */
432  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
433  {
434          return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
435  }
436
437  static inline int pud_large(pud_t pud)
438  {
(gdb) quit

-Chris
Christopher S. Aker
2011-Jan-09 18:07 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Jan 4, 2011, at 4:59 PM, Christopher S. Aker wrote:
>
> I rebuilt with CONFIG_DEBUG_INFO, and surprisingly it appears valid at the same address:
>
> # gdb vmlinux
> (gdb) list *0xc1022781
> 0xc1022781 is in vmalloc_sync_all (/build/xen/dom0/pv_ops/2.6.32.27-1-debug/arch/x86/include/asm/pgtable.h:434).
> 429  #define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
> 430
> 431  /* Find an entry in the second-level page table.. */
> 432  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
> 433  {
> 434          return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
> 435  }
> 436
> 437  static inline int pud_large(pud_t pud)
> 438  {

We hit the BUG again on a third test box -- at least it's fairly easy to reproduce. Has anyone had a chance to poke at this, or have a suggestion for something for me to try/test?

Thanks,
-Chris
Konrad Rzeszutek Wilk
2011-Jan-10 18:56 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Sun, Jan 09, 2011 at 01:07:26PM -0500, Christopher S. Aker wrote:
> On Jan 4, 2011, at 4:59 PM, Christopher S. Aker wrote:
> >
> > I rebuilt with CONFIG_DEBUG_INFO, and surprisingly it appears valid at the same address:
> >
> > # gdb vmlinux
> > (gdb) list *0xc1022781
> > 0xc1022781 is in vmalloc_sync_all (/build/xen/dom0/pv_ops/2.6.32.27-1-debug/arch/x86/include/asm/pgtable.h:434).
> > 429  #define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
> > 430
> > 431  /* Find an entry in the second-level page table.. */
> > 432  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
> > 433  {
> > 434          return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
> > 435  }
> > 436
> > 437  static inline int pud_large(pud_t pud)
> > 438  {
>
> We hit the BUG again on a third test box -- at least it's fairly easy to reproduce. Has anyone had a chance to poke at this, or have a suggestion for something for me to try/test?

Which test makes it easy to reproduce? Oh wait, you have a whole bunch of guests pounding. Is it possible to narrow down which type of test is causing this?

Or can you put up the domU guests along with the xm config files to try to reproduce this?
John Weekes
2011-Jan-10 21:49 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
> We hit the BUG again on a third test box -- at least it's fairly easy to reproduce. Has anyone had a chance to poke at this, or have a suggestion for something for me to try/test?

Have you tried raising /proc/sys/vm/min_free_kbytes ? When I was seeing a similar "unable to handle kernel paging request" error on some machines, I bumped mine up to 32768, and it seemed to eliminate the problem.

-John
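For anyone wanting to try the same tweak, a sketch of the usual knobs (32768 matches the value John mentions; the unit is kilobytes, so this is 32 MB; both commands need root):

```shell
# Raise the free-memory watermark on the running kernel:
echo 32768 > /proc/sys/vm/min_free_kbytes
# or equivalently:
sysctl -w vm.min_free_kbytes=32768

# Persist across reboots:
echo 'vm.min_free_kbytes = 32768' >> /etc/sysctl.conf
```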
Konrad Rzeszutek Wilk
2011-Jan-13 14:43 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Sun, Jan 09, 2011 at 01:07:26PM -0500, Christopher S. Aker wrote:
> We hit the BUG again on a third test box -- at least it's fairly easy to reproduce. Has anyone had a chance to poke at this, or have a suggestion for something for me to try/test?

I don't have as much memory as you, so I've only been running a smaller subset of those guests. So far, nothing yet. How long did it take you to hit it?
Christopher S. Aker
2011-Jan-15 15:57 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Jan 10, 2011, at 4:49 PM, John Weekes wrote:
> Have you tried raising /proc/sys/vm/min_free_kbytes ? When I was seeing a similar "unable to handle kernel paging request" error on some machines, I bumped mine up to 32768, and it seemed to eliminate the problem.

Thanks for your suggestion, John. I reset all 16 test machines and set min_free_kbytes to 32768 (32M). Unfortunately one machine hit the identical BUG within 48 hours, so this hasn't fixed it.

I'm willing to help in any way I can.

-Chris
Christopher S. Aker
2011-Jan-31 21:07 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
>
> We've been running our xen-thrash testsuite on a bunch of hosts
> against a very recent build, and we've just hit this on one box:
>
> BUG: unable to handle kernel paging request at 15555d60

Two additional boxes out of my last test round have also hit this. About one a week.

Ian / Jeremy: Where do I go from here?

-Chris
Konrad Rzeszutek Wilk
2011-Jan-31 21:17 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Mon, Jan 31, 2011 at 04:07:18PM -0500, Christopher S. Aker wrote:
> > Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
> > Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
> >
> > We've been running our xen-thrash testsuite on a bunch of hosts
> > against a very recent build, and we've just hit this on one box:
> >
> > BUG: unable to handle kernel paging request at 15555d60

Oh, I hit that if I do cat /sys/kernel/debug/kernel_page_tables.

On 64-bit:

sh-4.1# cd /sys/kernel/debug
sh-4.1# ls
acpi  boot_params  kernel_page_tables  mce      usb             x86
bdi   dri          kprobes             tracing  wakeup_sources  xen
sh-4.1# cat kernel_page_tables
[  108.263615] BUG: unable to handle kernel paging request at ffff9d5555555000
[  108.270480] IP: [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
[  108.276122] PGD 0
[  108.278205] Oops: 0000 [#1] SMP
[  108.281504] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:06:03.0/class
[  108.289316] CPU 3
[  108.291137] Modules linked in: xen_evtchn video sg sd_mod radeon ahci libahci libata fbcon scsi_mod tileblit e1000e font bitblit ttm softcursor drm_kms_helper xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs [last unloaded: dump_dma]
[  108.314658]
[  108.316221] Pid: 3025, comm: cat Not tainted 2.6.38-rc2-00038-g7c92066 #1 DX58SO/
[  108.324466] RIP: e030:[<ffffffff81036bf0>] [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
[  108.332538] RSP: e02b:ffff8800868f5dd8 EFLAGS: 00010286
[  108.337919] RAX: ffff800000000000 RBX: ffff880085117700 RCX: 0000000000000000
[  108.345126] RDX: ffff9d555555Killed5ff8 RSI: 000000
0000000000 RDI: sh-4.1# 0000000152460067
[  108.345128] RBP: ffff8800868f5e78 R08: 0000000000000006 R09: 0000000000000000
[  108.345130] R10: 00007fffec97cc30 R11: 0000000000000246 R12: ffff9d5555555000
[  108.345132] R13: ffff880085117700 R14: ffffffff81803800 R15: ffff880000000000
[  108.345137] FS: 00007f6c5a3d7700(0000) GS:ffff88009ce83000(0000) knlGS:0000000000000000
[  108.345139] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  108.345140] CR2: ffff9d5555555000 CR3: 000000008b2c6000 CR4: 0000000000002660
[  108.345142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  108.345144] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  108.345146] Process cat (pid: 3025, threadinfo ffff8800868f4000, task ffff88009dbe8000)
[  108.345148] Stack:
[  108.345149] ffffffff81006ca2 0000000000000246 00007fffec97cc30 ffffffff8110efe7
[  108.345152] ffff9d5555555ff8 0000800000000000 ffff88009f0029c0 0000800000000000
[  108.345155] 0000000000000001 0000000000000000 0000000000000000 ffff800000000000
[  108.345158] Call Trace:
[  108.345163] [<ffffffff81006ca2>] ? check_events+0x12/0x20
[  108.345168] [<ffffffff8110efe7>] ? seq_read+0xbf/0x34a
[  108.345170] [<ffffffff8110efe7>] ? seq_read+0xbf/0x34a
[  108.345173] [<ffffffff8110f0a1>] seq_read+0x179/0x34a
[  108.345176] [<ffffffff810f5c32>] vfs_read+0xa6/0x102
[  108.345178] [<ffffffff810f5d47>] sys_read+0x45/0x6c
[  108.345181] [<ffffffff8100a992>] system_call_fastpath+0x16/0x1b
[  108.345182] Code: 0f 00 00 00 88 ff ff 48 8d 14 10 4e 8d 24 38 48 8b 45 98 48 89 55 80 48 89 45 88 48 8b 45 88 48 c1 e0 10 48 c1 f8 10 48 89 45 b8 <49> 8b 3c 24 48 85 ff 0f 84 96 01 00 00 ff 14 25 b0 18 81 81 48
[  108.345208] RIP [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
[  108.345211] RSP <ffff8800868f5dd8>
[  108.345212] CR2: ffff9d5555555000
[  108.345214] ---[ end trace 9134d308b82bc832 ]---

The 32-bit hits this too, with the 15555d60 address.

Does the xen-thrash suite hit this file too? Do you know what file it touches when this happens?

> Two additional boxes out of my last test round have also hit this.
> About one a week.
>
> Ian / Jeremy: Where do I go from here?
>
> -Chris
Jeremy Fitzhardinge
2011-Jan-31 22:19 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 01/31/2011 01:17 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 31, 2011 at 04:07:18PM -0500, Christopher S. Aker wrote:
>>> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
>>> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
>>>
>>> We've been running our xen-thrash testsuite on a bunch of hosts
>>> against a very recent build, and we've just hit this on one box:
>>>
>>> BUG: unable to handle kernel paging request at 15555d60
>
> Oh, I hit that if I do cat /sys/kernel/debug/kernel_page_tables.
>
> On 64-bit:
> sh-4.1# cd /sys/kernel/debug
> sh-4.1# ls
> acpi  boot_params  kernel_page_tables  mce      usb             x86
> bdi   dri          kprobes             tracing  wakeup_sources  xen
> sh-4.1# cat kernel_page_tables
> [  108.263615] BUG: unable to handle kernel paging request at ffff9d5555555000
> [  108.270480] IP: [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
> [  108.276122] PGD 0
> [  108.278205] Oops: 0000 [#1] SMP
> [  108.281504] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:06:03.0/class
> [  108.289316] CPU 3
> [  108.291137] Modules linked in: xen_evtchn video sg sd_mod radeon ahci libahci libata fbcon scsi_mod tileblit e1000e font bitblit ttm softcursor drm_kms_helper xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs [last unloaded: dump_dma]
> [  108.314658]
> [  108.316221] Pid: 3025, comm: cat Not tainted 2.6.38-rc2-00038-g7c92066 #1 DX58SO/
> [  108.324466] RIP: e030:[<ffffffff81036bf0>]  [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
> [  108.332538] RSP: e02b:ffff8800868f5dd8  EFLAGS: 00010286
> [  108.337919] RAX: ffff800000000000 RBX: ffff880085117700 RCX: 0000000000000000
> [  108.345126] RDX: ffff9d5555555ff8 RSI: 0000000000000000 RDI: 0000000152460067
> [  108.345128] RBP: ffff8800868f5e78 R08: 0000000000000006 R09: 0000000000000000
> [  108.345130] R10: 00007fffec97cc30 R11: 0000000000000246 R12: ffff9d5555555000
> [  108.345132] R13: ffff880085117700 R14: ffffffff81803800 R15: ffff880000000000
> [  108.345137] FS:  00007f6c5a3d7700(0000) GS:ffff88009ce83000(0000) knlGS:0000000000000000
> [  108.345139] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  108.345140] CR2: ffff9d5555555000 CR3: 000000008b2c6000 CR4: 0000000000002660
> [  108.345142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  108.345144] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  108.345146] Process cat (pid: 3025, threadinfo ffff8800868f4000, task ffff88009dbe8000)
> [  108.345148] Stack:
> [  108.345149]  ffffffff81006ca2 0000000000000246 00007fffec97cc30 ffffffff8110efe7
> [  108.345152]  ffff9d5555555ff8 0000800000000000 ffff88009f0029c0 0000800000000000
> [  108.345155]  0000000000000001 0000000000000000 0000000000000000 ffff800000000000
> [  108.345158] Call Trace:
> [  108.345163]  [<ffffffff81006ca2>] ? check_events+0x12/0x20
> [  108.345168]  [<ffffffff8110efe7>] ? seq_read+0xbf/0x34a
> [  108.345170]  [<ffffffff8110efe7>] ? seq_read+0xbf/0x34a
> [  108.345173]  [<ffffffff8110f0a1>] seq_read+0x179/0x34a
> [  108.345176]  [<ffffffff810f5c32>] vfs_read+0xa6/0x102
> [  108.345178]  [<ffffffff810f5d47>] sys_read+0x45/0x6c
> [  108.345181]  [<ffffffff8100a992>] system_call_fastpath+0x16/0x1b
> [  108.345182] Code: 0f 00 00 00 88 ff ff 48 8d 14 10 4e 8d 24 38 48 8b 45 98 48 89 55 80 48 89 45 88 48 8b 45 88 48 c1 e0 10 48 c1 f8 10 48 89 45 b8 <49> 8b 3c 24 48 85 ff 0f 84 96 01 00 00 ff 14 25 b0 18 81 81 48
> [  108.345208] RIP  [<ffffffff81036bf0>] ptdump_show+0xc6/0x2f6
> [  108.345211]  RSP <ffff8800868f5dd8>
> [  108.345212] CR2: ffff9d5555555000
> [  108.345214] ---[ end trace 9134d308b82bc832 ]---
>
> The 32-bit kernel hits this too, with the 15555d60 address.
>
> Does the xen-thrash suite hit this file too? Do you know what file it touches
> when this happens?

I think you're seeing the same address because many bogus m2p lookups
return 0x55555555. I don't think there's any more similarity between what
you're seeing and Christopher's report than that.

    J
Jeremy Fitzhardinge
2011-Jan-31 22:22 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 01/31/2011 01:07 PM, Christopher S. Aker wrote:
>> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
>> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
>>
>> We've been running our xen-thrash testsuite on a bunch of hosts
>> against a very recent build, and we've just hit this on one box:
>>
>> BUG: unable to handle kernel paging request at 15555d60
>
> Two additional boxes out of my last test round have also hit this.
> About one a week.
>
> Ian / Jeremy: Where do I go from here?

There seems to be a moderately difficult-to-hit (but still pretty large)
race in pagetable teardown. It *should* be protected by the pgd lock, so
we need to work out where a teardown (or access) is happening without
that lock. I think that's going to be a matter of close code review
rather than any more testing.

The interesting thing is that this problem seems to have come to the fore
since the patch that was explicitly intended to avoid it went in :/...
Before that, the race was theoretical, but AFAIK it had never been
observed in a pvops kernel (though it was seen in the Citrix product in
non-pvops kernels, which is why we fixed it).

I'll try to stare at it in the next couple of days.

    J
Jeremy Fitzhardinge
2011-Jan-31 22:25 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 01/31/2011 01:07 PM, Christopher S. Aker wrote:
>> Xen: 3.4.4-rc1-pre 64bit (xenbits @ 19986)
>> Dom0: 2.6.32.27-1 PAE (xen/stable-2.6.32.x)
>>
>> We've been running our xen-thrash testsuite on a bunch of hosts
>> against a very recent build, and we've just hit this on one box:
>>
>> BUG: unable to handle kernel paging request at 15555d60
>
> Two additional boxes out of my last test round have also hit this.
> About one a week.
>
> Ian / Jeremy: Where do I go from here?

It's also not impossible that this bug is related to the "get_user_pages"
bug that has been discussed over the last few days. I need to think about
that too.

    J
Christopher S. Aker
2011-Feb-14 23:52 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 1/31/11 5:25 PM, Jeremy Fitzhardinge wrote:
> On 01/31/2011 01:07 PM, Christopher S. Aker wrote:
>> Ian / Jeremy: Where do I go from here?
>
> It's also not impossible that this bug is related to the "get_user_pages"
> bug that has been discussed over the last few days. I need to think about
> that too.

How's that going? Any epiphanies? I've been trying to follow the list
(and changesets) but wasn't sure whether a potential fix snuck in, or
whether there's something else I should be pressure-cooking (an unrelated
patch that may have the side effect of fixing my issue, for example).

Thanks,
-Chris
Jeremy Fitzhardinge
2011-Feb-15 00:19 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On 02/14/2011 03:52 PM, Christopher S. Aker wrote:
> On 1/31/11 5:25 PM, Jeremy Fitzhardinge wrote:
>> On 01/31/2011 01:07 PM, Christopher S. Aker wrote:
>>> Ian / Jeremy: Where do I go from here?
>>
>> It's also not impossible that this bug is related to the "get_user_pages"
>> bug that has been discussed over the last few days. I need to think about
>> that too.
>
> How's that going? Any epiphanies? I've been trying to follow the list
> (and changesets) but wasn't sure whether a potential fix snuck in, or
> whether there's something else I should be pressure-cooking (an unrelated
> patch that may have the side effect of fixing my issue, for example).

No, I had a close look at it the other day and remained stumped. It looks
like the pgd is being freed while still in use, but I couldn't see where
that could happen without the lock held. This is really bugging me -
there's something strange going on, which worries me.

    J
Christopher S. Aker
2011-Feb-15 01:15 UTC
Re: [Xen-devel] 2.6.32.27 dom0 - BUG: unable to handle kernel paging request
On Feb 14, 2011, at 7:19 PM, Jeremy Fitzhardinge wrote:
> No, I had a close look at it the other day and remained stumped. It looks
> like the pgd is being freed while still in use, but I couldn't see where
> that could happen without the lock held. This is really bugging me -
> there's something strange going on, which worries me.

Hmmm. I doubt I can make any useful suggestions towards a solution at my
current kernel-hacking skill level, but perhaps some verbose debugging
output sprinkled throughout would help? I'd be happy to reset my test
suite for another round.

-Chris