thr3ads.net - Xen users - [Xen-users] Soft lockup with kernel 2.6.24 [Jun 2008]


Jean-Michel Bonnefond

2008-Jun-24 09:03 UTC

[Xen-users] Soft lockup with kernel 2.6.24

Hello,

I'm facing a soft lockup kernel bug on a dom0 that results in the server
freezing.
The soft lockup systematically appears 10 to 20 hours after the server
reboots.

I'm using Xen 3.2.0 with kernel 2.6.24-18-xen on Ubuntu Server 8.04.
The underlying server is an HP ProLiant DL385 G2 with a dual-core AMD
Opteron 2214 HE.

Here are some examples of the console logs I get when it happens:

[51694.282459] BUG: soft lockup - CPU#0 stuck for 11s! [nrpe:8707]
[51694.282469]
[51694.282470] Pid: 8707, comm: nrpe Not tainted (2.6.24-18-xen #1)
[51694.282472] EIP: 0061:[ipv6:_spin_lock+0xa/0x10] EFLAGS: 00200282 CPU: 0
[51694.282476] EIP is at _spin_lock+0xa/0x10
[51694.282477] EAX: c1b6286c EBX: 00000000 ECX: c1b62860 EDX: 00000598
[51694.282501] ESI: 17743067 EDI: 00000000 EBP: c0477158 ESP: e7fb1ddc
[51694.282503]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[51694.282506] CR0: 80050033 CR2: b7cb3e70 CR3: 2c1ed000 CR4: 00000660
[51694.282508] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[51694.282511] DR6: ffff0ff0 DR7: 00000400
[51694.282512]  [__do_fault+0x3b8/0x6b0] __do_fault+0x3b8/0x6b0
[51694.282591]  [handle_mm_fault+0x249/0x1350] handle_mm_fault+0x249/0x1350
[51694.282629]  [sock_aio_read+0x120/0x130] sock_aio_read+0x120/0x130
[51694.282740]  [do_page_fault+0x366/0xe90] do_page_fault+0x366/0xe90
[51694.282755]  [__do_softirq+0x92/0x130] __do_softirq+0x92/0x130
[51694.282787]  [vfs_read+0x11c/0x170] vfs_read+0x11c/0x170
[51694.282802]  [sys_read+0x41/0x70] sys_read+0x41/0x70
[51694.282813]  [do_page_fault+0x0/0xe90] do_page_fault+0x0/0xe90
[51694.282819]  [error_code+0x35/0x40] error_code+0x35/0x40
[51694.282856]  ======================
[99821.430516] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:237]
[99821.430527]
[99821.430531] Pid: 237, comm: kswapd0 Tainted: G      D (2.6.24-18-xen #1)
[99821.430534] EIP: 0061:[<c032767a>] EFLAGS: 00000286 CPU: 0
[99821.430563] EIP is at _spin_lock+0xa/0x10
[99821.430565] EAX: c1af8ecc EBX: 00000000 ECX: 24a76000 EDX: 00000000
[99821.430569] ESI: 1aa76067 EDI: c1af8ecc EBP: 000002c0 ESP: ed617dec
[99821.430572]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[99821.430578] CR0: 8005003b CR2: b7fc5000 CR3: 2c44a000 CR4: 00000660
[99821.430582] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[99821.430585] DR6: ffff0ff0 DR7: 00000400
[99821.430588]  [<c017723b>] page_check_address+0x1cb/0x3c0
[99821.430644]  [<c0119858>] xen_invlpg_mask+0x38/0x40
[99821.430682]  [<c017749e>] page_referenced_one+0x6e/0x190
[99821.430732]  [<c01785cc>] page_referenced+0xec/0x130
[99821.430773]  [<c016713f>] shrink_active_list+0x18f/0x5c0
[99821.430941]  [<c01681dd>] shrink_zone+0xdd/0x100
[99821.430980]  [<c016883c>] kswapd+0x44c/0x490
[99821.431068]  [<c013bb90>] autoremove_wake_function+0x0/0x40
[99821.431105]  [<c011e260>] complete+0x40/0x60
[99821.431137]  [<c01683f0>] kswapd+0x0/0x490
[99821.431144]  [<c013b8d2>] kthread+0x42/0x70
[99821.431148]  [<c013b890>] kthread+0x0/0x70
[99821.431177]  [<c0105bb7>] kernel_thread_helper+0x7/0x10
[99821.431215]  ======================
[99926.949316] BUG: soft lockup - CPU#1 stuck for 11s! [ps:8318]
[99926.949322]
[99926.949324] Pid: 8318, comm: ps Tainted: G      D (2.6.24-18-xen #1)
[99926.949327] EIP: 0061:[<c0327677>] EFLAGS: 00000286 CPU: 1
[99926.949331] EIP is at _spin_lock+0x7/0x10
[99926.949334] EAX: c1af8ecc EBX: 00000000 ECX: c1af8ec0 EDX: 00000248
[99926.949337] ESI: 1aa76067 EDI: 00000000 EBP: c0477158 ESP: e4b15ddc
[99926.949356]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[99926.949360] CR0: 80050033 CR2: 080492e0 CR3: 24941000 CR4: 00000660
[99926.949364] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[99926.949368] DR6: ffff0ff0 DR7: 00000400
[99926.949370]  [<c016bd18>] __do_fault+0x3b8/0x6b0
[99926.949420]  [<c017885d>] anon_vma_prepare+0x1d/0xe0
[99926.949445]  [<c0170c69>] handle_mm_fault+0x249/0x1350
[99926.949510]  [<c0162456>] __pagevec_free+0x26/0x30
[99926.949561]  [<c0329216>] do_page_fault+0x366/0xe90
[99926.949583]  [<c01165fb>] check_pgt_cache+0x1b/0x20
[99926.949597]  [<c0173667>] unmap_region+0x107/0x120
[99926.949622]  [<c0174250>] do_munmap+0x180/0x1f0
[99926.949653]  [<c0328eb0>] do_page_fault+0x0/0xe90
[99926.949660]  [<c0327b55>] error_code+0x35/0x40
[99926.949702]  ======================
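The bracketed printk timestamps in these logs are roughly seconds since boot, so they can be checked against the 10 to 20 hour window. A minimal Python sketch, assuming the console output has been saved as plain text; the names `parse_lockups`, `Lockup` and `SAMPLE` are hypothetical, not part of any Xen or kernel tool:

```python
import re
from dataclasses import dataclass
from typing import List

# Matches a soft-lockup header such as
#   [51694.282459] BUG: soft lockup - CPU#0 stuck for 11s! [nrpe:8707]
# The \s+ before "stuck" also tolerates a header wrapped onto two lines.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+)\s+"
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


@dataclass
class Lockup:
    uptime_s: float  # printk timestamp, roughly seconds since boot
    cpu: int         # CPU reported as stuck
    stuck_s: int     # how long the watchdog saw it stuck
    comm: str        # task name running on that CPU
    pid: int

    @property
    def uptime_h(self) -> float:
        return self.uptime_s / 3600.0


def parse_lockups(log_text: str) -> List[Lockup]:
    """Return one Lockup per 'BUG: soft lockup' header found in log_text."""
    return [
        Lockup(float(m["ts"]), int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
        for m in LOCKUP_RE.finditer(log_text)
    ]


# The three headers from the report above, used as sample input.
SAMPLE = """\
[51694.282459] BUG: soft lockup - CPU#0 stuck for 11s! [nrpe:8707]
[99821.430516] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:237]
[99926.949316] BUG: soft lockup - CPU#1 stuck for 11s! [ps:8318]
"""

if __name__ == "__main__":
    for ev in parse_lockups(SAMPLE):
        print(f"{ev.uptime_h:5.1f} h  CPU#{ev.cpu}  stuck {ev.stuck_s}s  {ev.comm}:{ev.pid}")
```

On these three headers it yields roughly 14.4 h of uptime for the nrpe lockup and 27.7 to 27.8 h for the kswapd0 and ps lockups.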

Has anyone already come across this problem?

Thanks,
Jean-Michel.


_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Christopher Isip

2008-Jun-24 20:55 UTC


Re: [Xen-users] Soft lockup with kernel 2.6.24

On Tue, Jun 24, 2008 at 5:03 AM, Jean-Michel Bonnefond <pompon2@gmail.com> wrote:
I saw this error in one of my Ubuntu 8.04 domUs the other day, when the free
memory in dom0 dropped to a very low value. I rebooted the system and it
seems to be fine, but then again, I might see the problem again in a week's
time. I also use nrpe.
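Chris's observation suggests watching dom0's free memory ahead of a lockup. A minimal Python sketch, assuming the check runs inside dom0 and reads `/proc/meminfo`; the function names and the 5% threshold are hypothetical choices, not something reported in this thread. Note that `/proc/meminfo` in dom0 describes dom0's own allocation; memory still free in the hypervisor is a separate figure, shown by `xm info`.

```python
import os
from typing import Dict

# Hypothetical alert threshold, not taken from the thread:
# treat dom0 as "low" when MemFree is below 5% of MemTotal.
LOW_FREE_FRACTION = 0.05


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values in kB)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Field:" prefix
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dom0_memory_low(text: str, fraction: float = LOW_FREE_FRACTION) -> bool:
    """True when MemFree is below `fraction` of MemTotal."""
    info = parse_meminfo(text)
    return info["MemFree"] < fraction * info["MemTotal"]


if __name__ == "__main__":
    # Only meaningful when run inside dom0 on Linux.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            text = f.read()
        info = parse_meminfo(text)
        print(f"MemFree {info['MemFree']} kB of {info['MemTotal']} kB; "
              f"low={dom0_memory_low(text)}")
```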

Chris


Jean-Michel Bonnefond

2008-Jun-25 07:54 UTC


Re: [Xen-users] Soft lockup with kernel 2.6.24

Thanks Christopher.

I finally downgraded my dom0 to Linux 2.6.18, which is no longer provided with
Ubuntu 8.04 :-( , and it seems to be more stable.
Searching Google, I found a few people with the same problem, but no clues or
answers. It seems to be a kernel bug linked to some specific hardware...

However, if someone is interested, I could reproduce the bug and provide more
information.

Jean-Michel.


2008/6/24 Christopher Isip <cmisip@gmail.com>:
