Hello,

I'm facing a soft lockup kernel bug on a dom0 that results in the server freezing.
The soft lockup systematically appears between 10 and 20 hours after the server reboots.

I'm using Xen 3.2.0 with kernel 2.6.24-18-xen on Ubuntu Server 8.04.
The underlying server is an HP ProLiant DL385 G2 with a dual-core AMD Opteron 2214 HE.

Here are some examples of the console logs I get when it happens:

[51694.282459] BUG: soft lockup - CPU#0 stuck for 11s! [nrpe:8707]
[51694.282469]
[51694.282470] Pid: 8707, comm: nrpe Not tainted (2.6.24-18-xen #1)
[51694.282472] EIP: 0061:[ipv6:_spin_lock+0xa/0x10] EFLAGS: 00200282 CPU: 0
[51694.282476] EIP is at _spin_lock+0xa/0x10
[51694.282477] EAX: c1b6286c EBX: 00000000 ECX: c1b62860 EDX: 00000598
[51694.282501] ESI: 17743067 EDI: 00000000 EBP: c0477158 ESP: e7fb1ddc
[51694.282503] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[51694.282506] CR0: 80050033 CR2: b7cb3e70 CR3: 2c1ed000 CR4: 00000660
[51694.282508] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[51694.282511] DR6: ffff0ff0 DR7: 00000400
[51694.282512] [__do_fault+0x3b8/0x6b0] __do_fault+0x3b8/0x6b0
[51694.282591] [handle_mm_fault+0x249/0x1350] handle_mm_fault+0x249/0x1350
[51694.282629] [sock_aio_read+0x120/0x130] sock_aio_read+0x120/0x130
[51694.282740] [do_page_fault+0x366/0xe90] do_page_fault+0x366/0xe90
[51694.282755] [__do_softirq+0x92/0x130] __do_softirq+0x92/0x130
[51694.282787] [vfs_read+0x11c/0x170] vfs_read+0x11c/0x170
[51694.282802] [sys_read+0x41/0x70] sys_read+0x41/0x70
[51694.282813] [do_page_fault+0x0/0xe90] do_page_fault+0x0/0xe90
[51694.282819] [error_code+0x35/0x40] error_code+0x35/0x40
[51694.282856] ======================

[99821.430516] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:237]
[99821.430527]
[99821.430531] Pid: 237, comm: kswapd0 Tainted: G D (2.6.24-18-xen #1)
[99821.430534] EIP: 0061:[<c032767a>] EFLAGS: 00000286 CPU: 0
[99821.430563] EIP is at _spin_lock+0xa/0x10
[99821.430565] EAX: c1af8ecc EBX: 00000000 ECX: 24a76000 EDX: 00000000
[99821.430569] ESI: 1aa76067 EDI: c1af8ecc EBP: 000002c0 ESP: ed617dec
[99821.430572] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[99821.430578] CR0: 8005003b CR2: b7fc5000 CR3: 2c44a000 CR4: 00000660
[99821.430582] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[99821.430585] DR6: ffff0ff0 DR7: 00000400
[99821.430588] [<c017723b>] page_check_address+0x1cb/0x3c0
[99821.430644] [<c0119858>] xen_invlpg_mask+0x38/0x40
[99821.430682] [<c017749e>] page_referenced_one+0x6e/0x190
[99821.430732] [<c01785cc>] page_referenced+0xec/0x130
[99821.430773] [<c016713f>] shrink_active_list+0x18f/0x5c0
[99821.430941] [<c01681dd>] shrink_zone+0xdd/0x100
[99821.430980] [<c016883c>] kswapd+0x44c/0x490
[99821.431068] [<c013bb90>] autoremove_wake_function+0x0/0x40
[99821.431105] [<c011e260>] complete+0x40/0x60
[99821.431137] [<c01683f0>] kswapd+0x0/0x490
[99821.431144] [<c013b8d2>] kthread+0x42/0x70
[99821.431148] [<c013b890>] kthread+0x0/0x70
[99821.431177] [<c0105bb7>] kernel_thread_helper+0x7/0x10
[99821.431215] ======================

[99926.949316] BUG: soft lockup - CPU#1 stuck for 11s! [ps:8318]
[99926.949322]
[99926.949324] Pid: 8318, comm: ps Tainted: G D (2.6.24-18-xen #1)
[99926.949327] EIP: 0061:[<c0327677>] EFLAGS: 00000286 CPU: 1
[99926.949331] EIP is at _spin_lock+0x7/0x10
[99926.949334] EAX: c1af8ecc EBX: 00000000 ECX: c1af8ec0 EDX: 00000248
[99926.949337] ESI: 1aa76067 EDI: 00000000 EBP: c0477158 ESP: e4b15ddc
[99926.949356] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[99926.949360] CR0: 80050033 CR2: 080492e0 CR3: 24941000 CR4: 00000660
[99926.949364] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[99926.949368] DR6: ffff0ff0 DR7: 00000400
[99926.949370] [<c016bd18>] __do_fault+0x3b8/0x6b0
[99926.949420] [<c017885d>] anon_vma_prepare+0x1d/0xe0
[99926.949445] [<c0170c69>] handle_mm_fault+0x249/0x1350
[99926.949510] [<c0162456>] __pagevec_free+0x26/0x30
[99926.949561] [<c0329216>] do_page_fault+0x366/0xe90
[99926.949583] [<c01165fb>] check_pgt_cache+0x1b/0x20
[99926.949597] [<c0173667>] unmap_region+0x107/0x120
[99926.949622] [<c0174250>] do_munmap+0x180/0x1f0
[99926.949653] [<c0328eb0>] do_page_fault+0x0/0xe90
[99926.949660] [<c0327b55>] error_code+0x35/0x40
[99926.949702] ======================

Have you already heard of such a problem?

Thanks,
Jean-Michel.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
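For anyone triaging reports like the three traces above, the headline lines can be summarised mechanically. What follows is a minimal Python sketch, not part of the original report: `LOCKUP_RE`, `parse_soft_lockups`, and the dictionary field names are hypothetical, and it assumes the bracketed printk timestamps count seconds since boot.

```python
import re

# Matches the headline of a soft lockup report, e.g.
# "[51694.282459] BUG: soft lockup - CPU#0 stuck for 11s! [nrpe:8707]"
# \s+ before the process tag tolerates a line wrap at that point.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s!\s+\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(log_text):
    """Return one dict per soft lockup headline found in log_text."""
    events = []
    for match in LOCKUP_RE.finditer(log_text):
        timestamp = float(match.group("ts"))
        events.append({
            # Assumption: the timestamp is seconds since boot.
            "uptime_hours": round(timestamp / 3600, 1),
            "cpu": int(match.group("cpu")),
            "stuck_seconds": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return events


# The three headlines quoted in this thread.
sample = """\
[51694.282459] BUG: soft lockup - CPU#0 stuck for 11s! [nrpe:8707]
[99821.430516] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:237]
[99926.949316] BUG: soft lockup - CPU#1 stuck for 11s! [ps:8318]
"""

for event in parse_soft_lockups(sample):
    print(event)
```

Run against a full console log, this gives one dictionary per lockup, which makes it easier to see whether the same CPU or the same process keeps getting stuck.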
On Tue, Jun 24, 2008 at 5:03 AM, Jean-Michel Bonnefond <pompon2@gmail.com> wrote:
> Have you already heard of such a problem?

I saw this error in one of my Ubuntu 8.04 domUs the other day when the free memory in dom0 approached a very low value. I rebooted the system and it seems to be fine. But then again, I might see the problem again in a week's time. I also use nrpe.

Chris
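Chris's observation suggests watching free memory alongside the lockups. Below is a minimal Python sketch, assuming the usual `/proc/meminfo` text layout; `parse_meminfo`, `is_memory_low`, the 64 MiB threshold, and the sample values are all hypothetical and not taken from this thread.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip anything that is not "Name:   <number> [unit]".
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def is_memory_low(meminfo, threshold_kb=65536):
    """True when MemFree is below threshold_kb (an arbitrary example value).

    A missing MemFree entry is treated as 0, i.e. as low.
    """
    return meminfo.get("MemFree", 0) < threshold_kb


# Made-up sample in the /proc/meminfo layout, for illustration only.
sample = """\
MemTotal:        524288 kB
MemFree:          12345 kB
Buffers:           2048 kB
"""

info = parse_meminfo(sample)
print("MemFree (kB):", info["MemFree"], "low:", is_memory_low(info))
```

On a real system the text would come from reading `/proc/meminfo` at intervals, so that a low-memory reading can be compared against the time of the next lockup.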
Jean-Michel Bonnefond
2008-Jun-25 07:54 UTC
Re: [Xen-users] Soft lockup with kernel 2.6.24
Thanks Christopher.

I finally downgraded my dom0 to Linux 2.6.18, which is no longer provided with Ubuntu 8.04 :-( , and it seems to be more stable.

I googled and found a few people having the same problem, but no clues or answers. It seems to be a kernel bug linked to some specific hardware...

However, if someone is interested, I could reproduce the bug and provide more information.

Jean-Michel.

2008/6/24 Christopher Isip <cmisip@gmail.com>:
> I saw this error in one of my Ubuntu 8.04 domUs the other day when the free
> memory in dom0 approached a very low value.