Pasi Kärkkäinen
2010-Mar-05 12:47 UTC
[Xen-devel] Debian Lenny 2.6.26-2-xen-686 crashing as multi-vcpu domU, stack trace
Hello, I have a PV domU running Debian Lenny 2.6.26-2-xen-686. It has 2 vcpus, 2 GB of memory, and it crashes every 14-30 days. Basically the guest gets stuck somehow and starts to consume all the CPU time it can get. "xm console <guest>" doesn't allow me to do anything (I can't log in, but I can see the messages on the console - no errors there), and it doesn't respond from the network either. It's not totally crashed, it's just stuck in some loop. No errors or anything special in "xm log". Any ideas?

I tried running xenctx on it a couple of times from dom0:

eip: c0105c0f jiffies_to_st+0x17
esp: dd747dcc
eax: 91dc1272 ebx: 2e88c443 ecx: 03020006 edx: 91dc1272
esi: 91dc1272 edi: dd747dec ebp: c0378184
cs: 00000061 ds: 0000007b fs: 000000d8 gs: 00000033

Stack:
00000003 00000103 dd747dec c023d164 00000000 00000001 00000000 00000000
00000006 c123872c 00000022 c023f968 00002221 c0378184 00002221 00000000
00000100 c02cbedf ed4026c0 00000000 c0106444 0000007b c011007b 000000d8
fffffef0 c0101227 00000061 00000246 c023cdd9 00000fd0 c1234020 00000000

Code:
89 c6 53 8b 1d 80 81 37 c0 0f ae e8 66 90 f6 c3 01 74 04 f3 90 <eb> ec a1 40 81 37 c0 89 f1 29 c1

Call Trace:
[<c0105c0f>] jiffies_to_st+0x17 <--
[<c023d164>] xen_poll_irq+0x41
[<c023f968>] xen_spin_wait+0xcc
[<c02cbedf>] _spin_lock+0x31
[<c0106444>] timer_interrupt+0x37
[<c011007b>] __change_page_attr_set_clr+0x4cd
[<c0101227>] hypercall_page+0x227
[<c023cdd9>] force_evtchn_callback+0xa
[<c0122648>] current_fs_time+0x13
[<c0184c61>] mnt_drop_write+0x1b
[<c01502e7>] generic_file_aio_read+0x49a
[<c0149901>] handle_IRQ_event+0x36
[<c014aa15>] handle_level_irq+0x90
[<c0105b00>] do_IRQ+0x4d
[<c023d8c7>] evtchn_do_upcall+0xfa
[<c010412c>] hypervisor_callback+0x3c
[<c013326b>] current_kernel_time+0xb
[<c012263d>] current_fs_time+0x8
[<c0223b1f>] tty_write+0x191
[<c0225b5e>] n_tty_open+0x88
[<c022398e>] send_break+0x5e
[<c017060c>] vfs_write+0x83
[<c0170bde>] sys_write+0x3c
[<c0103f76>] syscall_call+0x7

Running it again:

eip: c0105c06 jiffies_to_st+0xe
esp: dd747dcc
eax: 91dc1272 ebx: 2e88c443 ecx: 03020006 edx: 91dc1272
esi: 91dc1272 edi: dd747dec ebp: c0378184
cs: 00000061 ds: 0000007b fs: 000000d8 gs: 00000033

Stack:
00000003 00000103 dd747dec c023d164 00000000 00000001 00000000 00000000
00000006 c123872c 00000022 c023f968 00002221 c0378184 00002221 00000000
00000100 c02cbedf ed4026c0 00000000 c0106444 0000007b c011007b 000000d8
fffffef0 c0101227 00000061 00000246 c023cdd9 00000fd0 c1234020 00000000

Code:
f1 89 43 14 5b c3 c3 57 56 89 c6 53 8b 1d 80 81 37 c0 0f ae e8 <66> 90 f6 c3 01 74 04 f3 90 eb ec

Call Trace:
[<c0105c06>] jiffies_to_st+0xe <--
[<c023d164>] xen_poll_irq+0x41
[<c023f968>] xen_spin_wait+0xcc
[<c02cbedf>] _spin_lock+0x31
[<c0106444>] timer_interrupt+0x37
[<c011007b>] __change_page_attr_set_clr+0x4cd
[<c0101227>] hypercall_page+0x227
[<c023cdd9>] force_evtchn_callback+0xa
[<c0122648>] current_fs_time+0x13
[<c0184c61>] mnt_drop_write+0x1b
[<c01502e7>] generic_file_aio_read+0x49a
[<c0149901>] handle_IRQ_event+0x36
[<c014aa15>] handle_level_irq+0x90
[<c0105b00>] do_IRQ+0x4d
[<c023d8c7>] evtchn_do_upcall+0xfa
[<c010412c>] hypervisor_callback+0x3c
[<c013326b>] current_kernel_time+0xb
[<c012263d>] current_fs_time+0x8
[<c0223b1f>] tty_write+0x191
[<c0225b5e>] n_tty_open+0x88
[<c022398e>] send_break+0x5e
[<c017060c>] vfs_write+0x83
[<c0170bde>] sys_write+0x3c
[<c0103f76>] syscall_call+0x7

-- Pasi

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2010-Mar-05 13:00 UTC
Re: [Xen-devel] Debian Lenny 2.6.26-2-xen-686 crashing as multi-vcpu domU, stack trace
>>> Pasi Kärkkäinen<pasik@iki.fi> 05.03.10 13:47 >>>
>Basically the guest gets stuck somehow, and starts to consume all the CPU time it can get.

Would seem like you posted the xenctx output only for one of the two vCPUs (which isn't able to acquire xtime_lock, as it seems) - the question is what the other vCPU is doing then.

Jan
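Jan's point - the interesting state is usually on the *other* vCPU - suggests always dumping every vCPU, not just vcpu 0. A minimal sketch of how that could be scripted from dom0 (the xenctx path and the idea of looping over a known vCPU count are assumptions; the tool's location varies between Xen versions and distros):

```python
import subprocess

# Assumed location of xenctx; on some installs it lives under
# /usr/lib/xen-<version>/bin instead - adjust for your system.
XENCTX = "/usr/lib/xen/bin/xenctx"

def xenctx_cmd(domid, vcpu):
    """Build the command line for dumping one vCPU's context
    (domid and vcpu are positional arguments to xenctx)."""
    return [XENCTX, str(domid), str(vcpu)]

def dump_all_vcpus(domid, nr_vcpus):
    """Capture and print the context of every vCPU of a domain."""
    for v in range(nr_vcpus):
        out = subprocess.run(xenctx_cmd(domid, v),
                             capture_output=True, text=True)
        print(f"=== vcpu {v} ===\n{out.stdout}")
```

Capturing twice with a short delay also helps: entries that change between runs (like the jiffies_to_st eip above) are actively executing, while entries that stay identical are likely blocked.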
Pasi Kärkkäinen
2010-Mar-05 13:12 UTC
Re: [Xen-devel] Debian Lenny 2.6.26-2-xen-686 crashing as multi-vcpu domU, stack trace
On Fri, Mar 05, 2010 at 01:00:39PM +0000, Jan Beulich wrote:
> >>> Pasi Kärkkäinen<pasik@iki.fi> 05.03.10 13:47 >>>
> >Basically the guest gets stuck somehow, and starts to consume all the CPU time it can get.
>
> Would seem like you posted the xenctx output only for one of the two
> vCPUs (which isn't able to acquire xtime_lock, as it seems) - the
> question is what the other vCPU is doing then.
>

Damnit, I didn't realize that.. and now the domU is destroyed/rebooted already..

-- Pasi
Pasi Kärkkäinen
2010-Mar-07 15:52 UTC
Re: [Xen-devel] Debian Lenny 2.6.26-2-xen-686 crashing as multi-vcpu domU, stack trace
On Fri, Mar 05, 2010 at 01:00:39PM +0000, Jan Beulich wrote:
> >>> Pasi Kärkkäinen<pasik@iki.fi> 05.03.10 13:47 >>>
> >Basically the guest gets stuck somehow, and starts to consume all the CPU time it can get.
>
> Would seem like you posted the xenctx output only for one of the two
> vCPUs (which isn't able to acquire xtime_lock, as it seems) - the
> question is what the other vCPU is doing then.
>

Ok, another guest crashed, so here are the xenctx outputs for both vcpus:

vcpu 0:

eip: c0105c0f jiffies_to_st+0x17
esp: ec8d3cd4
eax: bc58158b ebx: 15e6990b ecx: 03020006 edx: bc58158b
esi: bc58158b edi: ec8d3cf4 ebp: c0378184
cs: 00000061 ds: 0000007b fs: 000000d8 gs: 00000033

Stack:
00000003 00000103 ec8d3cf4 c023d00c 00000000 00000001 00000000 00000000
00000006 c149d72c 00000086 c023f80f 00008685 c0378184 00008685 00000000
00000100 c02cbd87 ed4026c0 00000000 c0106444 095ff7f8 0354f7fa 00000000
001200fa 007a0001 00000000 00000000 00000000 00000000 c1499020 00000000

Code:
89 c6 53 8b 1d 80 81 37 c0 0f ae e8 66 90 f6 c3 01 74 04 f3 90 <eb> ec a1 40 81 37 c0 89 f1 29 c1

Call Trace:
[<c0105c0f>] jiffies_to_st+0x17 <--
[<c023d00c>] startup_pirq+0x46
[<c023f80f>] uuid_show+0x4d
[<c02cbd87>] rwsem_down_read_failed+0x23
[<c0106444>] timer_interrupt+0x37
[<c026c4a8>] register_netdevice_notifier+0xeb
[<c0149949>] handle_bad_irq+0x10
[<c014aa5d>] handle_level_irq+0xd8
[<c0105b00>] do_IRQ+0x4d
[<c023d76f>] xen_clear_irq_pending+0x3
[<c010412c>] hypervisor_callback+0x3c
[<c01332b2>] timekeeping_suspend+0x1c
[<c012269d>] sys_stime+0x8
[<c0181d5d>] igrab+0x1d
[<c015032d>] generic_file_aio_read+0x4e0
[<c016fe8e>] do_sync_write+0xb3
[<c012ec98>] wake_bit_function+0x23
[<c0265ded>] skb_copy_and_csum_bits+0x22f
[<c026966a>] netdev_create_hash+0x20
[<c01b945b>] security_bprm_post_apply_creds+0x1
[<c016fdcf>] do_sync_readv_writev+0xed
[<c017061e>] vfs_write+0x95
[<c0170a6f>] sys_lseek+0x15
[<c0103f76>] syscall_call+0x7

vcpu 1:

eip: c0105c0f jiffies_to_st+0x17
esp: ed447d60
eax: bc58158b ebx: 15e6990b ecx: 03020009 edx: bc58158b
esi: bc58158b edi: ed447d80 ebp: c14a4b40
cs: 00000061 ds: 0000007b fs: 000000d8 gs: 00000000

Stack:
00000003 00000107 ed447d80 c023d00c 00000000 00000001 00000000 00000000
00000009 c14a472c 000000fb c023f80f 0000fbfa c14a4b40 0000fbfa ed447dcc
ed4a15e0 c02cbd87 c03bfb40 c14a4b40 c01175ac ed4a15e0 ed42bb84 00000000
00000000 c01177c6 00000003 00000001 ed4c5fbc ed42bb84 00000001 00000000

Code:
89 c6 53 8b 1d 80 81 37 c0 0f ae e8 66 90 f6 c3 01 74 04 f3 90 <eb> ec a1 40 81 37 c0 89 f1 29 c1

Call Trace:
[<c0105c0f>] jiffies_to_st+0x17 <--
[<c023d00c>] startup_pirq+0x46
[<c023f80f>] uuid_show+0x4d
[<c02cbd87>] rwsem_down_read_failed+0x23
[<c01175ac>] sched_move_task+0x71
[<c01177c6>] try_to_wake_up+0xbc
[<c012eca5>] wake_bit_function+0x30
[<c0114b84>] sys_sched_get_priority_max+0x16
[<c011684b>] print_cfs_rq+0x84
[<c012c27d>] flush_cpu_workqueue+0xc
[<c012c5c5>] queue_work+0x28
[<c012c620>] __cancel_work_timer+0x3
[<c0106644>] timer_interrupt+0x237
[<c011684b>] print_cfs_rq+0x84
[<c0105f74>] get_nsec_offset+0xe
[<c0149949>] handle_bad_irq+0x10
[<c014aa5d>] handle_level_irq+0xd8
[<c0105b00>] do_IRQ+0x4d
[<c023d76f>] xen_clear_irq_pending+0x3
[<c010412c>] hypervisor_callback+0x3c
[<c01013a7>] hypercall_page+0x3a7
[<c0105f52>] xen_safe_halt+0x9f
[<c01028ab>] xen_idle+0x1b
[<c0102810>] cpu_idle+0xa8

-- Pasi
Jan Beulich
2010-Mar-08 08:05 UTC
Re: [Xen-devel] Debian Lenny 2.6.26-2-xen-686 crashing as multi-vcpu domU, stack trace
>>> Pasi Kärkkäinen<pasik@iki.fi> 07.03.10 16:52 >>>
>Ok, another guest crashed, so here are the xenctx outputs for both vcpus:

Which would require some cleaning up - afaict these are imprecise call traces, which are pretty hard to analyze without the corresponding binary. In any case, both vCPUs appear to be trying to acquire different spin locks, so the chances are good that this is simply an ABBA deadlock.

But it's certainly also suspicious that *both* have handle_bad_irq() on their call stack.

Jan
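The ABBA pattern Jan suspects - one vCPU holds lock A and spins on B while the other holds B and spins on A - can be checked mechanically once the locks have been identified from the stacks. A sketch of that check, using hypothetical lock names since the thread has not yet identified the actual locks:

```python
def has_deadlock(holds, wants):
    """holds: dict cpu -> lock it currently holds.
    wants: dict cpu -> lock it is spinning on.
    Deadlock iff following wants -> holder edges returns to the start."""
    holder = {lock: cpu for cpu, lock in holds.items()}
    for start in wants:
        seen = set()
        cpu = start
        while cpu is not None and cpu not in seen:
            seen.add(cpu)
            # Who holds the lock this CPU wants? (None if uncontended.)
            cpu = holder.get(wants.get(cpu))
            if cpu == start:
                return True
    return False

# Classic ABBA: vcpu0 holds A and wants B, vcpu1 holds B and wants A.
print(has_deadlock({0: "A", 1: "B"}, {0: "B", 1: "A"}))  # True
# No cycle: vcpu0 waits on an unheld lock C.
print(has_deadlock({0: "A", 1: "B"}, {0: "C", 1: "A"}))  # False
```

The same wait-for-graph idea generalizes to more than two CPUs; for two it reduces exactly to checking whether each vCPU waits on the lock the other holds.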
Pasi Kärkkäinen
2010-Mar-08 08:36 UTC
Re: [Xen-devel] Debian Lenny 2.6.26-2-xen-686 crashing as multi-vcpu domU, stack trace
On Mon, Mar 08, 2010 at 08:05:28AM +0000, Jan Beulich wrote:
> >>> Pasi Kärkkäinen<pasik@iki.fi> 07.03.10 16:52 >>>
> >Ok, another guest crashed, so here are the xenctx outputs for both vcpus:
>
> Which would require some cleaning up - afaict these are imprecise call
> traces, which are pretty hard to analyze without the corresponding
> binary. In any case, both vCPUs appear to be trying to acquire different
> spin locks, so the chances are good that this is simply an ABBA deadlock.
>
> But it's certainly also suspicious that *both* have handle_bad_irq()
> on their call stack.
>

I still have the guest up.. in the crashed state. Anything that I should try?

-- Pasi
Jan Beulich
2010-Mar-08 08:54 UTC
Re: [Xen-devel] Debian Lenny 2.6.26-2-xen-686 crashing as multi-vcpu domU, stack trace
>>> Pasi Kärkkäinen<pasik@iki.fi> 08.03.10 09:36 >>>
>I still have the guest up.. in the crashed state.
>Anything that I should try?

It's not really something you should try; it's really the analysis of the call stack that's going to get you forward. In particular, after reconstructing the true call stack (i.e. with all false entries removed) and after determining which two locks are being acquired by the two vCPUs, it should be possible to determine whether, in each case, the other vCPU is currently holding that lock. This work would also show whether the handle_bad_irq() entries on the stack are directly related to the issue, or only indirectly (and hence only possibly so).

Jan
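One way to approach the "false entries removed" step Jan describes: imprecise x86 traces (no reliable frame pointers) list every stack word that happens to land in the kernel text segment, so most entries are stale leftovers. A common cleanup heuristic is to keep an address only if the bytes immediately before it decode as a call instruction. A rough sketch - the byte map is a made-up stand-in for reading the guest's kernel image, and it only recognizes direct 5-byte CALLs (opcode 0xe8), not indirect ones (0xff /2):

```python
# Approximate i386 kernel text range for this 2.6.26 build (an assumption).
TEXT_START, TEXT_END = 0xC0100000, 0xC0400000

def plausible_return(addr, read_byte):
    """True if addr lies in kernel text and is preceded by a direct
    near CALL (0xe8 rel32, 5 bytes total), i.e. could be a real
    return address rather than stack garbage."""
    if not (TEXT_START <= addr < TEXT_END):
        return False
    return read_byte(addr - 5) == 0xE8

# Hypothetical image bytes: a CALL ending at 0xc0105c0f makes that
# address a plausible return site; the others get filtered out.
image = {0xC0105C0A: 0xE8}
read = lambda a: image.get(a, 0x90)  # everything else reads as NOP

trace = [0xC0105C0F, 0xC023D164, 0xC02CBEDF]
cleaned = [a for a in trace if plausible_return(a, read)]
print([hex(a) for a in cleaned])  # ['0xc0105c0f']
```

This heuristic still admits false positives (a stale but once-genuine return address passes the test), so the surviving entries have to be sanity-checked against what the function at the eip could actually have called.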