James Harper
2010-Sep-03 00:20 UTC
[Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
I''ve had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]" happen twice now and the server fairly quickly (over the course of 5-10 minutes) becomes unusable after that happens. The call trace appears to be: Call Trace: [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa [<ffffffff8100ef72>] ? check_events+0x12/0x20 [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7 [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b [<ffffffff812b39a5>] ? page_fault+0x25/0x30 [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30 [<ffffffff81045265>] ? schedule_tail+0x92/0x96 [<ffffffff81011983>] ? ret_from_fork+0x13/0x80 That was under 2.6.32.17-g69a73fa. I''ve since upgraded to 2.6.32.18-ge6b9b2c but the changelogs don''t show anything obviously related to the crash (but as usual there are a lot of them so maybe I''ve missed something?) Is this a known problem under 2.6.32.17-g69a73fa that has since been fixed? Thanks James _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Sep-03 00:26 UTC
Re: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
On 09/02/2010 05:20 PM, James Harper wrote:> I''ve had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]" happen > twice now and the server fairly quickly (over the course of 5-10 > minutes) becomes unusable after that happens. The call trace appears to > be: > > Call Trace: > [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa > [<ffffffff8100ef72>] ? check_events+0x12/0x20 > [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7 > [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b > [<ffffffff812b39a5>] ? page_fault+0x25/0x30 > [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30 > [<ffffffff81045265>] ? schedule_tail+0x92/0x96 > [<ffffffff81011983>] ? ret_from_fork+0x13/0x80 > > That was under 2.6.32.17-g69a73fa. I''ve since upgraded to > 2.6.32.18-ge6b9b2c but the changelogs don''t show anything obviously > related to the crash (but as usual there are a lot of them so maybe I''ve > missed something?) > > Is this a known problem under 2.6.32.17-g69a73fa that has since been > fixed?Hard to know; nothing springs to mind right now. What else is going on at the time? What''s the full softlockup message? Is there any more context? There have been some fixes relating to long-standing interrupt migration bugs, but they''re after 69a73fa. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
James Harper
2010-Sep-03 00:35 UTC
RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
> > On 09/02/2010 05:20 PM, James Harper wrote: > > I''ve had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]"happen> > twice now and the server fairly quickly (over the course of 5-10 > > minutes) becomes unusable after that happens. The call trace appearsto> > be: > > > > Call Trace: > > [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa > > [<ffffffff8100ef72>] ? check_events+0x12/0x20 > > [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7 > > [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b > > [<ffffffff812b39a5>] ? page_fault+0x25/0x30 > > [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30 > > [<ffffffff81045265>] ? schedule_tail+0x92/0x96 > > [<ffffffff81011983>] ? ret_from_fork+0x13/0x80 > > > > That was under 2.6.32.17-g69a73fa. I''ve since upgraded to > > 2.6.32.18-ge6b9b2c but the changelogs don''t show anything obviously > > related to the crash (but as usual there are a lot of them so maybeI''ve> > missed something?) > > > > Is this a known problem under 2.6.32.17-g69a73fa that has since been > > fixed? > > Hard to know; nothing springs to mind right now. What else is goingon> at the time? What''s the full softlockup message? Is there any more > context?I just noticed that it coincided pretty much exactly when a backup runs. I take a snapshot in Dom0 then block-attach it to the linux DomU so it can be backed up from DomU which makes restores easier etc, rather than doing the backup in Dom0. I hadn''t noticed before because it''s the Windows DomU that people complain about first, not the linux DomU. Block-attach/detach would explain why udevd is involved too which makes more sense. The block-attach happens every hour on the hour. The full cut&paste of the first hang is: Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref 1442, event-channel 13, protocol 1 (x86_64-abi) Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULO G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id e_core uhci_hc Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: scsi_wait_scan] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1: Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULO G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id e_core uhci_hc Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: scsi_wait_scan] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm: udevd Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP: e030:[<ffffffff8100922a>] [<ffffffff8100922a>] hypercall_page+0x22a/0x1001 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP: e02b:ffff8800114d5dd0 EFLAGS: 00000246 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffffff8100922a keechsvr2:/usr/local/src/linux-2.6-xen# head -100 /var/log/kern.log Sep 3 07:00:44 keechsvr2 kernel: [228694.767691] blkback: ring-ref 311, event-channel 13, protocol 1 (x86_64-abi) Sep 3 08:00:42 keechsvr2 kernel: [232293.072248] blkback: ring-ref 1334, event-channel 13, protocol 1 (x86_64-abi) Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref 1442, event-channel 13, protocol 1 (x86_64-abi) Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULOG xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd ide_core uhci_hc Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: scsi_wait_scan] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1: Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULOG xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd ide_core uhci_hc Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: scsi_wait_scan] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm: udevd Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP: e030:[<ffffffff8100922a>] [<ffffffff8100922a>] hypercall_page+0x22a/0x1001 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP: e02b:ffff8800114d5dd0 EFLAGS: 00000246 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffffff8100922a Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RDX: ffffffff814293d0 RSI: 0000000000000000 RDI: 0000000000000000 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RBP: 00007fbc628a9800 R08: 0000000000000000 R09: 0000000000000000 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000002 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R13: ffff8800114d5e88 R14: ffff880017f9cfc0 R15: ffff880017eed500 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] FS: 00007fbc628a9770(0000) GS:ffff880002de8000(0000) knlGS:0000000000000000 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CR2: 00007fbc628a9800 CR3: 000000001e2d7000 CR4: 0000000000002660 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Call Trace: Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100ef72>] ? check_events+0x12/0x20 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff812b39a5>] ? page_fault+0x25/0x30 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff81045265>] ? schedule_tail+0x92/0x96 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff81011983>] ? ret_from_fork+0x13/0x80 There''s another hang at 09:03:09 which I can post if you''d find it useful. If the block-attach happened at 09:00:42 and the crash happened at 09:02:13 then I assume that whatever udevd started doing started at 09:01:12 (eg 61 seconds earlier) which means that it''s more likely to be the block-detach at the end of the backup causing the hang not the block-attach. The backup runs pretty quick. I''m inclined to turn off the snapshot backup if this isn''t a known and resolved problem... Thanks James _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
James Harper
2010-Sep-05 23:44 UTC
RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
This is still happening with 2.6.32.18-ge6b9b2c but I''m now not so sure if it''s a problem with blkback after all. My script does this: echo source=$source >>/tmp/attach-snapshot.log echo dest=$dest >>/tmp/attach-snapshot.log lvcreate --snapshot --name=$dest --size=1G $source lvscan >>/tmp/attach-snapshot.log xm block-attach $domname phy:$dest $destdev w /tmp/attach-snapshot.log shows "source=" and "dest=" but not a result of "lvscan", which would indicate that it''s the "lvcreate" that''s hung, not the "xm block-attach". Disk writes are still working so I know that it''s not just a missed buffer or anything. That would explain why udevd is hanging too - lvcreate would involve creating a new device while "xm block-attach" wouldn''t. I think I''m barking up the wrong tree on xen-devel unless anyone else has any suggestions? Thanks James> -----Original Message----- > From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel- > bounces@lists.xensource.com] On Behalf Of James Harper > Sent: Friday, 3 September 2010 10:36 > To: Jeremy Fitzhardinge > Cc: xen-devel@lists.xensource.com > Subject: RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s![udevd:4865]> > > > > On 09/02/2010 05:20 PM, James Harper wrote: > > > I''ve had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]" > happen > > > twice now and the server fairly quickly (over the course of 5-10 > > > minutes) becomes unusable after that happens. The call traceappears> to > > > be: > > > > > > Call Trace: > > > [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa > > > [<ffffffff8100ef72>] ? check_events+0x12/0x20 > > > [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7 > > > [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b > > > [<ffffffff812b39a5>] ? page_fault+0x25/0x30 > > > [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30 > > > [<ffffffff81045265>] ? schedule_tail+0x92/0x96 > > > [<ffffffff81011983>] ? ret_from_fork+0x13/0x80 > > > > > > That was under 2.6.32.17-g69a73fa. I''ve since upgraded to > > > 2.6.32.18-ge6b9b2c but the changelogs don''t show anythingobviously> > > related to the crash (but as usual there are a lot of them somaybe> I''ve > > > missed something?) > > > > > > Is this a known problem under 2.6.32.17-g69a73fa that has sincebeen> > > fixed? > > > > Hard to know; nothing springs to mind right now. What else is going > on > > at the time? What''s the full softlockup message? Is there any more > > context? > > I just noticed that it coincided pretty much exactly when a backupruns.> I take a snapshot in Dom0 then block-attach it to the linux DomU so it > can be backed up from DomU which makes restores easier etc, ratherthan> doing the backup in Dom0. I hadn''t noticed before because it''s the > Windows DomU that people complain about first, not the linux DomU. > Block-attach/detach would explain why udevd is involved too whichmakes> more sense. The block-attach happens every hour on the hour. The full > cut&paste of the first hang is: > > Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref > 1442, event-channel 13, protocol 1 (x86_64-abi) > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup - > CPU#1 stuck for 61s! [udevd:4865] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: > ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun > act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhcipt_ULO> G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT > xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp > iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n > f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp > nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 > nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback > xen_blkback > blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd > tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc > ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c > ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log > dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata > usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id > e_core uhci_hc > Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: > scsi_wait_scan] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1: > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: > ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun > act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhcipt_ULO> G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT > xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp > iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n > f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp > nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 > nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback > xen_blkback > blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd > tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc > ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c > ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log > dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata > usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id > e_core uhci_hc > Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: > scsi_wait_scan] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm:udevd> Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP: > e030:[<ffffffff8100922a>] [<ffffffff8100922a>] > hypercall_page+0x22a/0x1001 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP: > e02b:ffff8800114d5dd0 EFLAGS: 00000246 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX:0000000000040000> RBX: 0000000000000001 RCX: ffffffff8100922a > keechsvr2:/usr/local/src/linux-2.6-xen# head -100 /var/log/kern.log > Sep 3 07:00:44 keechsvr2 kernel: [228694.767691] blkback: ring-ref311,> event-channel 13, protocol 1 (x86_64-abi) > Sep 3 08:00:42 keechsvr2 kernel: [232293.072248] blkback: ring-ref > 1334, event-channel 13, protocol 1 (x86_64-abi) > Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref > 1442, event-channel 13, protocol 1 (x86_64-abi) > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup - > CPU#1 stuck for 61s! [udevd:4865] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: > ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun > act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhcipt_ULOG> xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT > xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp > iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc > nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp > nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 > nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback > xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm > snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo > snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core > i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror > dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix > ata_generic libata usb_storage scsi_mod usbhid hid piixide_pci_generic> bnx2 ehci_hcd ide_core uhci_hc > Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: > scsi_wait_scan] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1: > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: > ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun > act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhcipt_ULOG> xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT > xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp > iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc > nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp > nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 > nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback > xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm > snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo > snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core > i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror > dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix > ata_generic libata usb_storage scsi_mod usbhid hid piixide_pci_generic> bnx2 ehci_hcd ide_core uhci_hc > Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: > scsi_wait_scan] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm:udevd> Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP: > e030:[<ffffffff8100922a>] [<ffffffff8100922a>] > hypercall_page+0x22a/0x1001 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP: > e02b:ffff8800114d5dd0 EFLAGS: 00000246 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX:0000000000040000> RBX: 0000000000000001 RCX: ffffffff8100922a > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RDX:ffffffff814293d0> RSI: 0000000000000000 RDI: 0000000000000000 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RBP:00007fbc628a9800> R08: 0000000000000000 R09: 0000000000000000 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R10:0000000000000002> R11: 0000000000000246 R12: 0000000000000002 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R13:ffff8800114d5e88> R14: ffff880017f9cfc0 R15: ffff880017eed500 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] FS: > 00007fbc628a9770(0000) GS:ffff880002de8000(0000)knlGS:0000000000000000> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CS: e033 DS: 0000ES:> 0000 CR0: 000000008005003b > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CR2:00007fbc628a9800> CR3: 000000001e2d7000 CR4: 0000000000002660 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR0:0000000000000000> DR1: 0000000000000000 DR2: 0000000000000000 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR3:0000000000000000> DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Call Trace: > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505][<ffffffff8100e8e9>]> ? xen_force_evtchn_callback+0x9/0xa > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505][<ffffffff8100ef72>]> ? check_events+0x12/0x20 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505][<ffffffff8100ef19>]> ? xen_irq_enable_direct_end+0x0/0x7 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505][<ffffffff8102fcc1>]> ? do_page_fault+0x9e/0x27b > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505][<ffffffff812b39a5>]> ? page_fault+0x25/0x30 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505][<ffffffff8116f28d>]> ? __put_user_4+0x1d/0x30 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505][<ffffffff81045265>]> ? schedule_tail+0x92/0x96 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505][<ffffffff81011983>]> ? ret_from_fork+0x13/0x80 > > There''s another hang at 09:03:09 which I can post if you''d find it > useful. > > If the block-attach happened at 09:00:42 and the crash happened at > 09:02:13 then I assume that whatever udevd started doing started at > 09:01:12 (eg 61 seconds earlier) which means that it''s more likely tobe> the block-detach at the end of the backup causing the hang not the > block-attach. The backup runs pretty quick. > > I''m inclined to turn off the snapshot backup if this isn''t a known and > resolved problem... > > Thanks > > James > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel