MadLoisae@gmx.net
2011-Apr-13 07:30 UTC
[Xen-devel] on starting HVM-domU whole system freezes with "soft lockup - CPU X stuck for XXs! [qemu-dm:...]"
Hi xen-devel,

----
I already posted this on xen-users; Todd Deshane asked me to post it here for feedback.
He also asked whether I can test Xen 4.1. I am checking whether I can find and install backports; if not, I'm afraid I won't be able to compile it myself.
----

I have been taking my first steps with Xen for a few days.
Hardware: Core2 T7200, Intel 945GME, 2GB RAM
Software: Debian squeeze, i686, with the Debian-delivered Xen 4.0.1
CPU and BIOS support hardware virtualisation: (XEN) HVM: VMX enabled

I can successfully boot my dom0 under Xen with the squeeze-delivered i686 kernel.
As soon as I start an HVM domU (paravirtualisation works without problems), my dom0 hangs immediately.
For about 10 seconds after starting the domU I see nothing; then the first messages like "hrtimer: interrupt took 1739955444 ns" appear, then my disk gets timeouts. Then kernel panics like the one below start. Normally they are not written to the messages log, but one time I had "luck":

kernel: : [ 4815.144473] saa7146 (0) vpeirq: used 3 times >80% of buffer (1049604 bytes now)
kernel: : [ 4815.144473] Modules linked in: tun xt_physdev loop ipt_REJECT ip6table_filter ip6_tables ebtable_nat ebtables bridge stp xen_evtchn xenfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs xt_recent ipt_MASQUERADE xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables fuse ext4 jbd2 crc16 it87 hwmon_vid coretemp tda10021 snd_hda_codec_via budget_av snd_hda_intel snd_hda_codec saa7146_vv snd_hwdep videodev v4l1_compat snd_pcm_oss snd_mixer_oss videobuf_dma_sg videobuf_core snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq budget_core i915 drm_kms_helper dvb_core snd_timer saa7146 snd_seq_device ttpci_eeprom drm rng_core pcspkr evdev i2c_i801 i2c_algo_bit snd i2c_core soundcore video output button snd_page_alloc processor acpi_processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif ata_generic uhci_hcd ata_piix fan ehci_hcd libata scsi_mod e1000e usbcore nls_base thermal thermal_sys [last unloaded: scsi_wait_scan]
kernel: : [ 4815.144473]
kernel: : [ 4815.324579] saa7146 (0) saa7146_i2c_writeout [irq]: timed out waiting for end of xfer
kernel: : [ 4815.324722] ata1: lost interrupt (Status 0x50)
kernel: : [ 4815.324772] sd 0:0:0:0: [sda] Unhandled error code
kernel: : [ 4815.324775] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
kernel: : [ 4815.324780] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 d3 41 9f 00 00 28 00
kernel: : [ 4815.324806] lost page write due to I/O error on sda1
kernel: : [ 4815.324817] lost page write due to I/O error on sda1
kernel: : [ 4815.324826] lost page write due to I/O error on sda1
kernel: : [ 4815.324834] lost page write due to I/O error on sda1
kernel: : [ 4815.324843] lost page write due to I/O error on sda1
kernel: : [ 4815.450483] Pid: 1337, comm: qemu-dm Not tainted (2.6.32-5-xen-686 #1) 945GM/E-ITE8712
kernel: : [ 4815.450483] EIP: 0061:[<c1002227>] EFLAGS: 00200246 CPU: 0
kernel: : [ 4815.450483] EIP is at hypercall_page+0x227/0x1001
kernel: : [ 4815.450483] EAX: 00040000 EBX: 00000000 ECX: 00000000 EDX: c357a7b4
kernel: : [ 4815.450483] ESI: 00000009 EDI: 00000028 EBP: c13959e4 ESP: ddb6defc
kernel: : [ 4815.450483] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
kernel: : [ 4815.450483] CR0: 8005003b CR2: 0807f9d0 CR3: 1da42000 CR4: 00002660
kernel: : [ 4815.450483] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
kernel: : [ 4815.450483] DR6: ffff0ff0 DR7: 00000400
kernel: : [ 4815.450483] Call Trace:
kernel: : [ 4815.450483] [<c1006048>] ? xen_force_evtchn_callback+0xc/0x10
kernel: : [ 4815.450483] [<c1006778>] ? check_events+0x8/0xc
kernel: : [ 4815.450483] [<c1006737>] ? xen_irq_enable_direct_end+0x0/0x1
kernel: : [ 4815.450483] [<c103c80b>] ? __do_softirq+0x4b/0x156
kernel: : [ 4815.450483] [<c103c947>] ? do_softirq+0x31/0x3c
kernel: : [ 4815.450483] [<c103ca21>] ? irq_exit+0x26/0x58
kernel: : [ 4815.450483] [<c1199a16>] ? xen_evtchn_do_upcall+0x22/0x2c
kernel: : [ 4815.653736] [<c1009b5f>] ? xen_do_upcall+0x7/0xc
kernel: : [ 4815.653736] [<c104a74c>] ? sys_clock_gettime+0x46/0x7e
kernel: : [ 4815.653736] [<c1008f9c>] ? syscall_call+0x7/0xb
kernel: : [ 4815.676006] saa7146 (0) vpeirq: used 1 times >80% of buffer (1300396 bytes now)

On the monitor I frequently see messages that look like:
soft lockup - CPU X stuck for XXs! [qemu-dm:...]
qemu-dm was always listed with its PID in this message. I can never find these messages in the messages log; I think the machine is too dead to write them to disk.

The only way to get the machine back is to be fast enough after starting the domU (less than about 10 seconds) and run xm destroy <name>. Otherwise the system needs to be power-cycled; ctrl-alt-delete on the console rarely initiates a reboot, and most of the time it does not work either.

Can anybody tell me where I can search for the issue?
I am currently trying to change the architecture to amd64, but as that involves more than just the kernel and Xen itself, it is not a "fast try", so hopefully somebody can help me here.

attached: xm dmesg (from i686)
In there I've limited the memory of dom0 to 1GB because I thought ballooning might be causing the issue. My HVM machines never had more than 512MB of RAM configured, but the problem also occurs with 128 or 256MB configured.

Thank you for your investigations.

best regards
Alois

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
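The "CDB: Write(10)" line in the log above can be decoded by hand to see which write timed out. The following is only a minimal Python sketch, assuming the standard 10-byte SCSI WRITE(10) layout (opcode 0x2a, a 4-byte big-endian logical block address in bytes 2-5, a 2-byte transfer length in bytes 7-8) and 512-byte logical blocks. The block size is an assumption, since the log does not state it, and decode_write10 is a hypothetical helper name.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    return {
        # Bytes 2..5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: number of logical blocks to write, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# Assumption: 512-byte logical blocks; the log does not say.
ASSUMED_BLOCK_SIZE = 512

if __name__ == "__main__":
    info = decode_write10("2a 00 00 d3 41 9f 00 00 28 00")
    print(info["lba"], info["blocks"], info["blocks"] * ASSUMED_BLOCK_SIZE)
```

Under those assumptions the failed command is a 40-block (20480-byte) write starting at LBA 13844895. That is five 4096-byte pages, which would be consistent with the five "lost page write" lines that follow it in the log.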
Konrad Rzeszutek Wilk
2011-Apr-14 13:09 UTC
Re: [Xen-devel] on starting HVM-domU whole system freezes with "soft lockup - CPU X stuck for XXs! [qemu-dm:...]"
On Wed, Apr 13, 2011 at 09:30:51AM +0200, MadLoisae@gmx.net wrote:
> I can successfully boot with xen my dom0 with squeeze-delivered i686 kernel.
> As soon as I start a HVM (paravirtualisation works without problems)
> my dom0 stucks immediately.
> I can see about 10 seconds after starting domU "nothing", then the
> first messages like "hrtimer: interrupt took 1739955444 ns" messages
> appear, then my disk gets timeouts. then kernel panics like below
> are starting - normally they are not written to messages-log, one
> time i had "luck":

That is indeed "lucky", as it looks as if all the interrupts got disabled on your machine, and all the drivers started to hit their error-handling code as they hit their timeouts.

But the weird part is that this message got written to disk, so the interrupts did get re-enabled. Is this happening only on this machine? Can you run the attached code and see what happens when the guest starts?

Also, how much memory do you give to your domain?
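The reading above, that interrupts stalled and the drivers then ran into their timeouts, can be checked against a captured log by pulling out only the symptom lines in time order. This is a rough Python sketch, not a tool used in this thread. It assumes log lines of the form "kernel: : [ seconds] message" as quoted in the original report; the SYMPTOMS list and both function names are hypothetical and would need adjusting for other logs.

```python
import re

# Matches the "[ 4815.324722] message" part of a kernel log line.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")

# Substrings taken from the messages reported in this thread.
SYMPTOMS = (
    "hrtimer: interrupt took",
    "lost interrupt",
    "DRIVER_TIMEOUT",
    "timed out",
    "soft lockup",
)


def symptom_timeline(lines):
    """Return (timestamp, message) pairs for lines that contain a symptom."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and any(s in match.group(2) for s in SYMPTOMS):
            hits.append((float(match.group(1)), match.group(2)))
    return sorted(hits)


def hrtimer_seconds(message):
    """Convert the nanosecond figure of an hrtimer warning to seconds."""
    match = re.search(r"interrupt took (\d+) ns", message)
    return int(match.group(1)) / 1e9 if match else None


if __name__ == "__main__":
    sample = [
        "kernel: : [ 4815.324579] saa7146 (0) saa7146_i2c_writeout [irq]: timed out waiting for end of xfer",
        "kernel: : [ 4815.324722] ata1: lost interrupt (Status 0x50)",
        "kernel: : [ 4815.324806] lost page write due to I/O error on sda1",
    ]
    for timestamp, message in symptom_timeline(sample):
        print(f"{timestamp:.6f}  {message}")
    print(hrtimer_seconds("hrtimer: interrupt took 1739955444 ns"))
```

On the reported figure, the hrtimer warning works out to roughly 1.74 seconds spent in a single timer interrupt.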
MadLoisae@gmx.net
2011-Apr-14 16:29 UTC
Re: [Xen-devel] on starting HVM-domU whole system freezes with "soft lockup - CPU X stuck for XXs! [qemu-dm:...]"
Hi Keir, Hi Konrad,

thanks for your time.

First, Keir: I've added the lines to the Xen boot options and attached a new xm dmesg. Everything up to the line "(XEN) HVM1: HVM Loader" was logged during boot; the logs after it come from starting and killing the domU. I've also attached a dmesg from dom0. My setup is still xen-4.1-amd64 with a Debian squeeze i686 kernel in dom0.

My HVM domain config is also attached. I am sure that I've configured 512MB of RAM for this domU and my dom0 is limited to 1024MB of RAM, so there should be no bottleneck (the host has 2048MB of RAM; if I do not limit dom0 memory in Xen, dom0 can allocate about 1800MB). Nevertheless, the issue also occurs with a 256MB domU.

Konrad, I do not know whether this also happens on other machines. This is my first time with Xen, and I have never tried or seen it on other hardware. Or do you mean other domUs on my host? If so, then yes: it happens whenever I use HVM, while paravirt works flawlessly. I have tried about 10 or 15 different HVM domU configurations, always with the same problem.

I ran your code; the output is attached. Xen also ran with loglvl=all this time. While only dom0 was running, the first 4 lines of your code's output repeated continuously; the two big output blocks of 8 lines were generated during the roughly 15 to 20 seconds of the hang, and afterwards the 4 lines from the beginning repeated again.

Hopefully this helps; just contact me if you need more information.

Alois