Hello,

Like others, I am seeing freeze-ups on my system, and they coincide with high I/O load. When the system merely runs (even with multiple XenU guests), it does not happen, but I can consistently force the situation to occur.

Running 4 dd processes dumping 20GB each on an LVM/mdadm soft RAID5 volume consistently crashes it in a DomU. Running without Xen I do not see the problem at all (e.g. after about 3TB of read/write nothing happened).

Any suggestion would be very welcome.

Marc

[ .. more .. ]

When it actually occurs appears to be very unpredictable; here are a few examples. Kind of odd that on Aug 29th it always happened on the same second ;-{.

> syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.560009] BUG: soft lockup - CPU#0 stuck for 146s! [events/0:9]
> syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.561016] BUG: soft lockup - CPU#1 stuck for 146s! [rsyslogd:2024]
> syslog.2:Aug 29 22:57:27 nwsc-xen-Q45 kernel: [ 4198.404353] BUG: soft lockup - CPU#0 stuck for 122s! [md1_raid5:1243]
> syslog.2:Aug 29 23:07:27 nwsc-xen-Q45 kernel: [ 4798.336110] BUG: soft lockup - CPU#0 stuck for 101s! [xend:2583]
> syslog.2:Aug 29 23:07:27 nwsc-xen-Q45 kernel: [ 4798.337007] BUG: soft lockup - CPU#1 stuck for 101s! [bdi-default:19]
> syslog.2:Aug 29 23:12:27 nwsc-xen-Q45 kernel: [ 5098.304013] BUG: soft lockup - CPU#0 stuck for 136s! [blkback.5.xvdd1:7226]
> syslog.2:Aug 29 23:12:27 nwsc-xen-Q45 kernel: [ 5098.305010] BUG: soft lockup - CPU#1 stuck for 136s! [sh:7262]
> syslog.6:Aug 17 12:07:08 nwsc-xen-Q45 kernel: [ 2998.596016] BUG: soft lockup - CPU#0 stuck for 73s! [xend:2506]
> syslog.6:Aug 17 12:07:08 nwsc-xen-Q45 kernel: [ 2998.597555] BUG: soft lockup - CPU#1 stuck for 73s! [md0_raid5:598]
> syslog.6:Aug 17 12:17:08 nwsc-xen-Q45 kernel: [ 3598.534068] BUG: soft lockup - CPU#1 stuck for 150s! [xend:2506]

It does not appear to relate to a specific process. (Those above are from Xen 4.0.1 with Debian 2.6.32-5-xen-amd64.)
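For anyone comparing their own logs against the soft lockup lines above, here is a minimal Python sketch that pulls out the CPU, stuck time, and process from each line and groups the events by timestamp. The regular expression is only an assumption based on the line format shown in this post; the function names (`parse_lockups`, `group_by_time`) are made up for illustration and other kernels may print a different format.

```python
import re
from collections import defaultdict

# Pattern written against the syslog lines quoted in this thread (an assumption;
# other kernel versions may word the message differently).
LOCKUP_RE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"\[\s*(?P<uptime>\d+\.\d+)\] BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)

# Three lines copied from the post above.
SAMPLE = [
    "syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.560009] BUG: soft lockup - CPU#0 stuck for 146s! [events/0:9]",
    "syslog.2:Aug 29 17:35:47 nwsc-xen-Q45 kernel: [ 2698.561016] BUG: soft lockup - CPU#1 stuck for 146s! [rsyslogd:2024]",
    "syslog.2:Aug 29 22:57:27 nwsc-xen-Q45 kernel: [ 4198.404353] BUG: soft lockup - CPU#0 stuck for 122s! [md1_raid5:1243]",
]


def parse_lockups(lines):
    """Return one dict per soft lockup line; lines that do not match are skipped."""
    events = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m is None:
            continue
        events.append({
            "stamp": m.group("stamp"),
            "uptime": float(m.group("uptime")),
            "cpu": int(m.group("cpu")),
            "secs": int(m.group("secs")),
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
        })
    return events


def group_by_time(events):
    """Group events by wall-clock timestamp so lockups reported together are visible."""
    groups = defaultdict(list)
    for event in events:
        groups[event["stamp"]].append(event)
    return dict(groups)


if __name__ == "__main__":
    for stamp, group in group_by_time(parse_lockups(SAMPLE)).items():
        names = ", ".join(f"CPU#{e['cpu']} {e['comm']}" for e in group)
        print(f"{stamp}: {names}")
```

Grouping by timestamp makes it easy to see that both CPUs report the lockup at the same second while the process names differ, which fits the observation that no specific process is to blame.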
This one is with Xen 4.1.2-rc2-pre/Debian 2.6.32-5-xen-amd64. Both are on an Intel DQ45CB board with 4GB RAM.

> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348062] BUG: soft lockup - CPU#0 stuck for 79s! [xend:2767]
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348073] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables ext4 jbd2 crc16 sata_sil24 hid_apple sky2 via_velocity crc_ccitt usb_storage raid456 md_mod async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod ext3 jbd mbcache firewire_sbp2 loop sr_mod cdrom sg xenfs xen_evtchn bridge stp 3w_9xxx usbhid hid sd_mod crc_t10dif snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device firewire_ohci psmouse i2c_i801 video firewire_core uhci_hcd ata_piix snd crc_itu_t output serio_raw evdev ahci pcspkr ehci_hcd i2c_core usbcore nls_base e1000e button ata_generic soundcore snd_page_alloc libata thermal scsi_mod processor thermal_sys acpi_processor
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348219] CPU 0:
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348222] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables ext4 jbd2 crc16 sata_sil24 hid_apple sky2 via_velocity crc_ccitt usb_storage raid456 md_mod async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_mod ext3 jbd mbcache firewire_sbp2 loop sr_mod cdrom sg xenfs xen_evtchn bridge stp 3w_9xxx usbhid hid sd_mod crc_t10dif snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device firewire_ohci psmouse i2c_i801 video firewire_core uhci_hcd ata_piix snd crc_itu_t output serio_raw evdev ahci pcspkr ehci_hcd i2c_core usbcore nls_base e1000e button ata_generic soundcore snd_page_alloc libata thermal scsi_mod processor thermal_sys acpi_processor
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348318] Pid: 2767, comm: xend Not tainted 2.6.32-5-xen-amd64 #1
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348322] RIP: e033:[<00007fa4064c0289>] [<00007fa4064c0289>] 0x7fa4064c0289
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348330] RSP: e02b:00007fa402ee54a0 EFLAGS: 00000206
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348334] RAX: 0000000001c3a320 RBX: 0000000001f8ace0 RCX: 00007fa40650f844
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348338] RDX: ffffffffffffffe0 RSI: 0000000000000000 RDI: 00007fa4067a9e40
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348341] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000001
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348345] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa4067a9e40
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348349] R13: 00007fa402ee555c R14: 00007fa402ee5548 R15: 00000000ffffffff
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348356] FS: 00007fa402ee6700(0000) GS:ffff880002995000(0000) knlGS:0000000000000000
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348360] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348363] CR2: 00007fb2ed832e28 CR3: 00000000bba8e000 CR4: 0000000000002660
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348367] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Aug 31 13:05:41 nwsc-xen-Q45 kernel: [ 4039.348375] Call Trace:
>
> Aug 31 13:07:51 nwsc-xen-Q45 init: Id "T1" respawning too fast: disabled for 5 minutes

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
All,

Ha, finally solved. I guess Google is not the answer; searching the mailing list is. After much frustration I found the following:

http://wiki.debian.org/Xen#A.27clocksource.2BAC8-0.3ATimewentbackwards.27

based on a post by Marco Marongiu:

http://my.opera.com/marcomarongiu/blog/2010/08/18/debugging-ntp-again-part-4-and-last

For me lockup solution #2 worked:

    # DomU and Dom0
    # in /etc/sysctl.conf
    clocksource=jiffies
    independent_wallclock=0
    # then sysctl -p

    # in /etc/xen/*.conf
    extra="clocksource=jiffies"

And voila, no more lockups. It had nothing to do with the motherboards (which I had already thought were not the cause, given the success with non-Xen configurations).

I am not sure whether this is a kernel or a Xen problem, though.

Hope this helps others.

On 8/31/2011 2:42 PM, Mark Brown wrote:
> [ .. quoted message snipped .. ]
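To confirm that a guest really booted with the clocksource described above, the kernel's choice can be read back from sysfs. The following is a minimal Python sketch, not part of the original fix: it assumes the usual Linux location `/sys/devices/system/clocksource/clocksource0`, and the helper names (`read_clocksource`, `is_jiffies`) are made up for illustration. Whether a given Xen kernel lists `jiffies` as available is something to check on the machine itself.

```python
from pathlib import Path

# Usual sysfs location of the clocksource controls on Linux (assumed to be
# present on the kernels discussed in this thread).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources as reported by the kernel."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def is_jiffies(base=CLOCKSOURCE_DIR):
    """True if the running kernel currently uses the jiffies clocksource."""
    current, _ = read_clocksource(base)
    return current == "jiffies"


if __name__ == "__main__":
    current, available = read_clocksource()
    print(f"current clocksource:    {current}")
    print(f"available clocksources: {' '.join(available)}")
```

Running it in Dom0 and in each DomU after a reboot shows whether the `extra="clocksource=jiffies"` setting actually took effect in that domain.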
Hi,

Are you saying this one worked?

    # in /etc/xen/*.conf
    extra="clocksource=jiffies"

We have the same issue with one of our DomUs (CentOS).

Thanks,

Ian

-----Original Message-----
From: xen-users-bounces@lists.xensource.com [mailto:xen-users-bounces@lists.xensource.com] On Behalf Of Matthias Bannach
Sent: 02 September 2011 02:12
To: mbrown@greenmountainservices.com
Cc: xen-users@lists.xensource.com
Subject: [Xen-users] Re: CPU soft lockup XEN 4.1rc (Solved)

[ .. quoted message snipped .. ]
Ian,

Yes, it does. Usually the DomU would crash after about 4-20 GB of heavy I/O. With the changed configuration (see below) I was able to transfer > 1TB of data and it has yet to crash.

My guess is that the clock time somehow gets affected by some (marginal?) value, and that this causes the lockup.

Thanks a lot to Marco Marongiu for the detailed and well-written post.

Marc

On 9/2/2011 5:57 AM, Ian Tobin wrote:
> Hi,
>
> Are you saying this one worked?
>
> # in /etc/xen/*.conf
> extra="clocksource=jiffies"
>
> we have the same issue with one of our DomUs (CentOS)
>
> thanks
>
> Ian
>
> [ .. rest of quoted thread snipped .. ]
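For anyone who wants to reproduce the I/O load used in this thread (4 dd processes dumping 20GB each) before and after changing the clocksource, here is a rough Python sketch. The thread does not give the exact dd command lines, so the source (`/dev/zero`), the 1M block size, the output file names, and the helper names `build_dd_commands` and `run_all` are all assumptions for illustration only.

```python
import subprocess


def build_dd_commands(target_dir, writers=4, size_gb=20):
    """Build one dd argv per writer; each writes size_gb GiB of zeros.

    Assumption: /dev/zero as input and 1M blocks. The thread only says
    "4 dd processes dumping 20GB each".
    """
    count = size_gb * 1024  # number of 1 MiB blocks
    return [
        ["dd", "if=/dev/zero", f"of={target_dir}/ddtest.{i}", "bs=1M", f"count={count}"]
        for i in range(writers)
    ]


def run_all(commands):
    """Start all dd processes in parallel and wait; returns their exit codes."""
    procs = [subprocess.Popen(argv) for argv in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical scratch mount point inside the DomU; adjust before running.
    cmds = build_dd_commands("/mnt/scratch")
    for argv in cmds:
        print(" ".join(argv))
    # Uncomment to actually write 4 x 20 GiB:
    # print(run_all(cmds))
```

With the default settings this writes 80 GiB in total, so it should only be pointed at a scratch volume with enough free space.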