Thaddeus Hogan
2009-Nov-05 16:01 UTC
[Xen-users] Dom0 Crash - High Network Load - Debian Lenny Xen 3.2-1
I'm looking for a place to start with a problem I am having where I think high network load is crashing my Xen host. Any help you can offer is greatly appreciated!

I started having an issue with my dom0 crashing when under very high network load. I discovered this when I ran a large backup (1.7 TB) on a domU.

I ran the backup with my Bacula setup that I have been using for over a year. The client is a domU. The dom0 on that host is running the Bacula storage daemon, which accepts data for backup over the network and writes it to some locally attached device. In my case the dom0 has two eSATA attached drives that are used for backups, and all domUs on that host write their backups over the network to that storage.

When I ran the backup I was sustaining about 500 Mbit/s from the domU to the dom0. After 4.5 hours Nagios reported that the whole host and all domUs had dropped off the network. When I looked at the console on the host it was hung, the screen was blank, and I couldn't scroll back or see any console messages.

The next day I tried the backup again. I was sustaining about 500 Mbit/s of network traffic from the domU to the dom0 again, and after 45 minutes the host crashed. I had the console already connected to a KVM and was able to look at it immediately. Again the screen was blank and there were no console messages accessible.

So, here is the setup:

The Xen host is a Debian Lenny system running Xen 3.2-1:

vast:~# uname -a
Linux vast 2.6.26-2-xen-amd64 #1 SMP Fri May 29 00:30:34 UTC 2009 x86_64 GNU/Linux

This host is on consumer grade hardware, an AMD Athlon x2 4450e on a board with an AMD SB700/SB800 chipset.
PCI manifest:

00:00.0 Host bridge: ATI Technologies Inc RX780/RX790 Chipset Host Bridge
00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A)
00:09.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port E)
00:0a.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port F)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 SE] (rev a1)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
03:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
03:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
04:05.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)

Networking Configuration:

Notes: br0 is an internal VLAN. DomUs are bridged to it as well as a physical network connection.

vast:~# ifconfig -a
br0       Link encap:Ethernet  HWaddr 00:1f:e2:0a:e3:05
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21f:e2ff:fe0a:e305/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:451568 errors:0 dropped:0 overruns:0 frame:0
          TX packets:235845 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:92830085 (88.5 MiB)  TX bytes:44937509 (42.8 MiB)

br1       Link encap:Ethernet  HWaddr 00:1e:2a:cc:60:c5
          inet6 addr: fe80::21e:2aff:fecc:60c5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3897027 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:193067739 (184.1 MiB)  TX bytes:468 (468.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:13725 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13725 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:833520 (813.9 KiB)  TX bytes:833520 (813.9 KiB)

peth0     Link encap:Ethernet  HWaddr 00:1f:e2:0a:e3:05
          inet6 addr: fe80::21f:e2ff:fe0a:e305/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:5924446 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8362451 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1496704915 (1.3 GiB)  TX bytes:9190427081 (8.5 GiB)
          Interrupt:17 Base address:0xc000

peth1     Link encap:Ethernet  HWaddr 00:1e:2a:cc:60:c5
          inet6 addr: fe80::21e:2aff:fecc:60c5/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:15374016 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8330546 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12532694766 (11.6 GiB)  TX bytes:1340211710 (1.2 GiB)
          Interrupt:20 Base address:0xec00

veth0     Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

veth1     Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

veth2     Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

veth3     Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vif0.0    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vif0.1    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vif0.2    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vif0.3    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vif1.0    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:8227355 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15373213 errors:0 dropped:110 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:1202074959 (1.1 GiB)  TX bytes:12531316572 (11.6 GiB)

vif1.1    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:11546727 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8314451 errors:0 dropped:48 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:12080578087 (11.2 GiB)  TX bytes:1308547773 (1.2 GiB)

vif2.0    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:2172180 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3769946 errors:0 dropped:25 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:1366566815 (1.2 GiB)  TX bytes:4641322133 (4.3 GiB)

vif3.0    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:171483 errors:0 dropped:0 overruns:0 frame:0
          TX packets:384392 errors:0 dropped:57 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:24409539 (23.2 MiB)  TX bytes:80670352 (76.9 MiB)

vif4.0    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:1930732 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2111307 errors:0 dropped:64 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:291907043 (278.3 MiB)  TX bytes:422220590 (402.6 MiB)

Backup client domU configuration:

vast:~# cat /etc/xen/vm_store.cfg
name = 'store'
kernel = '/vm/vmboot/u904_64/vmlinuz-2.6.28-15-server'
ramdisk = '/vm/vmboot/u904_64/initrd.img-2.6.28-15-server'
root = '/dev/xvda ro'
vcpus = 1
memory = 1024
disk = [
    'phy:/dev/sysvg/vm_store_root,xvda,w',
    'phy:/dev/sysvg/vm_store_swap,xvdb,w',
    'phy:/dev/datavg/datalv,xvdc,w',
    'phy:/dev/sysvg/bananaw7lv,xvde,w'
]
vif = [ 'bridge=br0,mac=00:16:3e:00:00:01' ]
on_shutdown = 'destroy'
on_reboot = 'restart'
on_crash = 'rename-restart'

Dom0 Module List:

vast:~# lsmod
Module                 Size    Used by
nf_conntrack_ipv4      19352   2
xt_state               6656    2
nf_conntrack           71440   2 nf_conntrack_ipv4,xt_state
xt_physdev             6928    6
iptable_filter         7424    1
ip_tables              21264   1 iptable_filter
x_tables               25096   3 xt_state,xt_physdev,ip_tables
bridge                 53800   0
netloop                9088    0
crc32c                 6400    0
libcrc32c              7168    1 crc32c
ipv6                   289352  22
ib_iser                34296   0
rdma_cm                31860   1 ib_iser
ib_cm                  41512   1 rdma_cm
iw_cm                  13448   1 rdma_cm
ib_sa                  25344   2 rdma_cm,ib_cm
ib_mad                 39208   2 ib_cm,ib_sa
ib_core                59392   6 ib_iser,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad
ib_addr                10888   1 rdma_cm
iscsi_tcp              21764   0
libiscsi               32768   2 ib_iser,iscsi_tcp
scsi_transport_iscsi   36256   4 ib_iser,iscsi_tcp,libiscsi
fuse                   54464   1
loop                   19724   0
serio_raw              9860    0
psmouse                42396   0
parport_pc             31016   0
parport                42416   1 parport_pc
pcspkr                 7040    0
k8temp                 9088    0
snd_hda_intel          436824  0
snd_pcm                83720   1 snd_hda_intel
snd_timer              26640   1 snd_pcm
snd                    64072   3 snd_hda_intel,snd_pcm,snd_timer
soundcore              12192   1 snd
snd_page_alloc         13072   2 snd_hda_intel,snd_pcm
i2c_piix4              13072   0
i2c_core               27936   1 i2c_piix4
button                 11680   0
evdev                  14592   0
ext3                   125328  5
jbd                    54568   1 ext3
mbcache                13188   1 ext3
dm_mirror              21120   0
dm_log                 14212   1 dm_mirror
dm_snapshot            19400   2
dm_mod                 59376   49 dm_mirror,dm_log,dm_snapshot
raid456                127008  1
async_xor              8448    1 raid456
async_memcpy           6912    1 raid456
async_tx               11764   3 raid456,async_xor,async_memcpy
xor                    10384   2 raid456,async_xor
raid1                  24576   1
md_mod                 81700   4 raid456,raid1
ide_cd_mod             36360   0
cdrom                  37928   1 ide_cd_mod
ide_disk               16640   6
sd_mod                 29376   8
atiixp                 8324    0 [permanent]
jmicron                6912    0 [permanent]
ide_pci_generic        9220    0 [permanent]
ide_core               129308  5 ide_cd_mod,ide_disk,atiixp,jmicron,ide_pci_generic
ehci_hcd               36492   0
ata_generic            10116   0
ahci                   33164   4
ohci_hcd               25732   0
libata                 165728  2 ata_generic,ahci
scsi_mod               161272  6 ib_iser,iscsi_tcp,libiscsi,scsi_transport_iscsi,sd_mod,libata
dock                   14240   1 libata
r8169                  31748   0
thermal                22816   0
processor              42436   1 thermal
fan                    9352    0
thermal_sys            17728   3 thermal,processor,fan

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
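
Editorial note: in the ifconfig listing above, the only non-zero drop counters are TX drops on the active vif interfaces (for example dropped:110 on vif1.0). Whether those drops have anything to do with the crash is not established in this thread. For anyone who wants to compare the counters before and after a test run, the following is a minimal Python sketch, not part of the original post, that extracts them from saved `ifconfig -a` output. It assumes the classic net-tools layout shown above; the function name and the shortened two-interface sample are illustrative only.

```python
import re

# Classic net-tools `ifconfig` layout: an interface name starts at column 0,
# followed by indented "RX packets:" / "TX packets:" counter lines.
IFACE_RE = re.compile(r"^(\S+)\s+Link encap:")
COUNTER_RE = re.compile(r"^\s+(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+)")

# Shortened sample built from two interfaces in the listing above.
SAMPLE = """\
br0       Link encap:Ethernet  HWaddr 00:1f:e2:0a:e3:05
          RX packets:451568 errors:0 dropped:0 overruns:0 frame:0
          TX packets:235845 errors:0 dropped:0 overruns:0 carrier:0
vif1.0    Link encap:Ethernet  HWaddr fe:ff:ff:ff:ff:ff
          RX packets:8227355 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15373213 errors:0 dropped:110 overruns:0 carrier:0
"""


def dropped_counters(ifconfig_text):
    """Return {interface: {"RX": n, "TX": n}} for interfaces with any drops."""
    counters = {}
    current = None
    for line in ifconfig_text.splitlines():
        match = IFACE_RE.match(line)
        if match:
            # New interface block: start with zeroed drop counters.
            current = match.group(1)
            counters[current] = {"RX": 0, "TX": 0}
            continue
        match = COUNTER_RE.match(line)
        if match and current is not None:
            # group(1) is the direction, group(4) is the "dropped:" value.
            counters[current][match.group(1)] = int(match.group(4))
    # Keep only the interfaces that actually dropped something.
    return {name: c for name, c in counters.items() if c["RX"] or c["TX"]}
```

Running `dropped_counters()` on output saved at intervals during a backup would show whether the vif drop counts grow with load, which is one data point to include when reporting the problem.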
Pasi Kärkkäinen
2009-Nov-05 18:35 UTC
Re: [Xen-users] Dom0 Crash - High Network Load - Debian Lenny Xen 3.2-1
On Thu, Nov 05, 2009 at 10:01:37AM -0600, Thaddeus Hogan wrote:
>
> I'm looking for a place to start with a problem I am having where I think
> high network load is crashing my Xen host. Any help you can offer is
> greatly appreciated!
>
> I started having an issue with my dom0 crashing when under very high
> network load. I discovered this when I ran a large backup (1.7 TB) on a
> domU.
>
> I ran the backup with my Bacula setup that I have been using for over a
> year. The client is a domU. The dom0 on that host is running the Bacula
> storage daemon, which accepts data for backup over the network and writes
> it to some locally attached device. In my case the dom0 has two eSATA
> attached drives that are used for backups and all domUs on that host write
> their backups over the network to that storage.
>
> When I ran the backup I was sustaining about 500mbit from the domU to the
> dom0. After 4.5 hours Nagios reported that the whole host and all domUs had
> dropped off the network. When I looked at the console on the host it was
> hung, the screen was blank, and I couldn't backscroll or see any console
> messages.
>
> The next day I tried the backup again. I was sustaining about 500mbit of
> network traffic from the DomU to the dom0 again, and after 45 minutes the
> host crashed. I had the console already connected to a KVM and was able to
> look at it immediately. Again the screen was blank and there were no
> console messages accessible.
>

Set up a serial console so you can capture the (error/crash) messages.

-- Pasi
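
Editorial note: a serial console only helps if something on the other end of the cable records what the crashing host prints. On the Xen side this usually means hypervisor boot options along the lines of `com1=115200,8n1 console=com1,vga`, plus a matching `console=` argument for the dom0 kernel whose value depends on the kernel flavour; check the Xen documentation for the exact form for your versions. The following is a minimal Python sketch, not from the thread, of the recording side: it copies lines from any readable stream to a log and timestamps them. The function name is hypothetical, and it assumes the serial port on the logging machine has already been configured (speed, parity) by other means.

```python
import time


def log_console(stream, log, now=time.time):
    """Copy lines from `stream` to `log`, prefixing each with a UTC timestamp.

    Every line is flushed immediately, so the last messages before a crash
    are already in the log file and not sitting in a buffer.
    Returns the number of lines copied.
    """
    count = 0
    for line in stream:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now()))
        log.write(f"{stamp} {line.rstrip()}\n")
        log.flush()
        count += 1
    return count
```

On a second machine this could be called with the serial device opened as an ordinary text file for `stream` and an append-mode log file for `log`. The timestamps make it possible to line up the last console message with the moment Nagios reports the host as down.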
Andrea Janna
2010-Feb-12 22:06 UTC
Re: [Xen-users] Dom0 Crash - High Network Load - Debian Lenny Xen 3.2-1
-------- Original message --------
From: Thaddeus Hogan <thaddeus@thogan.com>
Date: 05/11/2009 17.01

> I'm looking for a place to start with a problem I am having where I think
> high network load is crashing my Xen host. Any help you can offer is
> greatly appreciated!
>
> I started having an issue with my dom0 crashing when under very high
> network load. I discovered this when I ran a large backup (1.7 TB) on a
> domU.
>

I had a similar problem last month. I'm running a Debian Lenny dom0 with 3 Lenny domUs. Kernel 2.6.26 and all software packages are Lenny releases.

I use Bacula for backups to a DAT 72 tape device. Dom0 runs the Bacula storage daemon, which manages the DAT device itself.

If I run a domU backup (Bacula file daemon running in the domU and sending data to dom0 over IP, Xen bridged networking), the system becomes unstable after several minutes of backup. Sometimes dom0 crashes and reboots. Sometimes the domUs' IP networking stops working.

If I don't perform a domU backup the system is stable.

If I run a backup of another computer (Bacula file daemon running on a Windows box), the Lenny computer remains stable. So I suppose the issue is related to the higher disk activity when the Bacula file daemon is running in a domU. Dom0 and the domUs share the same physical disk, a SATA soft RAID 5 array. When I run a backup of a Windows computer, the data is sent to the Bacula storage daemon in dom0 over the IP Ethernet network and the storage daemon writes that data to tape.

I didn't have time to investigate further, because I needed the system working for production.

I solved this issue by installing Xen 3.4 (back-ported from Debian Squeeze) and the Suse Linux Enterprise 11 kernel (http://wiki.xensource.com/xenwiki/XenDom0Kernels) on dom0 only.

Be aware that on the Suse kernel some Xen networking features are compiled as modules, so you need to load them before starting domUs.
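
Editorial note: the point about modules can be checked before any domU is started. The following is a minimal Python sketch, not from the original message, that reads saved `lsmod` output and reports which required modules are not loaded. The post does not name the modules; `netbk` in the sample call is an assumption used for illustration, so substitute whatever your kernel actually ships. The function names are hypothetical too.

```python
# Shortened sample in `lsmod` format, using entries from the dom0 module
# list earlier in this thread.
SAMPLE_LSMOD = """\
Module                  Size  Used by
bridge                 53800  0
netloop                 9088  0
r8169                  31748  0
"""


def loaded_modules(lsmod_text):
    """Return the set of module names found in `lsmod` output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Data rows look like "name size use_count [deps]"; the header row
        # has the non-numeric word "Size" in the second column and is skipped.
        if len(fields) >= 3 and fields[1].isdigit():
            names.add(fields[0])
    return names


def missing_modules(lsmod_text, required):
    """Return the required module names that are not loaded, in order."""
    loaded = loaded_modules(lsmod_text)
    return [name for name in required if name not in loaded]
```

For example, `missing_modules(SAMPLE_LSMOD, ["bridge", "netbk"])` reports only the assumed `netbk` entry as missing. A check like this could run from an init script ahead of the domU autostart, so that a missing backend module shows up as a clear message instead of a domU without networking.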
Pasi Kärkkäinen
2010-Feb-16 11:57 UTC
Re: [Xen-users] Dom0 Crash - High Network Load - Debian Lenny Xen 3.2-1
On Fri, Feb 12, 2010 at 11:06:59PM +0100, Andrea Janna wrote:
> -------- Original message --------
> From: Thaddeus Hogan <thaddeus@thogan.com>
> Date: 05/11/2009 17.01
>> I'm looking for a place to start with a problem I am having where I think
>> high network load is crashing my Xen host. Any help you can offer is
>> greatly appreciated!
>>
>> I started having an issue with my dom0 crashing when under very high
>> network load. I discovered this when I ran a large backup (1.7 TB) on a
>> domU.
>>
>
> I had a similar problem last month. I'm running a Debian Lenny dom0 with
> 3 Lenny domUs. Kernel 2.6.26 and all software packages are Lenny
> releases.
> I use Bacula for backups on a DAT 72 tape device. Dom0 is running Bacula
> storage daemon, which manage the DAT device itself.
> If I run a domU backup (Bacula file daemon running in domU and sending
> data to dom0 over IP, Xen bridged networking) the system becomes
> unstable after several minutes of backup. Sometimes dom0 crashes and
> reboots. Sometimes domUs IP network stops working.
> If I don't perform domU backup the system is stable.
> If I run a backup of another computer (Bacula file daemon running on a
> Windows box), Lenny computer remains stable. So I suppose issue is
> related to higher disk activity when Bacula file daemon is running in
> domU. Dom0 and DomUs share the same physical disk, a SATA soft raid5
> array. When I run a backup of a Windows computer data is sent to Bacula
> storage daemon in dom0 via IP ethernet network and Bacula storage daemon
> writes that data on tape.
> I didn't have time to investigate further, cause I needed the system
> working for production.

Wondering if these would have helped:
http://wiki.xensource.com/xenwiki/XenBestPractices

> I solved this issue installing Xen 3.4 (back-ported from Debian Squeeze)
> and Suse Linux Enterprise 11 kernel
> (http://wiki.xensource.com/xenwiki/XenDom0Kernels) on dom0 only.
> Be aware that on Suse kernel some Xen networking features are compiled
> as modules, so you need to load them before starting domUs.
>

Yeah, the SLES11 Xen kernel should be much more stable than Lenny's.

-- Pasi