Thaddeus Hogan
2009-Nov-05 16:01 UTC
[Xen-users] Dom0 Crash - High Network Load - Debian Lenny Xen 3.2-1
I'm looking for a place to start with a problem I am having where I think
high network load is crashing my Xen host. Any help you can offer is
greatly appreciated!
I started having an issue with my dom0 crashing when under very high
network load. I discovered this when I ran a large backup (1.7 TB) on a
domU.
I ran the backup with my Bacula setup that I have been using for over a
year. The client is a domU. The dom0 on that host is running the Bacula
storage daemon, which accepts data for backup over the network and writes
it to some locally attached device. In my case the dom0 has two eSATA
attached drives that are used for backups and all domUs on that host write
their backups over the network to that storage.
When I ran the backup I was sustaining about 500 Mbit/s from the domU to the
dom0. After 4.5 hours Nagios reported that the whole host and all domUs had
dropped off the network. When I looked at the console on the host it was
hung, the screen was blank, and I couldn't backscroll or see any console
messages.
The next day I tried the backup again. I was sustaining about 500 Mbit/s of
network traffic from the domU to the dom0 again, and after 45 minutes the
host crashed. I had the console already connected to a KVM and was able to
look at it immediately. Again the screen was blank and there were no
console messages accessible.
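For scale, the figures above can be turned into rough transfer totals. This is a back-of-the-envelope sketch in Python, assuming a constant 500 Mbit/s, decimal units (1 TB = 10^12 bytes), and no protocol overhead. The function names are illustrative and are not part of Bacula or Xen.

```python
def transfer_seconds(total_bytes, rate_bits_per_s):
    """Seconds needed to move total_bytes at a constant line rate."""
    return total_bytes * 8 / rate_bits_per_s


def bytes_moved(seconds, rate_bits_per_s):
    """Bytes moved in `seconds` at a constant line rate."""
    return seconds * rate_bits_per_s / 8


RATE = 500e6     # 500 Mbit/s, as reported (assumed constant)
BACKUP = 1.7e12  # 1.7 TB backup, assumed to be decimal terabytes

if __name__ == "__main__":
    hours = transfer_seconds(BACKUP, RATE) / 3600
    print(f"full backup at this rate: {hours:.1f} h")
    first = bytes_moved(4.5 * 3600, RATE) / 1e12
    print(f"moved before the first crash (4.5 h): {first:.1f} TB")
    second = bytes_moved(45 * 60, RATE) / 1e9
    print(f"moved before the second crash (45 min): {second:.0f} GB")
```

Under those assumptions the full backup would need roughly 7.6 hours, so the first run died a bit past the halfway point (about 1.0 TB) and the second after about 169 GB. The crash does not appear tied to a fixed amount of data.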
So, here is the setup:
The Xen host is a Debian Lenny system running Xen 3.2-1:
vast:~# uname -a
Linux vast 2.6.26-2-xen-amd64 #1 SMP Fri May 29 00:30:34 UTC 2009 x86_64 GNU/Linux
This host is on consumer-grade hardware: an AMD Athlon X2 4450e on a board
with an AMD SB700/SB800 chipset.
PCI manifest:
00:00.0 Host bridge: ATI Technologies Inc RX780/RX790 Chipset Host Bridge
00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A)
00:09.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port E)
00:0a.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port F)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 SE] (rev a1)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
03:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
03:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
04:05.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
Networking Configuration:
Notes: br0 is an internal VLAN. DomUs are bridged to it as well as a
physical network connection.
vast:~# ifconfig -a
br0 Link encap:Ethernet HWaddr 00:1f:e2:0a:e3:05
inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::21f:e2ff:fe0a:e305/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:451568 errors:0 dropped:0 overruns:0 frame:0
TX packets:235845 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:92830085 (88.5 MiB) TX bytes:44937509 (42.8 MiB)
br1 Link encap:Ethernet HWaddr 00:1e:2a:cc:60:c5
inet6 addr: fe80::21e:2aff:fecc:60c5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3897027 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:193067739 (184.1 MiB) TX bytes:468 (468.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:13725 errors:0 dropped:0 overruns:0 frame:0
TX packets:13725 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:833520 (813.9 KiB) TX bytes:833520 (813.9 KiB)
peth0 Link encap:Ethernet HWaddr 00:1f:e2:0a:e3:05
inet6 addr: fe80::21f:e2ff:fe0a:e305/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:5924446 errors:0 dropped:0 overruns:0 frame:0
TX packets:8362451 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1496704915 (1.3 GiB) TX bytes:9190427081 (8.5 GiB)
Interrupt:17 Base address:0xc000
peth1 Link encap:Ethernet HWaddr 00:1e:2a:cc:60:c5
inet6 addr: fe80::21e:2aff:fecc:60c5/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:15374016 errors:0 dropped:0 overruns:0 frame:0
TX packets:8330546 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:12532694766 (11.6 GiB) TX bytes:1340211710 (1.2 GiB)
Interrupt:20 Base address:0xec00
veth0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
veth1 Link encap:Ethernet HWaddr 00:00:00:00:00:00
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
veth2 Link encap:Ethernet HWaddr 00:00:00:00:00:00
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
veth3 Link encap:Ethernet HWaddr 00:00:00:00:00:00
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
vif0.0 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
vif0.1 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
vif0.2 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
vif0.3 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
vif1.0 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:8227355 errors:0 dropped:0 overruns:0 frame:0
TX packets:15373213 errors:0 dropped:110 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:1202074959 (1.1 GiB) TX bytes:12531316572 (11.6 GiB)
vif1.1 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:11546727 errors:0 dropped:0 overruns:0 frame:0
TX packets:8314451 errors:0 dropped:48 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:12080578087 (11.2 GiB) TX bytes:1308547773 (1.2 GiB)
vif2.0 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:2172180 errors:0 dropped:0 overruns:0 frame:0
TX packets:3769946 errors:0 dropped:25 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:1366566815 (1.2 GiB) TX bytes:4641322133 (4.3 GiB)
vif3.0 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:171483 errors:0 dropped:0 overruns:0 frame:0
TX packets:384392 errors:0 dropped:57 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:24409539 (23.2 MiB) TX bytes:80670352 (76.9 MiB)
vif4.0 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:1930732 errors:0 dropped:0 overruns:0 frame:0
TX packets:2111307 errors:0 dropped:64 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:291907043 (278.3 MiB) TX bytes:422220590 (402.6 MiB)
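The per-interface counters above are easier to compare when reduced to the nonzero TX drop counts. This is a minimal sketch in Python, assuming the classic net-tools ifconfig layout shown above. `tx_drops` and `SAMPLE` are hypothetical names, and the sample holds two interfaces copied from the output above.

```python
import re

# An interface block starts with "<name>  Link encap:..." at column 0.
HEADER = re.compile(r"^(\S+)\s+Link encap:")
# The TX counter line, e.g. "TX packets:1 errors:0 dropped:0 ...".
TX_LINE = re.compile(r"TX packets:(\d+) errors:(\d+) dropped:(\d+)")


def tx_drops(ifconfig_text):
    """Map interface name -> (dropped, packets) for nonzero TX drops."""
    drops = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        tx = TX_LINE.search(line)
        if tx and current is not None:
            packets, dropped = int(tx.group(1)), int(tx.group(3))
            if dropped:
                drops[current] = (dropped, packets)
    return drops


SAMPLE = """\
peth1 Link encap:Ethernet HWaddr 00:1e:2a:cc:60:c5
TX packets:8330546 errors:0 dropped:0 overruns:0 carrier:0
vif1.0 Link encap:Ethernet HWaddr fe:ff:ff:ff:ff:ff
TX packets:15373213 errors:0 dropped:110 overruns:0 carrier:0
"""

if __name__ == "__main__":
    for name, (dropped, packets) in tx_drops(SAMPLE).items():
        print(f"{name}: {dropped} of {packets} TX packets dropped")
```

Fed the full `ifconfig -a` output above, this would list only vif1.0, vif1.1, vif2.0, vif3.0 and vif4.0. The bridges and physical interfaces report no TX drops.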
Backup client domU configuration:
vast:~# cat /etc/xen/vm_store.cfg
name = 'store'
kernel = '/vm/vmboot/u904_64/vmlinuz-2.6.28-15-server'
ramdisk = '/vm/vmboot/u904_64/initrd.img-2.6.28-15-server'
root = '/dev/xvda ro'
vcpus = 1
memory = 1024
disk = [ 'phy:/dev/sysvg/vm_store_root,xvda,w',
         'phy:/dev/sysvg/vm_store_swap,xvdb,w',
         'phy:/dev/datavg/datalv,xvdc,w',
         'phy:/dev/sysvg/bananaw7lv,xvde,w' ]
vif = [ 'bridge=br0,mac=00:16:3e:00:00:01' ]
on_shutdown = 'destroy'
on_reboot = 'restart'
on_crash = 'rename-restart'
Dom0 Module List:
vast:~# lsmod
Module Size Used by
nf_conntrack_ipv4 19352 2
xt_state 6656 2
nf_conntrack 71440 2 nf_conntrack_ipv4,xt_state
xt_physdev 6928 6
iptable_filter 7424 1
ip_tables 21264 1 iptable_filter
x_tables 25096 3 xt_state,xt_physdev,ip_tables
bridge 53800 0
netloop 9088 0
crc32c 6400 0
libcrc32c 7168 1 crc32c
ipv6 289352 22
ib_iser 34296 0
rdma_cm 31860 1 ib_iser
ib_cm 41512 1 rdma_cm
iw_cm 13448 1 rdma_cm
ib_sa 25344 2 rdma_cm,ib_cm
ib_mad 39208 2 ib_cm,ib_sa
ib_core 59392 6 ib_iser,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad
ib_addr 10888 1 rdma_cm
iscsi_tcp 21764 0
libiscsi 32768 2 ib_iser,iscsi_tcp
scsi_transport_iscsi 36256 4 ib_iser,iscsi_tcp,libiscsi
fuse 54464 1
loop 19724 0
serio_raw 9860 0
psmouse 42396 0
parport_pc 31016 0
parport 42416 1 parport_pc
pcspkr 7040 0
k8temp 9088 0
snd_hda_intel 436824 0
snd_pcm 83720 1 snd_hda_intel
snd_timer 26640 1 snd_pcm
snd 64072 3 snd_hda_intel,snd_pcm,snd_timer
soundcore 12192 1 snd
snd_page_alloc 13072 2 snd_hda_intel,snd_pcm
i2c_piix4 13072 0
i2c_core 27936 1 i2c_piix4
button 11680 0
evdev 14592 0
ext3 125328 5
jbd 54568 1 ext3
mbcache 13188 1 ext3
dm_mirror 21120 0
dm_log 14212 1 dm_mirror
dm_snapshot 19400 2
dm_mod 59376 49 dm_mirror,dm_log,dm_snapshot
raid456 127008 1
async_xor 8448 1 raid456
async_memcpy 6912 1 raid456
async_tx 11764 3 raid456,async_xor,async_memcpy
xor 10384 2 raid456,async_xor
raid1 24576 1
md_mod 81700 4 raid456,raid1
ide_cd_mod 36360 0
cdrom 37928 1 ide_cd_mod
ide_disk 16640 6
sd_mod 29376 8
atiixp 8324 0 [permanent]
jmicron 6912 0 [permanent]
ide_pci_generic 9220 0 [permanent]
ide_core 129308 5 ide_cd_mod,ide_disk,atiixp,jmicron,ide_pci_generic
ehci_hcd 36492 0
ata_generic 10116 0
ahci 33164 4
ohci_hcd 25732 0
libata 165728 2 ata_generic,ahci
scsi_mod 161272 6 ib_iser,iscsi_tcp,libiscsi,scsi_transport_iscsi,sd_mod,libata
dock 14240 1 libata
r8169 31748 0
thermal 22816 0
processor 42436 1 thermal
fan 9352 0
thermal_sys 17728 3 thermal,processor,fan
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Pasi Kärkkäinen
2009-Nov-05 18:35 UTC
Re: [Xen-users] Dom0 Crash - High Network Load - Debian Lenny Xen 3.2-1
On Thu, Nov 05, 2009 at 10:01:37AM -0600, Thaddeus Hogan wrote:
>
> I'm looking for a place to start with a problem I am having where I think
> high network load is crashing my Xen host. Any help you can offer is
> greatly appreciated!
>
> I started having an issue with my dom0 crashing when under very high
> network load. I discovered this when I ran a large backup (1.7 TB) on a
> domU.
>
> I ran the backup with my Bacula setup that I have been using for over a
> year. The client is a domU. The dom0 on that host is running the Bacula
> storage daemon, which accepts data for backup over the network and writes
> it to some locally attached device. In my case the dom0 has two eSATA
> attached drives that are used for backups and all domUs on that host write
> their backups over the network to that storage.
>
> When I ran the backup I was sustaining about 500 Mbit/s from the domU to the
> dom0. After 4.5 hours Nagios reported that the whole host and all domUs had
> dropped off the network. When I looked at the console on the host it was
> hung, the screen was blank, and I couldn't backscroll or see any console
> messages.
>
> The next day I tried the backup again. I was sustaining about 500 Mbit/s of
> network traffic from the domU to the dom0 again, and after 45 minutes the
> host crashed. I had the console already connected to a KVM and was able to
> look at it immediately. Again the screen was blank and there were no
> console messages accessible.
>
Set up a serial console so you can capture the error/crash messages.
-- Pasi
Andrea Janna
2010-Feb-12 22:06 UTC
Re: [Xen-users] Dom0 Crash - High Network Load - Debian Lenny Xen 3.2-1
-------- Original message --------
From: Thaddeus Hogan <thaddeus@thogan.com>
Date: 05/11/2009 17.01
> I'm looking for a place to start with a problem I am having where I think
> high network load is crashing my Xen host. Any help you can offer is
> greatly appreciated!
>
> I started having an issue with my dom0 crashing when under very high
> network load. I discovered this when I ran a large backup (1.7 TB) on a
> domU.
>
I had a similar problem last month. I'm running a Debian Lenny dom0 with
3 Lenny domUs. Kernel 2.6.26 and all software packages are Lenny
releases.
I use Bacula for backups to a DAT 72 tape device. Dom0 is running the Bacula
storage daemon, which manages the DAT device itself.
If I run a domU backup (Bacula file daemon running in the domU and sending
data to dom0 over IP, Xen bridged networking) the system becomes
unstable after several minutes of backup. Sometimes dom0 crashes and
reboots. Sometimes the domUs' IP networking stops working.
If I don't perform a domU backup the system is stable.
If I run a backup of another computer (Bacula file daemon running on a
Windows box), the Lenny computer remains stable. So I suppose the issue is
related to higher disk activity when the Bacula file daemon is running in
a domU. Dom0 and the domUs share the same physical disk, a SATA software
RAID 5 array. When I run a backup of a Windows computer, the data is sent to
the Bacula storage daemon in dom0 over the IP Ethernet network and the
Bacula storage daemon writes that data to tape.
I didn't have time to investigate further, because I needed the system
working for production.
I solved this issue by installing Xen 3.4 (backported from Debian Squeeze)
and the SUSE Linux Enterprise 11 kernel
(http://wiki.xensource.com/xenwiki/XenDom0Kernels) on dom0 only.
Be aware that in the SUSE kernel some Xen networking features are compiled
as modules, so you need to load them before starting domUs.
Pasi Kärkkäinen
2010-Feb-16 11:57 UTC
Re: [Xen-users] Dom0 Crash - High Network Load - Debian Lenny Xen 3.2-1
On Fri, Feb 12, 2010 at 11:06:59PM +0100, Andrea Janna wrote:
> -------- Original message --------
> From: Thaddeus Hogan <thaddeus@thogan.com>
> Date: 05/11/2009 17.01
>> I'm looking for a place to start with a problem I am having where I think
>> high network load is crashing my Xen host. Any help you can offer is
>> greatly appreciated!
>>
>> I started having an issue with my dom0 crashing when under very high
>> network load. I discovered this when I ran a large backup (1.7 TB) on a
>> domU.
>>
>
> I had a similar problem last month. I'm running a Debian Lenny dom0 with
> 3 Lenny domUs. Kernel 2.6.26 and all software packages are Lenny
> releases.
> I use Bacula for backups to a DAT 72 tape device. Dom0 is running the Bacula
> storage daemon, which manages the DAT device itself.
> If I run a domU backup (Bacula file daemon running in the domU and sending
> data to dom0 over IP, Xen bridged networking) the system becomes
> unstable after several minutes of backup. Sometimes dom0 crashes and
> reboots. Sometimes the domUs' IP networking stops working.
> If I don't perform a domU backup the system is stable.
> If I run a backup of another computer (Bacula file daemon running on a
> Windows box), the Lenny computer remains stable. So I suppose the issue is
> related to higher disk activity when the Bacula file daemon is running in
> a domU. Dom0 and the domUs share the same physical disk, a SATA software
> RAID 5 array. When I run a backup of a Windows computer, the data is sent to
> the Bacula storage daemon in dom0 over the IP Ethernet network and the
> Bacula storage daemon writes that data to tape.
> I didn't have time to investigate further, because I needed the system
> working for production.
Wondering if these would have helped:
http://wiki.xensource.com/xenwiki/XenBestPractices
> I solved this issue by installing Xen 3.4 (backported from Debian Squeeze)
> and the SUSE Linux Enterprise 11 kernel
> (http://wiki.xensource.com/xenwiki/XenDom0Kernels) on dom0 only.
> Be aware that in the SUSE kernel some Xen networking features are compiled
> as modules, so you need to load them before starting domUs.
>
Yeah, the SLES11 Xen kernel should be much more stable than Lenny's.
-- Pasi