Eric van Blokland
2010-Oct-18 12:44 UTC
RE: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
I've seen this happen in the past, when iSCSI disks became
inaccessible. It hasn't occurred for quite a while, though (even though I know I
have made these disks inaccessible quite a few times); however, your system appears to
be up to date.
If it is caused by disks becoming inaccessible, you should see something about
it in dmesg, "connection .... timeout".
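The dmesg check above could be scripted roughly as follows. This is only a sketch: the pattern is deliberately loose (the word "connection" followed later on the same line by "timeout" or "timed out"), because the exact wording of the kernel message depends on the kernel and initiator version and is not given in this thread. The function name `find_connection_timeouts` is made up for illustration.

```python
import re

# Loose pattern: "connection", anything in between, then "timeout" or
# "timed out". This is an assumption about the message shape, not the
# exact text printed by any particular iSCSI initiator.
TIMEOUT_RE = re.compile(r"connection.*time(?:out|d out)", re.IGNORECASE)


def find_connection_timeouts(dmesg_text):
    """Return the lines of dmesg_text that look like connection timeouts."""
    return [line for line in dmesg_text.splitlines() if TIMEOUT_RE.search(line)]
```

To use it, save the output of dmesg to a file right after a crash, read the file into a string, and pass that string to the function. An empty result does not rule out a storage problem, since the kernel ring buffer may already have wrapped.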
From: xen-users-bounces@lists.xensource.com
[mailto:xen-users-bounces@lists.xensource.com] On Behalf Of VPS Lime
Sent: Monday, 18 October 2010 16:32
To: xen-users@lists.xensource.com
Subject: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
I inherited a Xen server that is set up to have all the VM images hosted on an
iSCSI-mounted NAS array. We have been experiencing a random issue (about every
2-3 days) where Xen crashes all the VMs, leaving nothing but Domain0
running. What appears to be happening is that something causes the iSCSI mount to
hiccup. Running "vgchange -a y" and restarting all the VMs brings
everything back up. Nothing appears to be wrong with the NAS array - there are a
dozen other servers attached to it that never have a problem. The xend log does
not have anything useful in it, and I'm at a loss to figure out what is
causing this. The only suggestion I've heard is that maybe the memory usage
is too high and is causing the box to be unstable. If anyone has any
suggestions or any additional logs I should be looking at, I'd really
appreciate it.
Host OS: CentOS 5.5
Xen kernel: xen.gz-2.6.18-194.11.4.el5
iSCSI libraries: iscsi-initiator-utils-6.2.0.871-0.16.el5
Memory on server: 32G
Total memory allocated for VMs running paravirt: 19,384 M
Total memory allocated for VMs running HVM: 2,688 M
Results of xm top:
xentop - 10:11:06 Xen 3.1.2-194.11.4.el5
39 domains: 1 running, 38 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 25165116k total, 25150528k used, 14588k free CPUs: 8 @ 1995MHz
NAME      STATE   CPU(sec) CPU(%)  MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR SSID
Domain-0  -----r      1583   17.1 3220540   12.8  no limit       n/a     8   32     1932    32747    0      0      0      0    0
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Eric van Blokland
2010-Oct-18 13:31 UTC
RE: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
I'm not sure this is the cause of your issue, because I only see messages about
VMs being started and nothing about why they might have crashed.
Be sure to check that it's really the VMs crashing. Perhaps the entire
server just rebooted. If not, try to get the dmesg output from when the VMs
crashed. You can also run "xm dmesg" to see if the hypervisor has anything to tell
you.
About the memory squeeze: I believe this has to do with Dom0 running low on
memory, though I'm not sure. You could try giving Dom0 a reasonable fixed amount of
memory.
Also be sure you're not over-allocating memory. (I'm not sure whether you even
can in Xen; I guess you might, but I've never tried.)
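As a rough way to check for over-allocation, the arithmetic can be written down as below. This is a sketch under stated assumptions: the guest totals are the two figures from the original post taken as MiB, the 2048 MiB Dom0 size is a hypothetical value picked only for illustration, and hypervisor overhead is ignored. Two host totals are tried because the post says "32G" while xentop reports 25165116k.

```python
def memory_headroom_mib(host_mib, dom0_mib, guest_mibs):
    """Memory left on the host after Dom0 and every guest, in MiB.

    A negative result means Dom0 plus the guests ask for more than the host has.
    """
    return host_mib - dom0_mib - sum(guest_mibs)


# Guest totals quoted in the thread (paravirt and HVM), taken as MiB.
GUEST_TOTALS_MIB = [19384, 2688]

# Hypothetical fixed Dom0 size, chosen only for illustration.
DOM0_MIB = 2048

# Two candidate host totals: the "32G" from the post, and the
# "25165116k total" figure from xentop converted to whole MiB.
for label, host_mib in [("32G as stated", 32 * 1024),
                        ("xentop total", 25165116 // 1024)]:
    headroom = memory_headroom_mib(host_mib, DOM0_MIB, GUEST_TOTALS_MIB)
    print(f"{label}: {headroom} MiB headroom")
```

The headroom differs a lot between the two totals, so it is worth finding out first why xentop sees less memory than the machine is said to have.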
From: xen-users-bounces@lists.xensource.com
[mailto:xen-users-bounces@lists.xensource.com] On Behalf Of VPS Lime
Sent: Monday, 18 October 2010 17:16
CC: xen-users@lists.xensource.com
Subject: Re: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
Good suggestion on dmesg. The "memory squeeze in netback driver"
seems like a likely culprit. There is a bug
(http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=762) dating back several
years on this issue with some suggestions and other responses that did not work.
Has anyone come up with a reliable fix for this on CentOS 5.5?
xen_net: Memory squeeze in netback driver.
xen_net: Memory squeeze in netback driver.
device xen3.128 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.128: link is not ready
printk: 60 messages suppressed.
xen_net: Memory squeeze in netback driver.
blkback: ring-ref 8, event-channel 15, protocol 1 (x86_64-abi)
printk: 11 messages suppressed.
xen_net: Memory squeeze in netback driver.
ADDRCONF(NETDEV_CHANGE): xen3.120: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 36(xen3.120) entering forwarding state
ADDRCONF(NETDEV_CHANGE): xen3.123: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 41(xen3.123) entering forwarding state
device tap2 entered promiscuous mode
xenbr1: topology change detected, propagating
xenbr1: port 43(tap2) entering forwarding state
device xen1-112 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen1-112: link is not ready
tap2: no IPv6 routers present
device tap5 entered promiscuous mode
xenbr1: topology change detected, propagating
xenbr1: port 45(tap5) entering forwarding state
device xen3.109 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.109: link is not ready
tap5: no IPv6 routers present
printk: 8 messages suppressed.
xen_net: Memory squeeze in netback driver.
xen_net: Memory squeeze in netback driver.
xenbr1: port 46(xen3.109) entering disabled state
device xen3.109 left promiscuous mode
xenbr1: port 46(xen3.109) entering disabled state
xenbr1: port 45(tap5) entering disabled state
device tap5 left promiscuous mode
xenbr1: port 45(tap5) entering disabled state
device xen3.129 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.129: link is not ready
blkback: ring-ref 8, event-channel 15, protocol 1 (x86_64-abi)
ADDRCONF(NETDEV_CHANGE): xen3.129: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 45(xen3.129) entering forwarding state
nfs: server 10.1.1.45 not responding, still trying
nfs: server 10.1.1.45 not responding, still trying
nfs: server 10.1.1.45 OK
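To get a feel for how often the squeeze message shows up in a dmesg dump like the one above, a small tally could look like this. It is a sketch only: it counts exact copies of the "Memory squeeze" line and adds up the "printk: N messages suppressed." counters. The suppressed messages are not necessarily all squeeze messages, so the second number is an upper bound at best. The function name `summarize_squeeze` is made up for illustration.

```python
import re

SQUEEZE_LINE = "xen_net: Memory squeeze in netback driver."
SUPPRESSED_RE = re.compile(r"printk: (\d+) messages suppressed\.")


def summarize_squeeze(log_text):
    """Count squeeze lines and sum the printk suppression counters."""
    squeeze_lines = 0
    suppressed = 0
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == SQUEEZE_LINE:
            squeeze_lines += 1
            continue
        match = SUPPRESSED_RE.fullmatch(line)
        if match:
            # Rate-limited messages of any kind; only an upper bound
            # on how many squeeze messages were dropped.
            suppressed += int(match.group(1))
    return {"squeeze_lines": squeeze_lines, "suppressed_messages": suppressed}
```

Running it on dumps taken at different times would show whether the squeeze messages cluster around the moments the VMs go down or appear all the time.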
VPS Lime
2010-Oct-18 14:31 UTC
[Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
I inherited a Xen server that is set up to have all the VM images hosted on
an iSCSI-mounted NAS array. We have been experiencing a random issue (about
every 2-3 days) where Xen crashes all the VMs, leaving nothing but
Domain0 running. What appears to be happening is that something causes the iSCSI
mount to hiccup. Running "vgchange -a y" and restarting all the VMs brings
everything back up. Nothing appears to be wrong with the NAS array - there are a
dozen other servers attached to it that never have a problem. The xend log
does not have anything useful in it, and I'm at a loss to figure out what is
causing this. The only suggestion I've heard is that maybe the memory usage is
too high and is causing the box to be unstable. If anyone has any
suggestions or any additional logs I should be looking at, I'd really
appreciate it.
Host OS: CentOS 5.5
Xen kernel: xen.gz-2.6.18-194.11.4.el5
iSCSI libraries: iscsi-initiator-utils-6.2.0.871-0.16.el5
Memory on server: 32G
Total memory allocated for VMs running paravirt: 19,384 M
Total memory allocated for VMs running HVM: 2,688 M
Results of xm top:
xentop - 10:11:06 Xen 3.1.2-194.11.4.el5
39 domains: 1 running, 38 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 25165116k total, 25150528k used, 14588k free CPUs: 8 @ 1995MHz
NAME      STATE   CPU(sec) CPU(%)  MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR SSID
Domain-0  -----r      1583   17.1 3220540   12.8  no limit       n/a     8   32     1932    32747    0      0      0      0    0
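The "Mem:" summary line in the xentop output above can be turned into numbers with a few lines of code, which makes it easier to compare runs. This is a sketch that assumes the line keeps the "Nk total, Nk used, Nk free" shape shown in this post; the function name `parse_xentop_mem` and the dictionary keys are made up for illustration.

```python
import re

# Matches the summary line as it appears in this post, for example:
# "Mem: 25165116k total, 25150528k used, 14588k free CPUs: 8 @ 1995MHz"
MEM_RE = re.compile(r"Mem:\s*(\d+)k total,\s*(\d+)k used,\s*(\d+)k free")


def parse_xentop_mem(line):
    """Extract total, used and free memory (in k) plus the free percentage."""
    match = MEM_RE.search(line)
    if match is None:
        raise ValueError("no xentop 'Mem:' summary found in line")
    total, used, free = (int(group) for group in match.groups())
    return {
        "total_k": total,
        "used_k": used,
        "free_k": free,
        "free_pct": 100.0 * free / total,
    }
```

For the line in this post, used plus free adds up exactly to the total, and the free share comes out well under one percent.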
VPS Lime
2010-Oct-18 15:15 UTC
Re: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
Good suggestion on dmesg. The "memory squeeze in netback driver"
seems like a likely culprit. There is a bug
(http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=762) dating back several
years on this issue with some suggestions and other responses that did not work.
Has anyone come up with a reliable fix for this on CentOS 5.5?
xen_net: Memory squeeze in netback driver.
xen_net: Memory squeeze in netback driver.
device xen3.128 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.128: link is not ready
printk: 60 messages suppressed.
xen_net: Memory squeeze in netback driver.
blkback: ring-ref 8, event-channel 15, protocol 1 (x86_64-abi)
printk: 11 messages suppressed.
xen_net: Memory squeeze in netback driver.
ADDRCONF(NETDEV_CHANGE): xen3.120: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 36(xen3.120) entering forwarding state
ADDRCONF(NETDEV_CHANGE): xen3.123: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 41(xen3.123) entering forwarding state
device tap2 entered promiscuous mode
xenbr1: topology change detected, propagating
xenbr1: port 43(tap2) entering forwarding state
device xen1-112 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen1-112: link is not ready
tap2: no IPv6 routers present
device tap5 entered promiscuous mode
xenbr1: topology change detected, propagating
xenbr1: port 45(tap5) entering forwarding state
device xen3.109 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.109: link is not ready
tap5: no IPv6 routers present
printk: 8 messages suppressed.
xen_net: Memory squeeze in netback driver.
xen_net: Memory squeeze in netback driver.
xenbr1: port 46(xen3.109) entering disabled state
device xen3.109 left promiscuous mode
xenbr1: port 46(xen3.109) entering disabled state
xenbr1: port 45(tap5) entering disabled state
device tap5 left promiscuous mode
xenbr1: port 45(tap5) entering disabled state
device xen3.129 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.129: link is not ready
blkback: ring-ref 8, event-channel 15, protocol 1 (x86_64-abi)
ADDRCONF(NETDEV_CHANGE): xen3.129: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 45(xen3.129) entering forwarding state
nfs: server 10.1.1.45 not responding, still trying
nfs: server 10.1.1.45 not responding, still trying
nfs: server 10.1.1.45 OK