Eric van Blokland
2010-Oct-18 12:44 UTC
RE: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
I've seen this happen in the past, when iSCSI disks became inaccessible. It hasn't occurred for quite a while, though (even though I know I made these disks inaccessible quite a few times); however, your system appears to be up to date.

If it is caused by disks becoming inaccessible, you should see something about it in dmesg: "connection .... timeout".

From: xen-users-bounces@lists.xensource.com [mailto:xen-users-bounces@lists.xensource.com] On behalf of VPS Lime
Sent: Monday, 18 October 2010 16:32
To: xen-users@lists.xensource.com
Subject: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array

I inherited a xen server that is set up to have all the VM images hosted on an iSCSI mounted NAS array. We have been experiencing a random (about every 2-3 days) issue where xen would crash all the VMs, leaving nothing but the Domain0 running. What appears to be happening is something causes the iSCSI mount to hiccup. Running "vgchange -a y" and restarting all the VMs brings everything up. Nothing appears to be wrong with the NAS array - there are a dozen other servers attached to it that never have a problem.

The xend log does not have anything useful in it and I'm at a loss to figure out what is causing this. The only suggestion I've heard is maybe the memory usage is too high and it is causing the box to be unstable. If anyone has any suggestions or any additional logs I should be looking at, I'd really appreciate it.

Host OS: CentOS 5.5
Xen kernel: xen.gz-2.6.18-194.11.4.el5
iSCSI libraries: iscsi-initiator-utils-6.2.0.871-0.16.el5
Memory on server: 32G
Total memory allocated for VMs running paravirt: 19,384 M
Total memory allocated for VMs running HVM: 2,688 M

Results of xm top:
xentop - 10:11:06 Xen 3.1.2-194.11.4.el5
39 domains: 1 running, 38 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 25165116k total, 25150528k used, 14588k free    CPUs: 8 @ 1995MHz
NAME      STATE   CPU(sec) CPU(%)  MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR SSID
Domain-0  -----r      1583   17.1 3220540   12.8  no limit       n/a     8   32     1932    32747    0      0      0      0    0

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Eric van Blokland
2010-Oct-18 13:31 UTC
RE: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
Not sure if this is the cause of your issue, because I only see messages about VMs getting started, nothing about why they could have crashed. Be sure to check that it's really the VMs crashing; perhaps the entire server just rebooted. If not, try to get the dmesg output from when the VMs crashed. You can also run "xm dmesg" to see if the hypervisor has anything to tell you.

About the memory squeeze: I believe this has to do with Dom0 running low on memory, not sure though. You could try giving Dom0 a reasonable fixed amount of memory. Also be sure you're not over-allocating memory. (Not sure if you even can in Xen; I guess you might, I never tried.)

From: xen-users-bounces@lists.xensource.com [mailto:xen-users-bounces@lists.xensource.com] On behalf of VPS Lime
Sent: Monday, 18 October 2010 17:16
CC: xen-users@lists.xensource.com
Subject: Re: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array

Good suggestion on dmesg. The "memory squeeze in netback driver" seems like a likely culprit. There is a bug report (http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=762) dating back several years on this issue, with some suggestions and other responses that did not work. Has anyone come up with a reliable fix for this on CentOS 5.5?

xen_net: Memory squeeze in netback driver.
xen_net: Memory squeeze in netback driver.
device xen3.128 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.128: link is not ready
printk: 60 messages suppressed.
xen_net: Memory squeeze in netback driver.
blkback: ring-ref 8, event-channel 15, protocol 1 (x86_64-abi)
printk: 11 messages suppressed.
xen_net: Memory squeeze in netback driver.
ADDRCONF(NETDEV_CHANGE): xen3.120: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 36(xen3.120) entering forwarding state
ADDRCONF(NETDEV_CHANGE): xen3.123: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 41(xen3.123) entering forwarding state
device tap2 entered promiscuous mode
xenbr1: topology change detected, propagating
xenbr1: port 43(tap2) entering forwarding state
device xen1-112 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen1-112: link is not ready
tap2: no IPv6 routers present
device tap5 entered promiscuous mode
xenbr1: topology change detected, propagating
xenbr1: port 45(tap5) entering forwarding state
device xen3.109 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.109: link is not ready
tap5: no IPv6 routers present
printk: 8 messages suppressed.
xen_net: Memory squeeze in netback driver.
xen_net: Memory squeeze in netback driver.
xenbr1: port 46(xen3.109) entering disabled state
device xen3.109 left promiscuous mode
xenbr1: port 46(xen3.109) entering disabled state
xenbr1: port 45(tap5) entering disabled state
device tap5 left promiscuous mode
xenbr1: port 45(tap5) entering disabled state
device xen3.129 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.129: link is not ready
blkback: ring-ref 8, event-channel 15, protocol 1 (x86_64-abi)
ADDRCONF(NETDEV_CHANGE): xen3.129: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 45(xen3.129) entering forwarding state
nfs: server 10.1.1.45 not responding, still trying
nfs: server 10.1.1.45 not responding, still trying
nfs: server 10.1.1.45 OK
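The over-allocation check suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the guest totals and the xentop total are the figures from the original post, while the fixed Dom0 size (`DOM0_FIXED_MIB`) and the function name `headroom_mib` are made up for the example and are not part of Xen. It also ignores any per-guest overhead and assumes a modern Python 3 on whatever machine does the arithmetic, not necessarily the CentOS 5.5 host.

```python
from typing import Sequence

KIB_PER_MIB = 1024


def headroom_mib(host_total_kib: int, dom0_mib: int, guest_mib: Sequence[int]) -> float:
    """Return the MiB left after Dom0 and all guests (negative means over-allocated)."""
    host_mib = host_total_kib / KIB_PER_MIB
    return host_mib - dom0_mib - sum(guest_mib)


# Figures from the post: xentop reports 25165116k total (treated as KiB here);
# guests are configured with 19,384 M (paravirt) and 2,688 M (HVM).
HOST_TOTAL_KIB = 25165116
GUEST_MIB = [19384, 2688]

# Assumption: 2048 MiB is just an example of a "reasonable fixed amount".
DOM0_FIXED_MIB = 2048

if __name__ == "__main__":
    left = headroom_mib(HOST_TOTAL_KIB, DOM0_FIXED_MIB, GUEST_MIB)
    status = "fits" if left >= 0 else "over-allocated"
    print(f"headroom: {left:.1f} MiB ({status})")
```

With the example value of 2048 MiB for Dom0, the configured guests leave roughly 455 MiB of the total that xentop reports; a different Dom0 size changes the result accordingly.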
VPS Lime
2010-Oct-18 14:31 UTC
[Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
I inherited a xen server that is set up to have all the VM images hosted on an iSCSI mounted NAS array. We have been experiencing a random (about every 2-3 days) issue where xen would crash all the VMs, leaving nothing but the Domain0 running. What appears to be happening is something causes the iSCSI mount to hiccup. Running "vgchange -a y" and restarting all the VMs brings everything up. Nothing appears to be wrong with the NAS array - there are a dozen other servers attached to it that never have a problem.

The xend log does not have anything useful in it and I'm at a loss to figure out what is causing this. The only suggestion I've heard is maybe the memory usage is too high and it is causing the box to be unstable. If anyone has any suggestions or any additional logs I should be looking at, I'd really appreciate it.

Host OS: CentOS 5.5
Xen kernel: xen.gz-2.6.18-194.11.4.el5
iSCSI libraries: iscsi-initiator-utils-6.2.0.871-0.16.el5
Memory on server: 32G
Total memory allocated for VMs running paravirt: 19,384 M
Total memory allocated for VMs running HVM: 2,688 M

Results of xm top:
xentop - 10:11:06 Xen 3.1.2-194.11.4.el5
39 domains: 1 running, 38 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 25165116k total, 25150528k used, 14588k free    CPUs: 8 @ 1995MHz
NAME      STATE   CPU(sec) CPU(%)  MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR SSID
Domain-0  -----r      1583   17.1 3220540   12.8  no limit       n/a     8   32     1932    32747    0      0      0      0    0
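To put a number on how little memory the hypervisor has left, the xentop summary line can be parsed directly. The following is a minimal sketch, assuming the `Mem:` line looks exactly as pasted in this post and that xentop's `k` means KiB; the regex and the function name `parse_mem_line` are illustrative and not part of any Xen tool.

```python
import re

# Matches the summary line as pasted in the post, e.g.
# "Mem: 25165116k total, 25150528k used, 14588k free"
MEM_RE = re.compile(r"Mem:\s*(\d+)k total,\s*(\d+)k used,\s*(\d+)k free")


def parse_mem_line(line: str) -> dict:
    """Extract total/used/free (in KiB) and the free percentage from a 'Mem:' line."""
    match = MEM_RE.search(line)
    if match is None:
        raise ValueError("not an xentop Mem: line")
    total, used, free = (int(group) for group in match.groups())
    return {
        "total_kib": total,
        "used_kib": used,
        "free_kib": free,
        "free_pct": 100.0 * free / total,
    }


if __name__ == "__main__":
    line = "Mem: 25165116k total, 25150528k used, 14588k free CPUs: 8 @ 1995MHz"
    mem = parse_mem_line(line)
    print(f"free: {mem['free_kib'] / 1024:.1f} MiB ({mem['free_pct']:.2f}% of total)")
```

For the line in this post that works out to about 14 MiB free, well under 0.1% of the total reported by xentop.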
VPS Lime
2010-Oct-18 15:15 UTC
Re: [Xen-users] xen randomly crashes all VMs hosted on iSCSI NAS array
Good suggestion on dmesg. The "memory squeeze in netback driver" seems like a likely culprit. There is a bug report (http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=762) dating back several years on this issue, with some suggestions and other responses that did not work. Has anyone come up with a reliable fix for this on CentOS 5.5?

xen_net: Memory squeeze in netback driver.
xen_net: Memory squeeze in netback driver.
device xen3.128 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.128: link is not ready
printk: 60 messages suppressed.
xen_net: Memory squeeze in netback driver.
blkback: ring-ref 8, event-channel 15, protocol 1 (x86_64-abi)
printk: 11 messages suppressed.
xen_net: Memory squeeze in netback driver.
ADDRCONF(NETDEV_CHANGE): xen3.120: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 36(xen3.120) entering forwarding state
ADDRCONF(NETDEV_CHANGE): xen3.123: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 41(xen3.123) entering forwarding state
device tap2 entered promiscuous mode
xenbr1: topology change detected, propagating
xenbr1: port 43(tap2) entering forwarding state
device xen1-112 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen1-112: link is not ready
tap2: no IPv6 routers present
device tap5 entered promiscuous mode
xenbr1: topology change detected, propagating
xenbr1: port 45(tap5) entering forwarding state
device xen3.109 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.109: link is not ready
tap5: no IPv6 routers present
printk: 8 messages suppressed.
xen_net: Memory squeeze in netback driver.
xen_net: Memory squeeze in netback driver.
xenbr1: port 46(xen3.109) entering disabled state
device xen3.109 left promiscuous mode
xenbr1: port 46(xen3.109) entering disabled state
xenbr1: port 45(tap5) entering disabled state
device tap5 left promiscuous mode
xenbr1: port 45(tap5) entering disabled state
device xen3.129 entered promiscuous mode
ADDRCONF(NETDEV_UP): xen3.129: link is not ready
blkback: ring-ref 8, event-channel 15, protocol 1 (x86_64-abi)
ADDRCONF(NETDEV_CHANGE): xen3.129: link becomes ready
xenbr1: topology change detected, propagating
xenbr1: port 45(xen3.129) entering forwarding state
nfs: server 10.1.1.45 not responding, still trying
nfs: server 10.1.1.45 not responding, still trying
nfs: server 10.1.1.45 OK

On Mon, Oct 18, 2010 at 8:44 AM, Eric van Blokland <Eric@footsteps.nl> wrote:

> I've seen this happen in the past, when iSCSI disks became inaccessible.
> It hasn't occurred for quite a while, though (even though I know I made
> these disks inaccessible quite a few times); however, your system appears
> to be up to date.
>
> If it is caused by disks becoming inaccessible, you should see something
> about it in dmesg: "connection .... timeout".
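A dmesg dump like the one in this message can be checked for both suspects at once: the netback memory squeeze and the iSCSI "connection .... timeout" wording mentioned earlier in the thread. Below is a rough Python sketch. The pattern names and `scan_dmesg` are made up for illustration, the iSCSI regex only mirrors the wording quoted in the thread (the exact kernel message may differ), and the sample text is a handful of lines copied from the excerpt above. It assumes a modern Python 3 on the machine analysing the log.

```python
import re
from collections import Counter

# Patterns of interest. The iSCSI one mirrors the phrase quoted in the thread
# ("connection .... timeout"); the real kernel message may be worded differently.
PATTERNS = {
    "netback_squeeze": re.compile(r"xen_net: Memory squeeze in netback driver"),
    "iscsi_timeout": re.compile(r"connection.*timeout", re.IGNORECASE),
    "nfs_not_responding": re.compile(r"nfs: server \S+ not responding"),
    "printk_suppressed": re.compile(r"printk: \d+ messages suppressed"),
}


def scan_dmesg(text: str) -> Counter:
    """Count the lines of a dmesg dump that match each pattern."""
    counts = Counter()
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# A few lines copied from the excerpt in this message.
SAMPLE = """\
xen_net: Memory squeeze in netback driver.
printk: 60 messages suppressed.
xen_net: Memory squeeze in netback driver.
blkback: ring-ref 8, event-channel 15, protocol 1 (x86_64-abi)
nfs: server 10.1.1.45 not responding, still trying
nfs: server 10.1.1.45 OK
"""

if __name__ == "__main__":
    for name, count in sorted(scan_dmesg(SAMPLE).items()):
        print(f"{name}: {count}")
```

Two caveats apply to any such count. The "printk: N messages suppressed" lines mean the kernel rate-limited its output, so the squeeze count is a lower bound at best. And because dmesg is a ring buffer, the absence of an iSCSI timeout line in an excerpt does not prove there was none at the time of the crash.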