My CentOS 5.3 cluster has major network stability issues on the dom0s when running Xen 3.4 and 3.4.1. Things were rock solid on 3.3.1, and they were again once we downgraded back to it. Are there any known network issues with the 3.4.x series? Did something change in 3.4 in how networking is done that we should have adapted to?

Guest networking worked fine as far as I know, but every hour or so the dom0 would get fenced by another cluster member, most likely because of some networking change (maybe the network service getting restarted or similar). I couldn't get any useful info out of the logs, but it only happened on nodes running 3.4.x, and only when guests were running on them. Everything was fine when no guests were running.

BTW, we tried changing kernels and that didn't make a difference either. The current CentOS kernel is 2.6.18-128.7.1.el5xen, but our cluster was most stable with 2.6.18-92.1.22.el5xen (#1 SMP Tue Dec 16 12:26:32 EST 2008 x86_64 x86_64 x86_64 GNU/Linux) and Xen 3.3.1. Obviously, for security and performance reasons, we want to stay current.

Any help is sincerely appreciated!

-Mark

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Hi Mark,

Are you using the Xen network-bridge script or ifcfg scripts to set up your networking? Are guest VMs and CMAN node info passing over the same interface?

Cheers,
Steve

On Sun, Sep 13, 2009 at 6:57 AM, Mark Chaney <macscr@macscr.com> wrote:
> My Centos 5.3 cluster has major network stability issues (these are the
> dom0's) when running xen 3.4 and 3.4.1. [...]
Steve, yes, I am using the network bridge. Here is part of xend-config.sxp:

    (network-script network-bridge-wrapper)
    (vif-script vif-bridge)

    [root@wheeljack ~]# cat /etc/xen/scripts/network-bridge-wrapper
    #!/bin/sh
    /etc/xen/scripts/network-bridge $1 netdev=eth0
    /etc/xen/scripts/network-bridge $1 netdev=eth1

ifconfig is giving me: eth0, eth1, eth2, eth3, peth0, peth1, lo, and virbr0.

eth0 is used for backup and cman communication only; it's on a separate VLAN.
eth1 is for WAN communication.
eth2 and eth3 are for iSCSI only, and guests don't have direct access to them.

Snippet from /var/log/messages:

    Sep 13 01:28:32 wheeljack kernel: device vif1.0 entered promiscuous mode
    Sep 13 01:28:32 wheeljack kernel: physdev match: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is deprecated and breaks other things, it will be removed in January 2007. See Documentation/feature-removal-schedule.txt for details. This doesn't affect you in case you're using it for purely bridged traffic.
    Sep 13 01:28:32 wheeljack logger: /etc/xen/scripts/vif-bridge: iptables setup failed. This may affect guest networking.
    Sep 13 01:28:37 wheeljack kernel: blkback: ring-ref 8, event-channel 6, protocol 1 (x86_64-abi)
    Sep 13 01:28:37 wheeljack kernel: eth1: topology change detected, propagating
    Sep 13 01:28:37 wheeljack kernel: eth1: port 2(vif1.0) entering forwarding state

I know it says the error is with vif-bridge, but that's stock, so I don't know what could be wrong with it. I don't get these errors with the Xen 3.3.1 and kernel I mentioned earlier. I've gone as far as disabling iptables and still have the same stability issues. You did bring up an idea, though: I haven't tried disabling access to eth0 from the guests. But again, what changed between the Xen versions that would cause such a problem, if that were it? I swear there must have been a change that has been poorly documented.

Anyway, thanks for the help, I sincerely appreciate it.

-Mark

From: Stephen Ross [mailto:stephen.ross1986@googlemail.com]
Sent: Sunday, September 13, 2009 4:25 AM
To: Mark Chaney
Cc: Xen-users@lists.xensource.com
Subject: Re: [Xen-users] 3.4.x networking

> Are you using the xen network-bridge script or ifcfg scripts to setup your
> networking? Are guest VM's and CMAN node info passing on the same interface?
On Sun, Sep 13, 2009 at 08:30:53AM -0500, Mark Chaney wrote:
> Snippet from /var/log/messages:
>
> Sep 13 01:28:32 wheeljack kernel: device vif1.0 entered promiscuous mode
> [...]
> Sep 13 01:28:32 wheeljack logger: /etc/xen/scripts/vif-bridge: iptables
> setup failed. This may affect guest networking.
> [...]
>
> I know it says the error is with vif-bridge, but that's stock, so I don't
> know what could be wrong with it. I don't get these errors with the Xen 3.3.1
> and kernel I mentioned earlier.

Did you diff the vif-bridge script between the Xen 3.3.1 and Xen 3.4.1 versions? What are the differences?

What's the failing iptables command? Please paste the whole command here, including the parameters.

> I've gone as far as disabling iptables and
> still have the same issues with stability. You did bring up an idea though,
> I haven't tried disabling access to eth0 from the guests, but again, what
> changed in the xen versions that would cause such a problem if it was that.
> I swear there must have been a change that has been poorly documented.

Please paste the errors (log messages) about fencing.

Does the eth0 traffic work when the server gets fenced? Do you have errors or log messages in "dmesg" or /var/log/messages when that fencing happens?

-- Pasi
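The comparison asked for above can be done over the whole scripts directory rather than a single file, which also catches changes in helpers that vif-bridge sources. This is a minimal sketch, assuming a copy of the 3.3.1 scripts was kept somewhere; the backup path in the example is hypothetical and the function name is made up for illustration.

```shell
#!/bin/sh
# Compare the hotplug scripts shipped by two Xen versions.
# Both directory arguments are placeholders: point them at wherever the
# old and new script sets were unpacked or backed up.
compare_xen_scripts() {
    old_dir=$1
    new_dir=$2
    # -r recurses into the directories, -u prints unified context.
    diff -ru "$old_dir" "$new_dir"
    status=$?
    # diff exits 0 (identical) or 1 (differences found) on success and
    # 2 on trouble, so only a status above 1 is treated as a failure.
    [ "$status" -le 1 ]
}

# Example (hypothetical backup location):
# compare_xen_scripts /root/xen-3.3.1-scripts /etc/xen/scripts
```

Diffing the directory instead of vif-bridge alone matters here because a script can be byte-identical between versions while a file it sources has changed.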
On Sunday 13 September 2009 14:46:26 Pasi Kärkkäinen wrote:
> > I know it says the error is with vif-bridge, but that's stock, so I don't
> > know what could be wrong with it. I don't get these errors with the Xen
> > 3.3.1 and kernel I mentioned earlier.
>
> Did you diff vif-bridge script between xen 3.3.1 and xen 3.4.1 versions?
> What are the differences?
>
> What's the failing iptables command? Please paste the whole command here,
> including the parameters.

frob_iptables in /etc/xen/scripts/vif-common.sh has changed, mostly by the addition of:

    iptables "$c" FORWARD -m state --state RELATED,ESTABLISHED -m physdev \
        --physdev-out "$vif" -j ACCEPT 2>/dev/null

I found this not only caused the probably innocuous physdev deprecation warning, but also caused the conntrack modules to be loaded because of the state check. The default conntrack table size was far too small for me.

So either remove iptables, or comment out "handle_iptable" in /etc/xen/scripts/vif-bridge. I have no need for Xen to change iptables.

--
Mike Williams
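Whether the conntrack table is actually filling up can be checked directly on the dom0. The sketch below is an illustration rather than anything from the Xen scripts: it reads the entry count and the limit from /proc and prints the usage. It assumes the ip_conntrack paths used by 2.6.18-era kernels and falls back to the nf_conntrack paths of later kernels; the function names are made up for this example.

```shell
#!/bin/sh
# Print conntrack table usage as an integer percentage.
# usage_percent COUNT MAX
usage_percent() {
    count=$1
    max=$2
    if [ "$max" -le 0 ]; then
        echo "max must be positive" >&2
        return 1
    fi
    echo $(( count * 100 / max ))
}

# Echo the first readable file among the arguments, or fail if none is.
first_readable() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

report_conntrack() {
    # Older kernels expose ip_conntrack_*, newer ones nf_conntrack_*.
    count_file=$(first_readable \
        /proc/sys/net/ipv4/netfilter/ip_conntrack_count \
        /proc/sys/net/netfilter/nf_conntrack_count) || {
        echo "no conntrack count file found (module probably not loaded)"
        return 0
    }
    max_file=$(first_readable \
        /proc/sys/net/ipv4/netfilter/ip_conntrack_max \
        /proc/sys/net/netfilter/nf_conntrack_max) || {
        echo "no conntrack max file found"
        return 0
    }
    count=$(cat "$count_file")
    max=$(cat "$max_file")
    echo "conntrack entries: $count of $max ($(usage_percent "$count" "$max")%)"
}

report_conntrack
```

Running this periodically while guests are up would show whether usage climbs toward the limit before a fencing event. If it does, the limit can be raised through the matching max file; a suitable value depends on traffic and available memory.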
While it seems a bit dirty, here is what I have been adding to my firewall, which is CSF (from configservers.net). I have a script that runs after the normal rules are loaded into iptables:

    [root@wheeljack ~]# cat /etc/csf/csfpost.sh
    #!/bin/sh
    # add iptables commands here
    /sbin/iptables -A FORWARD -m physdev --physdev-in peth1 --physdev-out vif+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-out peth1 --physdev-in vif+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-in peth0 --physdev-out vif+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-out peth0 --physdev-in vif+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-out vif+ --physdev-in vif+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-in vif+ --physdev-out vif+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-out tap+ --physdev-in tap+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-in tap+ --physdev-out tap+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-out vif+ --physdev-in tap+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-in vif+ --physdev-out tap+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-out tap+ --physdev-in vif+ -j ACCEPT
    /sbin/iptables -A FORWARD -m physdev --physdev-in tap+ --physdev-out vif+ -j ACCEPT

It appears to work for the most part. So I'm guessing I should be able to comment out the handle_iptable call in vif-bridge. BUT, according to my diff, vif-bridge is exactly the same in 3.3.1 and 3.4.1.

-----Original Message-----
From: xen-users-bounces@lists.xensource.com [mailto:xen-users-bounces@lists.xensource.com] On Behalf Of Mike Williams
Sent: Sunday, September 13, 2009 9:25 AM
To: xen-users@lists.xensource.com
Subject: Re: [Xen-users] 3.4.x networking

> frob_iptables in /etc/xen/scripts/vif-common.sh has changed, mostly by the
> addition of: [...]
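Of the twelve rules in csfpost.sh, four repeat an earlier rule with the --physdev-in and --physdev-out options written in the opposite order, so only eight distinct in/out pairs are involved. A possible way to keep the list readable is to generate one rule per pair from a loop. This is a sketch, not a drop-in replacement: it only prints the commands (a dry run), and the function name is made up for illustration.

```shell
#!/bin/sh
# Emit one ACCEPT rule per unique (physdev-in, physdev-out) pair.
# Printing instead of executing keeps this a dry run; review the output
# before feeding it to a shell as root.
IPTABLES=/sbin/iptables

print_bridge_rules() {
    # Each entry is "in:out"; the list mirrors the pairs used in csfpost.sh.
    for pair in \
        peth0:vif+ vif+:peth0 \
        peth1:vif+ vif+:peth1 \
        vif+:vif+ tap+:tap+ \
        vif+:tap+ tap+:vif+
    do
        in_if=${pair%%:*}    # text before the colon
        out_if=${pair#*:}    # text after the colon
        echo "$IPTABLES -A FORWARD -m physdev --physdev-in $in_if --physdev-out $out_if -j ACCEPT"
    done
}

print_bridge_rules
```

Dropping the duplicates does not change which packets are accepted, since a repeated ACCEPT rule can never match anything the first copy did not already accept.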
On Sun, Sep 13, 2009 at 03:09:54PM -0500, Mark Chaney wrote:
>
> It appears to work for the most part. So I'm guessing I should be able to
> comment out the handle_iptable for vif-bridge. BUT, according to my diff
> findings the vif-bridge for 3.3.1 and 3.4.1 are exactly the same.
>

Well, as Mike already pointed out, the changes are in /etc/xen/scripts/vif-common.sh.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> frob_iptables in /etc/xen/scripts/vif-common.sh has changed, mostly by the
> addition of:
>
> iptables "$c" FORWARD -m state --state RELATED,ESTABLISHED -m physdev \
>     --physdev-out "$vif" -j ACCEPT 2>/dev/null
>
> I found this not only caused the probably innocuous physdev deprecation
> warning, but caused the conntrack modules to be loaded due to the state
> check.
> The default conntrack size was far too small for me.
> So either remove iptables, or comment out "handle_iptable" from
> /etc/xen/scripts/vif-bridge, I've no need for Xen to change iptables.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

-- Pasi