I've read and experimented extensively, and being in desperate need of "finishing" this setup and getting it deployed live, I'd like to see if anyone has suggestions on the last hangup we seem to have.

Two SuperMicro 1U servers with dual quad-core CPUs and 16GB RAM each, running CentOS 5.2 x86_64 and its Xen implementation. The only non-"stock" CentOS component at this point is the Intel igb driver. The RHEL/CentOS igb driver appears to have a bug with DHCP over a bridged interface, which the latest driver downloaded straight from Intel cured for us.

Anyway, both are attached to shared FC storage and are running RHCS with both IP and disk-based quorum, plus CLVMD with a shared VG for creating LVs in as containers for VMs. That part is all working very well.

Each DOM0 has 2 physical NICs, and both are bridged. Additionally we added a virbr0 as a bridged per-DOM0 local network as well.

When any VM boots up, it can ping and traceroute on any of its respective networks perfectly. Inbound/outbound data flow of any kind appears perfect as well. Once a VM is migrated or live-migrated to the other DOM0, though, the ability to ping or traceroute ceases. Sessions via ssh or httpd, either inbound or outbound, continue to work fine though.

When a VM boots I see this in dmesg:

netfront: Initialising virtual ethernet driver.
netfront: device eth0 has flipping receive path.

I read something about a CRC problem and had each of them do "ethtool -K eth{n} tx off", but I don't think that was necessary in this instance; I've never seen any error messages about CRC errors. The described problem and solution I followed was not heavily detailed, and it was just an attempt to see if that helped with the problem.
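For reference, the offload settings and error counters can be inspected before and after toggling, to confirm whether checksum offload is actually implicated; a hedged diagnostic sketch (eth0 is an assumed interface name, adjust to your hardware):

```shell
# Show current offload settings (lowercase -k queries, uppercase -K sets)
ethtool -k eth0

# Disable TX checksum offload, as described above
ethtool -K eth0 tx off

# Dump driver/NIC statistics; a climbing rx_crc_errors (or similarly
# named) counter would confirm, and a flat one rule out, a real CRC problem
ethtool -S eth0
```

Since no CRC errors were ever logged here, the counters would likely settle the question of whether the `tx off` workaround was needed at all.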
The following was added to the end of /etc/sysctl.conf on both DOM0's only (per the excellent wiki article):

net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0

The other oddity about this is that with a VM started on server1 and live-migrated to server2, a running ping only pauses a short while, then picks right back up and continues to be successful. Migrating it back to server1, or initially starting a VM on server2 and migrating it to server1, is where the "stuck" ping issue comes into play. We were very careful and documented well as we installed both boxes, in an attempt to keep them as identical as possible. I fear this behavior proves that's not the case, though, ugh...

After migrating from 2 to 1 and then trying a ping (and waiting a good long while before ctrl-c'ing this):

PING 192.168.77.1 (192.168.77.1) 56(84) bytes of data.
64 bytes from 192.168.77.1: icmp_seq=1 ttl=64 time=0.000 ms

--- 192.168.77.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms

Very strange... Additionally, a "service network restart" at this point results in all interfaces going down, loopback being reinitialized, and then it hangs on trying to bring up eth0. I can ctrl-c it three times as it pauses on each interface, then run "ifconfig" and see all the IPs are still there. Still can't ping, but can "telnet google.com 80", for instance. Odd...

So anyway, any pointers or suggestions you might have would be greatly appreciated... Thanks.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
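When pings stall after a live migration but established TCP sessions keep working, a common culprit is a stale ARP or bridge forwarding-table entry on the destination host. A hedged diagnostic sketch, not a confirmed fix for this thread's problem (the bridge name xenbr0, interface eth0, and VM address 192.168.77.10 are placeholder assumptions):

```shell
# On the destination DOM0: check which port the bridge thinks the
# VM's MAC address lives behind (a stale entry points at the old port)
brctl showmacs xenbr0

# Inside the migrated VM: send unsolicited (gratuitous) ARP so the
# bridge and neighbours re-learn the MAC's new location.
# -U = unsolicited ARP, -c 3 = send three, -I = source interface
arping -U -c 3 -I eth0 192.168.77.10

# On a host that can no longer ping the VM: delete the possibly
# stale neighbour-cache entry for the VM's address and retry
arp -d 192.168.77.10
```

Xen normally sends this gratuitous ARP itself at the end of a live migration, which is consistent with it working in one direction but not the other if one DOM0's bridging differs from the other's.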
On Fri, Jan 09, 2009 at 02:17:34PM -0500, Wendell Dingus wrote:
> When any VM boots up, it can ping and traceroute on any of its respective
> networks perfectly. Inbound/outbound data flow of any kind appears perfect
> as well. Once a VM is migrated or live-migrated to the other DOM0, though,
> the ability to ping or traceroute ceases. Sessions via ssh or httpd, either
> inbound or outbound, continue to work fine though.
> [...]

https://www.redhat.com/archives/rhelv5-announce/2008-October/msg00000.html

Some entries from the RHEL 5.3 beta changelog:

+ Timer problems after migration were fixed
+ Lengthy network outage after migrations was fixed

Dunno if that's what you're seeing..
--
Pasi
Hello,

On Friday 09 January 2009, Wendell Dingus wrote:
> After migrating from 2 to 1 and then trying a ping (and waiting a good long
> while before ctrl-c'ing this):
>
> PING 192.168.77.1 (192.168.77.1) 56(84) bytes of data.
> 64 bytes from 192.168.77.1: icmp_seq=1 ttl=64 time=0.000 ms
>
> --- 192.168.77.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms

Your ping shows 0% packet loss; that looks like a reverse DNS problem. It used to happen to me, although not with VMs but with physical servers (not that it should matter...).

> Very strange... Additionally, a "service network restart" at this point
> results in all interfaces going down, loopback being reinitialized, and
> then it hangs on trying to bring up eth0. I can ctrl-c it three times as it
> pauses on each interface, then "ifconfig" and see all the IPs are still
> there. Still can't ping, but can "telnet google.com 80", for instance. Odd...

This looks like a MAC address change: CentOS/RHEL not being able to stop the interfaces.
In the VMs, check /etc/sysconfig/network-scripts/ifcfg-eth0, etc. for a line starting with HWADDR= and remove it, or make the MAC static in the VM config.

> So anyway, any pointers or suggestions you might have would be greatly
> appreciated...
>
> Thanks.

HTH,
--
Ricardo J. Barberis
Senior SysAdmin - I+D
Dattatec.com :: Soluciones de Web Hosting
Su Hosting hecho Simple..!
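To illustrate the two options above, a sketch of both sides (the IP, domain name, and MAC here are placeholders, not values from this thread):

```shell
# Guest side: /etc/sysconfig/network-scripts/ifcfg-eth0
# With no HWADDR= line, the guest accepts whatever MAC the
# hypervisor presents, so the interface still comes up after a move.
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.77.10
NETMASK=255.255.255.0
ONBOOT=yes
# HWADDR=00:16:3e:12:34:56   <- remove or comment out this line

# DOM0 side (alternative): in the domain's Xen config, e.g. /etc/xen/myvm,
# pin a fixed MAC in the Xen-reserved 00:16:3e prefix so it is identical
# on both hosts across migrations:
# vif = [ 'mac=00:16:3e:12:34:56,bridge=xenbr0' ]
```

Either approach avoids the mismatch where the boot-time MAC recorded in ifcfg-eth0 no longer matches the interface, which is what makes the init scripts unable to stop or restart it.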