I've read and experimented extensively, and being in desperate need of "finishing" this setup and getting it deployed live, I'd like to see if anyone has suggestions on the last hangup we seem to have.

Two SuperMicro 1U servers with dual quad-core CPUs and 16GB RAM each, running CentOS 5.2 x86_64 and its Xen implementation. The only non-"stock" CentOS component at this point is the Intel igb driver. The RHEL/CentOS igb driver appears to have a bug with DHCP over a bridged interface, which the latest driver downloaded straight from Intel cured for us.

Anyway, both are attached to shared FC storage and are running RHCS with both IP and disk-based quorum, plus CLVMD with a shared VG for creating LVs in as containers for VMs. That part is all working very well.

Each DOM0 has 2 physical NICs, and both are bridged. Additionally we added a virbr0 as a bridged per-DOM0 local network as well.

When any VM boots up, it can ping and traceroute on any of its respective networks perfectly. Inbound/outbound data flow of any kind appears perfect as well. Once a VM is migrated or live-migrated to the other DOM0, though, the ability to ping or traceroute ceases. Sessions via ssh or httpd, either inbound or outbound, continue to work fine though.

When a VM boots I see this in dmesg:

netfront: Initialising virtual ethernet driver.
netfront: device eth0 has flipping receive path.

I read something about a CRC problem and had each of them do "ethtool -K eth{n} tx off", but I don't think that was necessary in this instance; I've never seen any error messages about CRC errors. The described problem and solution I followed was not heavily detailed, and it was just an attempt to see if that helped with the problem.
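For reference, the offload settings and error counters can be inspected before and after toggling, to confirm whether checksum offload is actually implicated; a hedged diagnostic sketch (eth0 is an assumed interface name, adjust to your hardware):

```shell
# Show current offload settings (lowercase -k queries, uppercase -K sets)
ethtool -k eth0

# Disable TX checksum offload, as described above
ethtool -K eth0 tx off

# Dump driver/NIC statistics; a climbing rx_crc_errors (or similarly
# named) counter would confirm, and a flat one rule out, a real CRC problem
ethtool -S eth0
```

Since no CRC errors were ever logged here, the counters would likely settle the question of whether the `tx off` workaround was needed at all.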
The following was added to the end of /etc/sysctl.conf on both DOM0's only (per the excellent wiki article):

net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0

The other oddity about this is that with a VM started on server1 and live-migrated to server2, a running ping only pauses a short while, then picks right back up and continues to be successful. Migrating it back to server1, or initially starting a VM on server2 and migrating it to server1, is where the "stuck" ping issue comes into play. We were very careful and documented well as we installed both boxes, in an attempt to keep them as identical as possible. I fear this behavior proves that's not the case, though, ugh...

After migrating from 2 to 1 and then trying a ping (and waiting a good long while before ctrl-c'ing this):

PING 192.168.77.1 (192.168.77.1) 56(84) bytes of data.
64 bytes from 192.168.77.1: icmp_seq=1 ttl=64 time=0.000 ms

--- 192.168.77.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms

Very strange... Additionally, a "service network restart" at this point results in all interfaces going down, loopback being reinitialized, and then it hangs on trying to bring up eth0. I can ctrl-c it three times as it pauses on each interface, then run "ifconfig" and see all the IPs are still there. Still can't ping, but can "telnet google.com 80", for instance. Odd...

So anyway, any pointers or suggestions you might have would be greatly appreciated... Thanks.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
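When pings stall after a live migration but established TCP sessions keep working, a common culprit is a stale ARP or bridge forwarding-table entry on the destination host. A hedged diagnostic sketch, not a confirmed fix for this thread's problem (the bridge name xenbr0, interface eth0, and VM address 192.168.77.10 are placeholder assumptions):

```shell
# On the destination DOM0: check which port the bridge thinks the
# VM's MAC address lives behind (a stale entry points at the old port)
brctl showmacs xenbr0

# Inside the migrated VM: send unsolicited (gratuitous) ARP so the
# bridge and neighbours re-learn the MAC's new location.
# -U = unsolicited ARP, -c 3 = send three, -I = source interface
arping -U -c 3 -I eth0 192.168.77.10

# On a host that can no longer ping the VM: delete the possibly
# stale neighbour-cache entry for the VM's address and retry
arp -d 192.168.77.10
```

Xen normally sends this gratuitous ARP itself at the end of a live migration, which is consistent with it working in one direction but not the other if one DOM0's bridging differs from the other's.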
On Fri, Jan 09, 2009 at 02:17:34PM -0500, Wendell Dingus wrote:
> When any VM boots up, it can ping and traceroute on any of its respective
> networks perfectly. Inbound/outbound data flow of any kind appears perfect
> as well. Once a VM is migrated or live-migrated to the other DOM0, though,
> the ability to ping or traceroute ceases. Sessions via ssh or httpd, either
> inbound or outbound, continue to work fine though.
> [...]

https://www.redhat.com/archives/rhelv5-announce/2008-October/msg00000.html

Some entries from the RHEL 5.3 beta changelog:

+ Timer problems after migration were fixed
+ Lengthy network outage after migrations was fixed

Dunno if that's what you're seeing..
--
Pasi
Hello,

On Friday 09 January 2009, Wendell Dingus wrote:
> After migrating from 2 to 1 and then trying a ping (and waiting a good long
> while before ctrl-c'ing this):
>
> PING 192.168.77.1 (192.168.77.1) 56(84) bytes of data.
> 64 bytes from 192.168.77.1: icmp_seq=1 ttl=64 time=0.000 ms
>
> --- 192.168.77.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms

Your ping shows 0% packet loss; that looks like a reverse DNS problem. It used to happen to me, although not with VMs but with physical servers (not that it should matter...).

> Very strange... Additionally, a "service network restart" at this point
> results in all interfaces going down, loopback being reinitialized, and
> then it hangs on trying to bring up eth0. I can ctrl-c it three times as it
> pauses on each interface, then "ifconfig" and see all the IPs are still
> there. Still can't ping, but can "telnet google.com 80", for instance. Odd...

This looks like a MAC address change: CentOS/RHEL not being able to stop the interfaces.
In the VMs, check /etc/sysconfig/network-scripts/ifcfg-eth0, etc. for a line starting with HWADDR= and remove it, or make the MAC static in the VM config.

> So anyway, any pointers or suggestions you might have would be greatly
> appreciated...
>
> Thanks.

HTH,
--
Ricardo J. Barberis
Senior SysAdmin - I+D
Dattatec.com :: Soluciones de Web Hosting
Su Hosting hecho Simple..!
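To illustrate the two options above, a sketch of both sides (the IP, domain name, and MAC here are placeholders, not values from this thread):

```shell
# Guest side: /etc/sysconfig/network-scripts/ifcfg-eth0
# With no HWADDR= line, the guest accepts whatever MAC the
# hypervisor presents, so the interface still comes up after a move.
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.77.10
NETMASK=255.255.255.0
ONBOOT=yes
# HWADDR=00:16:3e:12:34:56   <- remove or comment out this line

# DOM0 side (alternative): in the domain's Xen config, e.g. /etc/xen/myvm,
# pin a fixed MAC in the Xen-reserved 00:16:3e prefix so it is identical
# on both hosts across migrations:
# vif = [ 'mac=00:16:3e:12:34:56,bridge=xenbr0' ]
```

Either approach avoids the mismatch where the boot-time MAC recorded in ifcfg-eth0 no longer matches the interface, which is what makes the init scripts unable to stop or restart it.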