Martins Lazdans
2011-Jun-05 14:14 UTC
[Xen-users] dom0 networking getting screwed at random periods
Hello!

I'm having a problem with dom0 (running Debian lenny, latest patches applied): networking stops after random periods of time. Once it took 18 hours, another time 5 hours. dom0 has two NICs (peth0, peth1), two subnets (a, b) and two bridges (eth0, eth1), with each subnet on its own NIC. Each NIC is configured with a dedicated IP from the corresponding subnet, so eth0 serves subnet a and eth1 serves subnet b.

In xend-config.sxp:

(network-script network-bridge-wrapper)

and network-bridge-wrapper is just:

#!/bin/sh
/etc/xen/scripts/network-bridge "$@" netdev=eth0
/etc/xen/scripts/network-bridge "$@" netdev=eth1

After random periods of time, eth0 goes down along with the networking of every domU attached to it. From outside I am unable to connect to or ping dom0, on either the peth0 or the peth1 address, or any of the domUs attached to that bridge.

However, domUs using the eth1 bridge keep working just fine. I can SSH to such a domU and from there SSH to dom0 on the peth1 IP address (remember, I could not do that from "outside").

After I run `/etc/xen/scripts/network-bridge stop` and then start, dom0 networking comes back. I then run `brctl addif eth0 vifX.0` to attach the domUs again, and everything works fine until the next such event.

Has anyone experienced something like this? I have been running this server for about three years without problems. The problem started around May 31. Maybe it has something to do with one of the latest updates?

There are no network spikes, no high load average, nothing in the log files, nothing in xm dmesg, and no domU restarts. I have been searching Google for some days now, and nothing comes up.

Basically, I have the same issue as described here:
http://lists.xensource.com/archives/html/xen-users/2008-03/msg00609.html

By the way, I hit a very similar issue twice on another server (different hardware, different data center, CentOS 5.6, Xen 3.4.2). That server uses only one NIC, so I can't tell whether it was exactly the same problem. Both incidents were about two months ago, and there have been no errors since, which made me think some OS update had fixed it. Back then I suspected a hardware fault, since the server had been deployed only 3 months before the incident.

My config is below.

Many thanks!

# uname -a
Linux doom 2.6.26-2-xen-amd64 #1 SMP Tue Jan 25 06:13:50 UTC 2011 x86_64 GNU/Linux

# xm info
host : doom
version : #1 SMP Tue Jan 25 06:13:50 UTC 2011
machine : x86_64
nr_cpus : 8
nr_nodes : 1
cores_per_socket : 4
threads_per_core : 1
cpu_mhz : 2200
hw_caps : 178bf3ff:efd3fbff:00000000:00000110:00802001:00000000:000007ff
total_memory : 32767
free_memory : 2116
node_to_cpu : node0:0-7
xen_major : 3
xen_minor : 2
xen_extra : -1
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : unavailable
cc_compiler : gcc version 4.3.1 (Debian 4.3.1-2)
cc_compile_by : waldi
cc_compile_domain : debian.org
cc_compile_date : Sat Jun 28 09:32:18 UTC 2008
xend_config_format : 4

--
Martins Lazdans

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Martins Lazdans
2011-Jun-15 09:26 UTC
Re: [Xen-users] dom0 networking getting screwed at random periods
Does anyone have a hint where to dig? For now I have written a script that pings the gateway and, on timeouts, restarts the bridge and reattaches the domUs to it. At the moment the bridge stops working about every 3 minutes. Can this be hardware related?

Someone replied to me directly, reporting the same problem on CentOS (I think it was CentOS).

--
Martins Lazdans
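The watchdog script mentioned in this message was not posted to the list. As a rough illustration only, the idea (ping the gateway, and after repeated timeouts restart the bridge and reattach the domU vifs) might be sketched as below. Every variable name, default and threshold is an assumption, and 192.0.2.1 is a placeholder gateway address; only the `network-bridge` and `brctl addif` commands come from the thread.

```sh
#!/bin/sh
# Hypothetical bridge watchdog; names and defaults are assumptions,
# not the script used by the original poster.
BRIDGE=${BRIDGE:-eth0}              # bridge created by network-bridge
GATEWAY=${GATEWAY:-192.0.2.1}       # placeholder gateway address
MAX_FAILS=${MAX_FAILS:-3}           # consecutive failed checks before a restart
INTERVAL=${INTERVAL:-20}            # seconds between checks
NETWORK_BRIDGE=${NETWORK_BRIDGE:-/etc/xen/scripts/network-bridge}
SYS_NET=${SYS_NET:-/sys/class/net}

# Succeeds if the gateway answers at least one of three pings.
gateway_alive() {
    ping -c 3 -W 2 "$GATEWAY" >/dev/null 2>&1
}

# Remember which vifs sit on the bridge, restart it, then reattach them.
restart_bridge() {
    vifs=$(ls "$SYS_NET/$BRIDGE/brif" 2>/dev/null | grep '^vif')
    "$NETWORK_BRIDGE" stop netdev="$BRIDGE"
    "$NETWORK_BRIDGE" start netdev="$BRIDGE"
    for vif in $vifs; do
        brctl addif "$BRIDGE" "$vif"
    done
}

# Main loop: restart only after MAX_FAILS consecutive failed checks.
watch_bridge() {
    fails=0
    while :; do
        if gateway_alive; then
            fails=0
        else
            fails=$((fails + 1))
        fi
        if [ "$fails" -ge "$MAX_FAILS" ]; then
            restart_bridge
            fails=0
        fi
        sleep "$INTERVAL"
    done
}
```

To use it, one would call `watch_bridge` at the end of the script (for example with `BRIDGE=eth0 GATEWAY=<gateway of subnet a>` in the environment). Recording the vif list before the restart avoids guessing which domUs belonged to which bridge. A watchdog like this only masks the symptom; it does not explain why the bridge stops forwarding.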
Todd Deshane
2011-Jun-16 00:11 UTC
Re: [Xen-users] dom0 networking getting screwed at random periods
On Wed, Jun 15, 2011 at 5:26 AM, Martins Lazdans wrote:
> Does anyone have any hint where to dig? Right now I wrote a script that
> checks ping to gateway and in case of timeouts restarts bridge and
> reattaches domUs to it. Right now bridge is stopping working like every 3
> minutes. Can this be hardware related?

You may want to consider setting up the bridges manually with the CentOS networking scripts.

This is actually the default way of doing things starting with Xen 4.1:
http://wiki.xensource.com/xenwiki/HostConfiguration/Networking

The standard bridging built into CentOS may end up being more stable for you.

Hope that helps.

Thanks,
Todd

--
Todd Deshane
http://www.linkedin.com/in/deshantm
http://www.xen.org/products/cloudxen.html
http://runningxen.com/
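For readers unfamiliar with the approach suggested in this reply, a manually defined bridge on CentOS is usually described by two ifcfg files like the ones below. This is a sketch under assumptions, not a configuration taken from the thread: the bridge name br0, the device name eth0 and the 192.0.2.x address are placeholders.

```sh
# /etc/sysconfig/network-scripts/ifcfg-br0
# The bridge itself; it carries the dom0 IP address (placeholder values).
DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
IPADDR=192.0.2.10
NETMASK=255.255.255.0
ONBOOT=yes
DELAY=0

# /etc/sysconfig/network-scripts/ifcfg-eth0
# The physical NIC; it has no IP of its own and is enslaved to br0.
DEVICE=eth0
ONBOOT=yes
BRIDGE=br0
```

With the bridge created by the distribution, the `(network-script ...)` line in xend-config.sxp would typically be commented out so xend no longer rearranges the interfaces, and each domU would name the bridge in its config, e.g. `vif = ['bridge=br0']`. On Debian, which the server in this thread runs, the equivalent is a bridge stanza with `bridge_ports` in /etc/network/interfaces.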
Martins Lazdans
2011-Jun-17 11:29 UTC
Re: [Xen-users] dom0 networking getting screwed at random periods
It seems the troublemaker was the NICs after all. I put a Broadcom dual-gigabit card into the server, and it has now been 36 hours with no problems at all. Before that I was using the built-in NVIDIA NICs (forcedeth driver). Still, it is strange: why did restarting the bridges help?

--
Martins Lazdans
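The thread does not answer why restarting the bridges helped. One possible reading, not confirmed here, is that `network-bridge stop` and `start` take the physical interface down and up again, which reinitialises the NIC driver. When a NIC or its driver is suspected, a quick check of the bound driver and the interface error counters can help. The sketch below reads them from sysfs; the function names are hypothetical, and peth0 is the physical interface name used in this thread.

```sh
#!/bin/sh
# Hypothetical helpers for inspecting a suspect NIC through sysfs.
SYS_NET=${SYS_NET:-/sys/class/net}

# Print the kernel driver bound to an interface, e.g. "forcedeth".
nic_driver() {
    link=$(readlink "$SYS_NET/$1/device/driver") || return 1
    basename "$link"
}

# Print "counter value" for every non-zero error, drop or collision counter.
nic_errors() {
    for f in "$SYS_NET/$1"/statistics/*; do
        [ -r "$f" ] || continue
        name=$(basename "$f")
        case $name in
            *errors*|*dropped*|collisions)
                val=$(cat "$f")
                if [ "$val" -ne 0 ]; then
                    echo "$name $val"
                fi
                ;;
        esac
    done
    return 0
}
```

For example, `nic_driver peth0` followed by `nic_errors peth0`, run once while the network works and once during an outage, shows whether any counter jumps. `ethtool -i peth0` and `ethtool -S peth0` report similar information with driver-specific detail.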