thr3ads.net - Xen users - [Xen-users] problem with static routes [Aug 2010]

Greg Woods

2010-Aug-23 15:45 UTC

[Xen-users] problem with static routes

The basic problem is that when I reboot a node in my cluster, it comes
back up without its static routes.

I posted this first to the linux-ha list, but I have since determined
that the problem happens when Xen is stopped or started. The original
post appears below. Since then, I have determined that if I add code to
my bridge-wrapper script (which basically calls the network-bridge
script once for each interface) that manually adds the static routes
back in, the problem goes away on startup, but still occurs when I put a
node into standby (which takes down all the shared resources including
Xen). 
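
The workaround described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual wrapper from this post: the interface names (eth0, eth1), the bridge names (xenbr0, xenbr1), and the RUN dry-run variable are hypothetical, and the "any ..." line format is assumed to be the one the CentOS 5 network init script reads from /etc/sysconfig/static-routes. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical bridge-wrapper sketch. Device and bridge names are assumptions.
XEN_SCRIPTS=${XEN_SCRIPTS:-/etc/xen/scripts}
STATIC_ROUTES=${STATIC_ROUTES:-/etc/sysconfig/static-routes}
RUN=${RUN-echo}   # dry run by default; export RUN= (empty) to really execute

# Re-add every "any ..." entry from the given file, the same way the
# network init script is assumed to do it at boot.
readd_static_routes() {
    [ -f "$1" ] || return 0
    grep '^any' "$1" | while read -r ignore args; do
        $RUN /sbin/route add -$args
    done
}

# Call network-bridge once per interface, then restore the routes on start.
bridge_all() {
    $RUN "$XEN_SCRIPTS/network-bridge" "$1" vifnum=0 netdev=eth0 bridge=xenbr0
    $RUN "$XEN_SCRIPTS/network-bridge" "$1" vifnum=1 netdev=eth1 bridge=xenbr1
    if [ "$1" = start ]; then
        readd_static_routes "$STATIC_ROUTES"
    fi
}

# xend passes "start" or "stop" as the first argument.
if [ $# -gt 0 ]; then
    bridge_all "$1"
fi
```

Because the routes are only re-added on start, a sketch like this would not be expected to help with the standby case, which matches what is reported above.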

I realize that I am running an old version of Xen, but I am mostly
wondering if this behavior is intentional or if there is a less kludgy
workaround (like a config parameter that can be set that I have missed).

Am I the only one in the world who has to use static routes?
Unfortunately I am stuck with them because we have a /16 address space
that is partially inside and partially outside our security perimeter,
which means some subnets are reached through the external interface and
some through the internal one; these are defined with static routes.

TIA,
--Greg

=============================================================================
OS: CentOS 5.5
heartbeat: heartbeat-3.0.3-2.3.el5 (latest from clusterlabs)
pacemaker: pacemaker-1.0.9.1-1.15.el5 (latest from clusterlabs)

If it matters, this cluster is primarily used to run Xen virtual
machines (xen-3.0.3-105.el5_5.5 kernel-2.6.18-194.11.1.el5xen latest
from CentOS)

I have been looking off and on for the source of this problem for quite
a while without finding what is causing it. The basic problem is that
when I reboot a node in my cluster, it comes back up without its static
routes. Adding them back in manually works; they stay until the next
reboot. These are defined in /etc/sysconfig/static-routes and are added
by the network service at boot time. I have been able to pretty much
rule out the boot process itself as the source of the problem. I added a
"netstat -r -n > /tmp/static-routes" command to the rc.local file, which
is the very last thing run at boot time, and the routes are there. I have
also tried putting nodes into standby (crm node standby) and back
online, and the routes stay there through that. But once I log in after
a reboot, the static routes are gone and I have to manually re-add them.

I can probably work around this using a hideous kludge like having the
rc.local file run a background job that sleeps for a couple of minutes,
then adds the routes, but that doesn't really fix the issue and isn't
guaranteed to work reliably (obviously high reliability is important or
I wouldn't be using HA in the first place).

Has anyone ever seen this before or have any clue where I can look to
troubleshoot this?
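
One way to narrow down when the routes vanish is to poll the routing table and log the moment an expected destination goes missing. The following is only a sketch: the function names, the expected-routes file (one destination per line, written the way "ip route show" prints it, for example 10.1.0.0/16), and the one-second default interval are all assumptions made for illustration.

```shell
# Print each destination listed in file $1 that is absent from the
# routing-table text read on stdin (first field of each line).
missing_routes() {
    table=$(cat)
    while read -r dest; do
        [ -n "$dest" ] || continue
        printf '%s\n' "$table" |
            awk -v d="$dest" '$1 == d { found = 1 } END { exit !found }' ||
            echo "$dest"
    done < "$1"
}

# Poll the live table and log a timestamp whenever something is missing.
# $1 = expected-routes file, $2 = optional interval in seconds.
watch_routes() {
    while :; do
        gone=$(ip route show | missing_routes "$1")
        [ -n "$gone" ] && echo "$(date '+%F %T') missing: $gone"
        sleep "${2:-1}"
    done
}
```

Started in the background from rc.local with its output sent to a log file, something like this would give a timestamp to compare against the xend and cluster logs. Running "ip monitor route" in the same way is another option for recording route changes as they happen.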

Thanks in advance,
--Greg


_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Fajar A. Nugraha

2010-Aug-23 16:32 UTC

Re: [Xen-users] problem with static routes

On Mon, Aug 23, 2010 at 10:45 PM, Greg Woods <woods@ucar.edu> wrote:
> The basic problem is that when I reboot a node in my cluster, it comes
> back up without its static routes.
... and how did you determine that your setup works without Xen?

> These are defined in /etc/sysconfig/static-routes and are added
> by the network service at boot time.
If they work on a normal kernel (without Xen), then you should file a
bug with Red Hat. Generally they'd maintain the default network-bridge
script to "just work". If it's really Xen networking that causes the
problem, you could simply discard the default network-bridge script
and create your bridges manually using
/etc/sysconfig/network-scripts/ifcfg-*

I'm not sure you've configured static routes correctly though. See
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.5/html/Deployment_Guide/s1-networkscripts-static-routes.html
http://www.akadia.com/services/redhat_static_routes.html

-- 
Fajar

Greg Woods

2010-Aug-23 17:25 UTC

Re: [Xen-users] problem with static routes

On Mon, 2010-08-23 at 23:32 +0700, Fajar A. Nugraha wrote:
> 
> ... and how did you determine that your setup works without Xen?
I have not definitively done that yet. It would require taking down the
entire cluster including all the VMs, then removing the xen resources
from the cluster configuration, resulting in an extended down time for
all the VMs. I did determine that putting the routes back in the bridge
startup script works around the issue.

> I'm not sure you've configured static routes correctly though. See
> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.5/html/Deployment_Guide/s1-networkscripts-static-routes.html
I will have to give this a try. First I've ever heard of a route-eth0
file. The network service startup script (/etc/rc.d/init.d/network)
clearly uses the /etc/sysconfig/static-routes file. 
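
For comparison, the two styles can be put side by side by converting one into the other. The sketch below assumes static-routes entries of the form "any net NETWORK netmask MASK gw GATEWAY" and emits lines in the "NETWORK/PREFIX via GATEWAY" form that the Red Hat guide describes for route-eth0 style files. The function name and the addresses in the usage note are hypothetical, and entries in any other form are silently skipped.

```shell
# Convert "any net N netmask M gw G" lines on stdin into
# "N/PREFIX via G" lines, counting the set bits of the netmask.
static_to_route_file() {
    awk '$1 == "any" && $2 == "net" && $4 == "netmask" && $6 == "gw" {
        split($5, octet, ".")
        bits = 0
        for (i = 1; i <= 4; i++) {
            v = octet[i] + 0
            while (v > 0) { bits += v % 2; v = int(v / 2) }
        }
        print $3 "/" bits " via " $7
    }'
}
```

For example, feeding it the line "any net 10.1.0.0 netmask 255.255.0.0 gw 192.168.1.254" prints "10.1.0.0/16 via 192.168.1.254". The bit count only gives a correct prefix length for ordinary contiguous netmasks.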

One thing I have determined is that whatever is dropping the routes, it
is happening after booting is complete. Since rc.local is the very last
boot-time service executed, and "netstat -r -n" shows that the static
routes are properly there at that time, it has to be something that
occurs after initial boot that is removing them. The main reason for
suspecting xen networking is that there are "ip route delete" commands
in some of the scripts. That is the only place on the system I have
found anything like this. I have other high availability clusters that
do not have Xen where this issue does not occur.

I realize this is not "definitive proof" that Xen is at fault. I am not
trying to point fingers or file bug reports at this point, I am just
trying to troubleshoot.

--Greg

Fajar A. Nugraha

2010-Aug-23 17:45 UTC

Re: [Xen-users] problem with static routes

On Tue, Aug 24, 2010 at 12:25 AM, Greg Woods <woods@ucar.edu> wrote:
> On Mon, 2010-08-23 at 23:32 +0700, Fajar A. Nugraha wrote:
>
>>
>> ... and how did you determine that your setup works without Xen?
>
> I have not definitively done that yet. It would require taking down the
> entire cluster including all the VMs, then removing the xen resources
> from the cluster configuration, resulting in an extended down time for
> all the VMs. I did determine that putting the routes back in the bridge
> startup script works around the issue.
Why take down everything?

What I meant was whether the static route setup works without Xen. It
should be something as simple as:
- installing normal (non-xen) kernel (if not already installed) on one
of the nodes
- reboot choosing that kernel

That would at least verify whether the routes stay on after booting or
not, and whether some startup script removes them. The only difference
(startup-script wise) between the default normal and Xen kernel setups
is that, on the normal kernel, Xen's network-bridge script shouldn't be
running.

If it works -> probably Xen's network-bridge does something wrong, and
you should definitely file a bug with RH.
If it still doesn't work (routes still missing several minutes after
boot, or not appearing at all) -> something else is causing problems
in your setup.
> The main reason for
> suspecting xen networking is that there are "ip route delete" commands
> in some of the scripts. That is the only place on the system I have
> found anything like this. I have other high availability clusters that
> do not have Xen where this issue does not occur.
Is it part of /etc/xen/scripts/network-bridge?

If yes, you can also test disabling the script and creating your own
bridge. Something like
https://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.5/html/Virtualization_Guide/sect-Virtualization-Network_Configuration-Bridged_networking_with_libvirt.html

With this setup, don't forget that the IP address settings will be in
the bridge's config file (br0, xenbr0, or whatever bridge name you
choose) instead of eth0.
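
A manually defined bridge of this kind might look roughly like the two files below. This is a sketch in the style of the Red Hat guide linked above, not a tested configuration for this cluster: the bridge name xenbr0 and the addresses (taken from the 192.0.2.0/24 documentation range) are placeholders.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0
# The physical interface carries no IP address; it only joins the bridge.
DEVICE=eth0
ONBOOT=yes
BRIDGE=xenbr0

# /etc/sysconfig/network-scripts/ifcfg-xenbr0
# The bridge holds the IP settings that used to live in ifcfg-eth0.
DEVICE=xenbr0
TYPE=Bridge
BOOTPROTO=static
IPADDR=192.0.2.10
NETMASK=255.255.255.0
ONBOOT=yes
DELAY=0
```

With this layout, xend's own network-script would also need to be disabled in xend-config.sxp so that it does not rearrange the interfaces again, and any per-interface route file would presumably be named after the bridge rather than eth0.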

-- 
Fajar

Greg Woods

2010-Aug-23 18:54 UTC

Re: [Xen-users] problem with static routes

> What I meant was whether the static route setup works without Xen. It
> should be something as simple as:
> - installing normal (non-xen) kernel (if not already installed) on one
> of the nodes
> - reboot choosing that kernel
Not so simple, because if I boot a cluster node that doesn't start Xen
resources properly, this can cause a stonith death match (where the two
nodes keep killing each other, or the working node keeps killing the
non-working node). What this means is that I would have to disable the
cluster software as well (at least on the testing node), which makes it
not quite this simple. Still it is a test that I could do and I'll find
a time when I can do it. It requires rebooting some of the DomU's to
move them between nodes, so I can't do this during the work day.

For the present this is non-urgent because adding the routes back at the
end of the network-bridge script at least causes the routes to be
present after startup. The only problem is that the routes disappear
again when a cluster node is taken offline (out of the cluster), but
that is never left that way for a long period of time, so the workaround
is acceptable for now. 

The static routes, of course, only affect the Dom0 since the DomU's have
their own routing tables. Mainly it causes my backups to our mass
storage device to fail because the connection comes from the wrong
interface (and therefore the wrong IP address). So while I do want to
eventually track this down, it's not an urgent priority.

Thanks for the pointers though, it should help in troubleshooting.
> 
> Is it part of /etc/xen/scripts/network-bridge?
Yes. There are a lot of "ip route" commands in there, including "ip
route del". But it is fairly convoluted where commands to execute are
created with sed scripts, so it isn't exactly clear what it is doing or
trying to do there. I don't fully understand how Linux bridging works
either.
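
The general technique of building "ip route" commands with sed can be illustrated with a small sketch. It is not copied from network-bridge; the function name and the device names in the example are hypothetical. It reads "ip route list"-style text and prints, without running them, a del/add pair for every route on one device, rewritten to point at another device.

```shell
# Print (never execute) the commands that would move every route on
# device $1 to device $2. Route-table text is read from stdin.
print_route_transfer() {
    src=$1
    dst=$2
    sed -E -n "/dev ${src}( |\$)/ {
        h
        s/^/ip route del /
        p
        g
        s/dev ${src}( |\$)/dev ${dst}\1/
        s/^/ip route add /
        p
    }"
}
```

Running "ip route list | print_route_transfer eth0 peth0" shows what such a transfer would do to the current table. If a script pipes output like this into a shell and the add half fails or is skipped, the routes on that device are simply gone, which would be consistent with the symptom described in this thread, though that has not been confirmed here.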

--Greg


