Hammitt, Charles Allen
2010-Feb-10 19:07 UTC
[Samba] samba ctdb doesn't set default gateway properly on second node;
Hey all,

I am trying to figure out a weird problem. I have a two-node CTDB Samba setup. After a network blip or link-down event, the first node behaves as expected and sets the default gateway (startup, shutdown, takeip, recoveryip). The second node, however, does not set the default gateway properly when the link first recovers after the blip. If I reboot the machine or restart the ctdb service, it comes back with the proper gateway and traffic flows as expected. The problem is that I wouldn't expect to need a restart or reboot just to get the interface's default gateway/route reset. Note that the interface address itself is set; routing is the problem. Any thoughts here?

I was going to send this message yesterday, but I started poking around in the event.d directory looking for something that might help. I was reading the README file there, and this note struck me as odd:

"Each NN must be unique and duplicates will cause undefined behaviour. I.e. having both 10.interfaces and 10.otherstuff is not allowed."

I say this is odd because I have three 11's, which were put in by the default install:

-rwxr-xr-x 1 root root 2178 Jun 29 2009 11.natgw
-rwxr-xr-x 1 root root  340 Jul 17 2009 11.route
-rwxr-xr-x 1 root root  798 Jun 29 2009 11.routing

So either the note is wrong, or perhaps it is part of my bug.

In addition, I read the comment in the 11.route file about adding a static-routes file. I had not previously had this file, so I thought this was surely the cause of my problems. But no... sadly, it is not. I had another outage today because the gateway or route was not set when the link came back up.

Again, any thoughts? Note that the only thing that gets the interfaces working as expected again is restarting the ctdb service. Interestingly enough, the first node of the cluster, which has exactly the same configuration, never shows this particular problem. Moreover, the static-routes note is not in the documentation, and it probably should be.
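The static-routes file mentioned in the 11.route comment can be sanity-checked before relying on it. This is a minimal sketch, not ctdb's own code: it assumes the format described in that comment is one `IFACE NET/MASK GATEWAY` entry per line (verify this against the local copy of 11.route), and `parse_static_routes` and `ip_route_commands` are hypothetical helpers. For each entry it renders the iproute2 command the entry roughly corresponds to, which can be compared against `ip route` output after a link blip.

```python
import ipaddress
from typing import List, NamedTuple


class StaticRoute(NamedTuple):
    iface: str
    network: ipaddress.IPv4Network
    gateway: ipaddress.IPv4Address


def parse_static_routes(text: str) -> List[StaticRoute]:
    """Parse lines of the assumed form 'IFACE NET/MASK GATEWAY'."""
    routes: List[StaticRoute] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Skip blank lines and '#' comments (tolerated by this sketch only).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 'IFACE NET/MASK GATEWAY', got {raw!r}")
        iface, net, gw = fields
        # ipaddress rejects malformed addresses and networks with host bits set.
        routes.append(StaticRoute(iface, ipaddress.IPv4Network(net), ipaddress.IPv4Address(gw)))
    return routes


def ip_route_commands(routes: List[StaticRoute], iface: str) -> List[str]:
    """Render an iproute2 command for each route bound to the given interface."""
    return [f"ip route add {r.network} via {r.gateway} dev {r.iface}"
            for r in routes if r.iface == iface]


# Hypothetical example file using documentation address ranges.
EXAMPLE = """\
# IFACE NET/MASK GATEWAY
eth1 198.51.100.0/24 192.0.2.1
eth0 203.0.113.0/24 192.0.2.254
"""

for cmd in ip_route_commands(parse_static_routes(EXAMPLE), "eth1"):
    print(cmd)  # ip route add 198.51.100.0/24 via 192.0.2.1 dev eth1
```

Validating with `ipaddress` catches typos such as a network with host bits set (for example `198.51.100.1/24`), which would otherwise only show up as a failed route at takeover time.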
The only place I learned about static-routes was the 11.route file itself, so it might be legacy and not even needed or used nowadays... I don't know.

rpm -qa|grep ctdb
ctdb-1.0.86-1

uname -a
Linux 2.6.18-128.7.1.el5 #1 SMP Wed Aug 19 04:00:49 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Regards,
Charles
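The README rule quoted above (each NN prefix must be unique) is easy to check mechanically. This is a minimal sketch, assuming event scripts are named `NN.name` as in the listing; `duplicate_prefixes` is a hypothetical helper. It only reports shared prefixes and says nothing about whether ctdb actually misbehaves when they occur.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed naming convention: two-digit prefix, a dot, then a name ("11.route").
NN_PREFIX = re.compile(r"^(\d{2})\.")


def duplicate_prefixes(names: Iterable[str]) -> Dict[str, List[str]]:
    """Return {prefix: [scripts]} for every NN prefix used by more than one script."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        match = NN_PREFIX.match(name)
        if match:  # files like README have no prefix and are skipped
            groups[match.group(1)].append(name)
    return {nn: sorted(files) for nn, files in groups.items() if len(files) > 1}


# File names taken from the listing in the message above, plus the README.
listing = ["README", "11.natgw", "11.route", "11.routing"]
print(duplicate_prefixes(listing))
# {'11': ['11.natgw', '11.route', '11.routing']}
```

On a live node the names could come from `os.listdir()` on the ctdb event script directory. The exact path depends on the package, so `/etc/ctdb/events.d` should be treated as an assumption.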
Hammitt, Charles Allen
2010-Feb-10 19:35 UTC
[Samba] samba ctdb doesn't set default gateway properly on second node;
I guess that would need to be a "service ctdb restart". Is that even a very good idea?

I already put an "ip route add" command in the rc.local file, and it doesn't seem to help. Of course, /etc/sysconfig/network is set properly as well. If I run the ip route add command on the command line, the problem is that, for some reason, it adds an H flag to the routing table entry, which can be seen with "netstat -rn". The problem with the H flag is that it only seems to allow local access to that interface (like the 127.0.0.1 loopback) and not from other hosts... which is the problem in the first place. Also, the rc.local approach won't address link-down events; it would only ensure that the interface route and default gateway are correct after a reboot.

From:
Sent: Wednesday, February 10, 2010 2:16 PM
To: Hammitt, Charles Allen
Subject: RE: samba ctdb doesn't set default gateway properly on second node;

Without looking into it further, my first thought is to put a correction item in /etc/rc.local. This is the last thing executed during the reboot.
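In `netstat -rn` output, the H flag marks a route to a single host (genmask 255.255.255.255). A working default gateway appears as destination 0.0.0.0 with genmask 0.0.0.0 and the G flag. With iproute2, a destination given without a prefix length is treated as a full-length host route, which is one way to end up with an H entry. A default route is normally written as `ip route add default via GATEWAY dev IFACE`. The minimal sketch below parses the table and reports both conditions. It assumes the standard 8-column Linux net-tools layout, and `parse_netstat_rn`, `has_default_gateway` and `host_routes` are hypothetical helpers.

```python
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    destination: str
    gateway: str
    genmask: str
    flags: str
    iface: str


def parse_netstat_rn(text: str) -> List[Route]:
    """Parse the Linux `netstat -rn` table (8 columns ending in Iface)."""
    routes: List[Route] = []
    for line in text.splitlines():
        fields = line.split()
        # Skips the "Kernel IP routing table" banner and the column header.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        dest, gw, mask, flags = fields[:4]
        routes.append(Route(dest, gw, mask, flags, fields[7]))
    return routes


def has_default_gateway(routes: List[Route], iface: Optional[str] = None) -> bool:
    """True if a 0.0.0.0/0.0.0.0 route with the G (gateway) flag exists."""
    return any(
        r.destination == "0.0.0.0" and r.genmask == "0.0.0.0" and "G" in r.flags
        and (iface is None or r.iface == iface)
        for r in routes
    )


def host_routes(routes: List[Route]) -> List[Route]:
    """Entries with the H flag, i.e. routes to a single host rather than a network."""
    return [r for r in routes if "H" in r.flags]


# Hypothetical table illustrating the symptom: a connected network and a host
# route, but no default gateway. Addresses are documentation ranges.
SAMPLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.0.2.0       0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.0.2.1       0.0.0.0         255.255.255.255 UH        0 0          0 eth0
"""

routes = parse_netstat_rn(SAMPLE)
print(has_default_gateway(routes))                   # False
print([r.destination for r in host_routes(routes)])  # ['192.0.2.1']
```

A check like this could be run on the second node after a link blip to confirm whether only the default route is missing. Because netstat's column layout is specific to net-tools, `ip route show default` is a more direct way to ask the same question.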