Has anyone ever set up a pair of file servers running Samba such that each server can take over the other server's functionality and identity?

It seems to me that a Samba HA setup would be very similar to a virtual server Samba setup. In other words, use the 'include' directive in conjunction with the %L variable to make each smbd process act as the server named by the client.

So far so good. But in an HA setup, when one server fails and the other one takes over its IP addresses and services, the NetBIOS identity of the failed server must also be passed to the remaining server.

How should one accomplish this?

Can it be as simple as restarting nmbd with a different config file naming the failed server's NetBIOS name and aliases as aliases of the server that takes over?

Or should one use the 'bind interfaces only' and 'interfaces' configuration parameters in two configuration files and run two nmbd processes on the server that takes over?

Looking at the source I see that SO_REUSEADDR is set before binding all listening TCP sockets and, most importantly, all UDP sockets used by nmbd. This should allow two nmbd processes to run in daemon mode if they have 'bind interfaces only' set to 'true' and disjoint interface address sets specified in their 'interfaces' parameter.

[When using 'bind interfaces only' nmbd will open a socket for each interface and a socket bound to INADDR_ANY for receiving broadcast UDP packets, thus my interest in SO_REUSEADDR. I.e., will the two nmbd processes co-exist?]

Thus, the takeover process for SMB services would consist simply of starting an nmbd process with a smb.conf file specifying 'bind interfaces only = true' and an 'interfaces' list that contains only the IP addresses of the failed host.

As for the smbd processes, the use of an 'include' directive that uses '%L' substitutions in the path of the included file should suffice. Yes?
Or should the takeover server simply change nmbd's configuration and [not strictly necessary] restart/HUP nmbd so that it takes the failed server's NetBIOS name and aliases and adds them as aliases of the current server?

I'm leaning toward having two nmbd processes.

Has anyone done this before? What platforms does it work with? I'm using Samba 1.9.18pl10 on Solaris 2.6 systems.

Nico

PS: Which is the most appropriate list for these questions?
PPS: I did search the archives and the standard documentation, a bit.
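
For readers following along, the smbd half of Nico's question (an 'include' whose path contains %L) might look roughly like the fragment below. This is only a sketch: the directory, the file names, the NetBIOS name FS1 and the share are all hypothetical, and the case in which %L is substituted may depend on the Samba version, so check it before naming the per-identity files.

```ini
# smb.conf, identical on both hosts (path and names are hypothetical)
[global]
   # smbd replaces %L with the NetBIOS name the client connected to,
   # so each server identity pulls in its own share definitions.
   include = /usr/local/samba/lib/smb.conf.%L

# smb.conf.fs1, read only when a client addresses the server as FS1
# (hypothetical identity and share)
[projects]
   path = /export/fs1/projects
   read only = no
```

With this layout the surviving host serves the failed host's shares simply because clients keep asking for the failed host's name; how that name stays resolvable is the nmbd question raised above.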

The point is this:

 - When one server takes over the other, it takes over the failed host's disks, IP addresses and other necessary resources, then starts or updates various services so as to present, to the clients, the illusion that the failed host is still there.

 - When the failed host is brought back online one would like to have it take over its resources and services.

 - There's nothing to do as far as taking over a host's hostname/aliases when DNS is the naming service. You just take over the IP addresses.

 - But if the naming service is WINS/NetBIOS, then you have a problem, because each host must register with WINS and/or answer broadcast NetBIOS requests.

Thus, when taking over a failed host one needs to take over its NetBIOS/WINS hostname and aliases, but without just re-mapping them to the takeover host's IP addresses. The failed host's names must still resolve to the failed host's IP addresses so that you can later restore the services to that host once it's revived.

This is very important because NT clients cache their hostname lookups (like any decent platform). So when an NT client has to reconnect to its file servers it reconnects to the same IP it had been connected to, unless (so I understand) its cached hostname lookups have expired.

At any rate. What I was thinking of *does* work. I've just tested it.

What I suggested earlier, to recap, is:

 - Keep two separate configuration files for nmbd: one for each host in an HA pair. Copies of both files will be available locally on both hosts.

 - Each of those two files is associated with the identity of one of the two HA hosts. Thus each file lists different NetBIOS names and aliases.

 - To make this work you must specify:

       bind interfaces only = true

   And you must list the IP addresses of the host to which the given config file belongs with the 'interfaces' parameter.

 - Under normal circumstances each HA host runs just ONE nmbd with the configuration file that corresponds to that host.

 - When one host takes over the other, it starts a second nmbd with the failed host's configuration file.

 - When the failed host is restored, the takeover host kills that second nmbd process as part of the process of relinquishing the failed host's resources; then the failed host, now restored, takes over its own resources and starts its services, as normal.

And thus all goes back to normal.

It works for me. I'll stick with this unless there's a better way.

You may want to include a note about all this in the standard Samba docs...

Thanks to all that replied,

Nico

On Fri, Jan 29, 1999 at 12:12:34AM +1100, David Collier-Brown wrote:
> Nicolas Williams wrote:
> > But in an HA setup, when one server fails and the other one takes over
> > its IP addresses and services, the NetBIOS identity of the failed
> > server must also be passed to the remaining server.
>
> My limited understanding is that the "new" server
> needs to broadcast its identity at startup time
> in order to "claim" its name (which samba does).
>
> If the name is a duplicate, the "name server"
> queries the previous holder of the name to see if it's
> still alive. If not, the new server gets the name.
> If so, it gets told to go away.
>
> In the above "name server" is a deliberately vague term.
> It's a WINS server, a browse master or an elephant playing
> the trombone: I don't know which (:-))
>
> --dave
> --
> David Collier-Brown, | Always do right. This will gratify some people
> 185 Ellerslie Ave., | and astonish the rest. -- Mark Twain
> Willowdale, Ontario | http://java.science.yorku.ca/~davecb
> Work: (905) 477-0437 Home: (416) 223-8968 Email: davecb@canada.sun.com
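
The two per-host nmbd configuration files from Nico's recap could be sketched as follows. Nothing here comes from the original post except the 'bind interfaces only' and 'interfaces' parameters and the idea of per-host NetBIOS names and aliases; the file names, NetBIOS names, workgroup and addresses (taken from a documentation range) are placeholders.

```ini
# smb.conf.nmbd.hosta: identity of host A (hypothetical file name and values)
[global]
   workgroup = EXAMPLE
   netbios name = HOSTA
   netbios aliases = FILES-A
   # Only host A's own addresses, so that this nmbd answers for A alone.
   interfaces = 192.0.2.11/24
   bind interfaces only = true

# smb.conf.nmbd.hostb: identity of host B (hypothetical file name and values)
[global]
   workgroup = EXAMPLE
   netbios name = HOSTB
   netbios aliases = FILES-B
   # Disjoint from host A's list, as the scheme requires.
   interfaces = 192.0.2.12/24
   bind interfaces only = true
```

Under this sketch, host A normally runs one nmbd against its own file. After taking over host B's addresses it would start a second one, for instance with `nmbd -D -s` followed by the path to host B's file, and kill that second process again when handing the resources back.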

> We need to modify access rights on the Unix server so that
> when file(s) are created in a directory the file(s) get
> the group (rw-) access privileges. When creating files in
> Unix the files do receive the group (rw-) access rights.
> However, files created on a PC with WINDOWS 95 within the
> same directory will not receive the group (rw-) access
> rights.
>
> We have added the following to the smb.conf file:
>
> [newsletter]
>     comment = newsletter
>     path = /some/newsletter
>     valid users = @staff @editors
>     read only = yes
>     write list = @editors

To do this I would add:

    force group = newsletter
    create mask = 0760

The group would be whatever group owns the files.

Dan
---------------------------------------------------------
Dan Roscigno ddr@phys.ufl.edu (352)392-4028
Physics Dept. University of Florida
2122 New Physics Building
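
Putting the quoted share and Dan's two suggested lines together gives something like the fragment below. It is a sketch only: the group name 'newsletter' is Dan's example, and it should be replaced by whichever group actually owns the files.

```ini
[newsletter]
   comment = newsletter
   path = /some/newsletter
   valid users = @staff @editors
   read only = yes
   write list = @editors
   # Files created through Samba are assigned to this Unix group
   # (example name; use the group that owns the directory's files).
   force group = newsletter
   # Permission bits allowed on new files: owner rwx, group rw-, others none.
   create mask = 0760
```

The create mask only limits which bits may be set on a new file, so files created from the Windows 95 clients can now keep group read and write permission instead of losing it.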

Nicolas,

On Sat, 30 Jan 1999 10:28:03 +1100, Nicolas Williams wrote:
> - When one server takes over the other, it takes over the failed host's
>   disks, IP addresses and other necessary resources, then starts or
>   updates various services so as to present, to the clients, the
>   illusion that the failed host is still there.

That's exactly what we do in our HA cluster. Each machine can be in one of three states: O, B and OB, meaning "Original", "Backup" and "Original+Backup". I'm not so happy with the terms Original and Backup because they imply that one machine only does some kind of warm standby, which is not the case. Instead, O has a set of functions, B has a set of functions, and in the backup case the surviving machine has to provide both sets of functionality.

We did not bother with all this nmbd and interface stuff. Instead we did as follows:

file smb.conf.machine1:
    ... all the shares on machine1 ...

file smb.conf.machine2:
    ... all the shares on machine2 ...

file smb.conf.O:
    include = /path/smb.conf.machine1

file smb.conf.B:
    include = /path/smb.conf.machine2

file smb.conf.OB:
    include = /path/smb.conf.machine1
    include = /path/smb.conf.machine2

And now for the trick:

file smb.conf:
    ...
    include = /path/smb.conf.HA.state
    ...

In our HA state changing procedure we do something like

    echo "include = /path/smb.conf."`get_HA_status` >/path/smb.conf.HA.state

When one machine fails and the other one takes over (state transition from O or B to OB) the file smb.conf.HA.state gets modified. As O or B went down, all connections between any clients and the failed machine break (naturally :-). The clients will have to reconnect (which they do silently in normal cases). OB comes up with the interface of the now dead machine and does an "ARP reply broadcast" from its new interface so everyone in the segment learns the new MAC address.

From the point of view of OB, clients (re)connecting to the failed machine produce new connections.
In this case smbd will re-read smb.conf, stumble over smb.conf.HA.state and have the shares of the failed machine available.

Now how to get back from OB to O and B on different machines? We decided not to give back functionality during production hours. The failed machine gets repaired and waits for the return of its functionality. We wait until the users leave the office and then switch back the state.

There are issues like "who is local browse master". This one is handled the easy way: just let them fight for it, running one with os level = 65 and the other one with os level = 64. Other issues are handled with the smb.conf.O|B|OB mechanism.

Regards,
Robert

--
---------------------------------------------------------------
Robert.Dahlem@gmx.net
Radio Bornheim - 2:2461/332@fidonet +49-69-4930830 (ZyX, V34)
                 2:2461/326@fidonet +49-69-94414444 (ISDN X.75)
---------------------------------------------------------------
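
To make Robert's include chain concrete, this is what the generated state file would contain on the surviving machine after a takeover. It assumes that his get_HA_status helper prints exactly one of O, B or OB; the post does not show its output.

```ini
# /path/smb.conf.HA.state, as written by the echo command when the
# state is OB (assumes get_HA_status printed "OB")
include = /path/smb.conf.OB

# ... which in turn pulls in both machines' share definitions via
# smb.conf.OB:
#    include = /path/smb.conf.machine1
#    include = /path/smb.conf.machine2
```

In state O or B the same file would instead hold a single include of smb.conf.O or smb.conf.B, so a newly started smbd sees only that machine's own shares.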