Superb. I'll take a look. Thank you

On Tue, Oct 6, 2020 at 1:46 AM Martin Schwenke <martin at meltin.net> wrote:

> Hi Bob,
>
> On Mon, 5 Oct 2020 09:31:59 -0400, Robert Buck <robert.buck at som.com>
> wrote:
>
> > It seems as though, when I go from `clustering = no` to
> > `clustering = yes`, if I do a domain join, it will fail. However, if I
> > do a `systemctl restart ctdb` (knowing full well it will fail every
> > time), if after this I add a sleep(15), then do a domain join, then do
> > a `systemctl restart ctdb`, then the join will have worked, AND CTDB
> > will start properly. So in a nutshell, in Ansible,
> >
> > - do all the samba setup without clustering on, even winbind setup;
> >   verify it works
> > - do all the ctdb setup and turn clustering on, but we must again
> >   domain-join, but only after having run restart-ctdb once first, then
> >   after the join, do another restart-ctdb
> >
> > Only then does the system come to a stable point.
> >
> > This appears to be the only way to have a repeatable deployment process
> > of CTDB over multiple regions globally.
> >
> > Any thoughts or recommendations?
>
> I think we need to document this better. ;-)
>
> Although we've tried to explain things well in the wiki there are still
> gaps... and this is one of them. Although some of the tutorials around
> the place are dated they fill in some of these gaps nicely.
>
> So, I'll repeat what Ralph said but with a few more words of
> explanation... :-)
>
> When clustering is enabled a new set of databases, managed by CTDB,
> replaces those that were being used before. This means that even if a
> node was previously joined to a domain it will no longer be joined
> after you enable clustering. The credentials have basically
> disappeared... unless you (immediately?) disable clustering again.
>
> In general, before you enable the 49.winbind and 50.samba event
> scripts, you should start CTDB and join the domain.
>
> Then you can enable those scripts and restart CTDB so it will start the
> services.
>
> Since you mention Ansible, I'll point you at autocluster, which I
> rewrote (last year?) using Vagrant and Ansible. It is a testing tool
> to generate virtual clusters for (developer) testing of Clustered
> Samba. It has a lot of clues that need to make their way into
> documentation. We don't do releases but there is a git repository at:
>
>   https://git.samba.org/?p=autocluster.git;a=summary
>
> Here's the sequence of tasks that we use to configure a "nas" node:
>
>   https://git.samba.org/?p=autocluster.git;a=blob;f=ansible/node/roles/nas/tasks/main.yml;h=0c444bd77c0a883b1c608fcd6398592be8e962de;hb=73b6a2844e827b4c2c2b5d5946cc14c7c61d7d75
>
> In particular, this file disables the event scripts:
>
>   https://git.samba.org/?p=autocluster.git;a=blob;f=ansible/node/roles/nas/tasks/generic/ctdb.yml;h=0271d2a11cff0e9359e115f20c5e641e3279c3ea;hb=73b6a2844e827b4c2c2b5d5946cc14c7c61d7d75
>
> and later the domain is joined:
>
>   https://git.samba.org/?p=autocluster.git;a=blob;f=ansible/node/roles/nas/tasks/generic/ctdb-with-samba-nfs.yml;h=b6f9c6d2354e4922535d9048648df4e9e5161689;hb=73b6a2844e827b4c2c2b5d5946cc14c7c61d7d75
>
> Note that I'm not an Ansible expert and these Ansible playbooks aren't
> necessarily idempotent. At the moment it all works well enough and I
> hope to get opportunities to clean it up more later. It is very much
> aimed at developer testing... but it would be cool if a subset of it
> could be used to configure "real" Samba clusters.
>
> However, given that you mentioned Ansible I figure that it might
> document certain things for you nice and clearly. It isn't missing
> anything obvious because we use it to build several test clusters each
> night.
>
> One day later this week I'll try to take a look at the wiki and add some
> documentation for joining a domain...
>
> peace & happiness,
> martin

--
BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER

SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER
250 GREENWICH STREET
NEW YORK, NY 10007
T (212) 298-9624
ROBERT.BUCK at SOM.COM
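
For readers following the thread, the ordering Martin describes (start CTDB, join the domain, enable the 49.winbind and 50.samba event scripts, restart CTDB) can be sketched as a small dry-run plan. This is only an illustration, not what autocluster does: the `ctdb event script enable legacy ...` spelling assumes a Samba release (4.9 or later) that ships that subcommand, the `net ads join -U` account name is a hypothetical placeholder, and `join_plan` and `run_plan` are names invented here.

```python
import shlex
import subprocess


def join_plan(admin_user="Administrator"):
    """Return the commands in the order described in the thread.

    admin_user is a hypothetical placeholder for an account allowed to join.
    """
    return [
        # 1. Start CTDB while the Samba event scripts are still disabled,
        #    so the clustered databases exist before the join.
        ["systemctl", "start", "ctdb"],
        # 2. Join the domain; the credentials land in CTDB-managed databases.
        ["net", "ads", "join", "-U", admin_user],
        # 3. Only now let CTDB manage winbind and smbd
        #    (command form assumed from Samba 4.9 or later).
        ["ctdb", "event", "script", "enable", "legacy", "49.winbind"],
        ["ctdb", "event", "script", "enable", "legacy", "50.samba"],
        # 4. Restart so CTDB starts the services it now manages.
        ["systemctl", "restart", "ctdb"],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    rendered = []
    for cmd in plan:
        line = " ".join(shlex.quote(arg) for arg in cmd)
        rendered.append(line)
        print(line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return rendered


if __name__ == "__main__":
    run_plan(join_plan())
```

Keeping the plan as data makes the ordering easy to review before anything is executed; the same order could be expressed as Ansible tasks.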
Hi Martin, you seem to do a lot of work on CTDB. Let me ask a question...

Is there a way to segment CTDB/Samba to minimize chatter? Specifically, what I have in mind... In recent years advances have been made in distributed SQL databases (ideas which are applicable here) whereby the communication profile between peers is minimized, and synchronization is never necessary except in circumstances where a peer has the data resident in memory and needs to perform an update (requiring an MVCC lock). Through a catalog you can find out who is the chairman for any particular record, and thus know who manages locks related to it and who handles contended updates. In this way, communication tends to be segmented, and lock management is localized.

It seems to us, and we need to measure with wireshark, that CTDB with Samba forms a full-mesh network, yes? And because of the architecture and communication profile, performance of the system is about 1/100th of what it is when CTDB is turned off. (Please bear in mind we're talking about geo-distributed deployments here, not ones localized to a single region, where latency is not an issue, so we're speaking of distances upwards of 10,000 miles longest leg, and 5000 miles on average.)

I've some experience in the area of distributed SQL databases, and it seems that perhaps some of the architectural patterns to optimize communications could apply here?

All that said, if you know a way to optimize out a 1:100 performance penalty of using CTDB, please let us know.

Really appreciate your feedback and help.

Bob

On Tue, Oct 6, 2020 at 8:24 AM Robert Buck <robert.buck at som.com> wrote:

> Superb. I'll take a look. Thank you
Hi Bob,

On Tue, 6 Oct 2020 20:56:39 -0400, Robert Buck <robert.buck at som.com> wrote:

> Hi Martin, you seem to do a lot of work on CTDB. Let me ask a question...

Yes, I have done a lot of work on CTDB. A bit less lately...

> Is there a way to segment CTDB/Samba to minimize chatter? Specifically,
> what I have in mind... In recent years advances have been made in
> distributed SQL databases (ideas which are applicable here) whereby the
> communication profile between peers is minimized, and synchronization is
> never necessary except in circumstances where a peer has the data resident
> in memory and needs to perform an update (requiring an MVCC lock). Through
> a catalog you can find out who is the chairman for any particular record,
> thus be able to know who manages locks related to it, as well as handles
> contended updates. In this way, communication tends to be segmented, and
> lock management is localized.

If you check out https://wiki.samba.org/index.php/CTDB_database_design you will see that CTDB uses something like a catalog to locate records in distributed databases. It uses a modulo scheme (based on active nodes) to locate the "location master". Martin Kleppmann's "Designing Data-Intensive Applications" (https://dataintensive.net/) (which is predated by CTDB) says this isn't a great idea, though mostly from a database recovery perspective since a lot of the database has to move... it is fair to say that CTDB's database recovery isn't hugely optimised. However, in general use I think the distributed database model is sound and reasonably efficient. I'd be interested in your perspective in the context of the above.

CTDB also has read-only delegation that can be enabled on distributed databases. I think this is used on some databases by default.

There is also something called "sticky records" which we haven't used much but it is a simple approach to minimising record migration that might be useful.
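
As an aside on the modulo scheme mentioned above, here is a toy sketch of why changing the set of active nodes moves so many records. It is not CTDB's code: `zlib.crc32` is only a stand-in for whatever hash CTDB actually uses, and `location_master` and `moved_fraction` are names made up for the illustration.

```python
import zlib


def location_master(key, active_nodes):
    """Pick a node for a record key: hash of the key modulo the active node count."""
    return active_nodes[zlib.crc32(key) % len(active_nodes)]


def moved_fraction(keys, nodes_before, nodes_after):
    """Fraction of keys whose location master differs between two node sets."""
    moved = sum(
        1
        for key in keys
        if location_master(key, nodes_before) != location_master(key, nodes_after)
    )
    return moved / len(keys)


if __name__ == "__main__":
    keys = [("record-%d" % i).encode() for i in range(10000)]
    fraction = moved_fraction(keys, [0, 1, 2, 3], [0, 1, 2])
    print("fraction of records with a new location master: %.2f" % fraction)
```

With a uniform hash, a key keeps its location master when going from four nodes to three only if `h % 4 == h % 3`, which holds for 3 of every 12 hash values, so roughly three quarters of the records would be expected to change location master. That is the recovery cost the book's criticism is about; lookups in steady state are unaffected.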
Volker Lendecke (from the Samba team) has started some work that localises records in the locking.tdb database but I haven't kept up with it.

> It seems to us, and we need to measure with wireshark, that CTDB with Samba
> forms a full-mesh network, yes? And because of the architecture and
> communication profile, performance of the system is about 1/100th of what
> it is when turned off. (Please bear in mind we're talking about
> geo-distributed deployments here, not ones localized to a single region,
> where latency is not an issue, so we're speaking of distances upwards of
> 10,000 miles longest leg, and 5000 miles on average.)
>
> I've some experience in the area of distributed SQL databases, and it seems
> that perhaps some of the architectural patterns to optimize communications
> could apply here?

Yes, CTDB does form a full-mesh network. However, it uses distributed databases for performance-critical volatile databases. Replicated databases are only (currently) used for persistent databases and although these perform very badly they aren't usually a bottleneck.

> All that said, if you know a way to optimize out a 1:100 performance
> penalty of using CTDB, please let us know.

Note the comments about contention in https://wiki.samba.org/index.php/CTDB_database_design. That page mentions some log messages to look for so you can start understanding the contention.

Clustered Samba (with CTDB) does very badly when there is lots of contention for records. There are a few known ways of mitigating this.

Looking at one example, a record containing metadata (including share mode data) for the root of a share can become heavily contended. This can be limited via the fileid:algorithm setting fsname_norootdir (see https://www.samba.org/samba/docs/current/man-html/vfs_fileid.8.html). However, before using this option you need to remember that its goal is to break lock coherency in the root of a share, so it has to be used very carefully.
Another way of destroying cluster performance is to put Windows executables into clustered shares. This can induce near-silicon-melting contention in CTDB. Try to find ways of avoiding this. I don't remember much about solutions for this. However, the "msdfs proxy" option may be of some help to push a share for such data to a single node and simply not cluster it.

All that said, I think geographical distribution is going to be a source of obvious latency. Please check out the "lmaster capability" option in the ctdb.conf(5) manual page. However, I think Ronnie Sahlberg originally added this option for situations where there is a main cluster at one end of a WAN link and a subsidiary cluster at the other end... I don't think it was aimed at generally solving the problem of using CTDB in a geographically distributed manner.

Despite all I've said above, CTDB currently has no full-time developers. We have ideas for a new CTDB architecture, which has been discussed in SambaXP conference talks by Amitay Isaacs and myself in recent years. One of the goals here is to structure CTDB more clearly to reduce the barrier to entry for new developers. We don't really have obvious ideas for database optimisations but we would value any ideas.

All input welcome... patches too! :-D

peace & happiness,
martin
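
To put rough numbers on the latency point for the distances quoted in the thread (5000 and 10,000 miles), here is a back-of-the-envelope sketch. The assumptions are loud ones: light in fibre is taken as about 200,000 km/s, the path is treated as a straight line with no routing or queueing delay, and the number of round trips a contended record operation needs is a free parameter rather than a measured CTDB figure. The function names are invented for the example.

```python
MILES_TO_KM = 1.609344
FIBRE_KM_PER_S = 200000.0  # approximation: about two thirds of c


def min_rtt_ms(distance_miles):
    """Lower bound on round-trip time over an ideal straight fibre path."""
    one_way_s = distance_miles * MILES_TO_KM / FIBRE_KM_PER_S
    return 2.0 * one_way_s * 1000.0


def max_serial_ops_per_s(distance_miles, round_trips_per_op=1):
    """Upper bound on strictly serialised operations on one contended record.

    round_trips_per_op is an assumption for illustration, not a CTDB figure.
    """
    return 1000.0 / (min_rtt_ms(distance_miles) * round_trips_per_op)


if __name__ == "__main__":
    for miles in (5000, 10000):
        print(
            "%5d miles: RTT >= %.1f ms, <= %.1f serialised ops/s per record"
            % (miles, min_rtt_ms(miles), max_serial_ops_per_s(miles))
        )
```

Even under these optimistic assumptions, a record that has to bounce between continents can only change hands a handful of times per second, so reducing contention matters more at these distances than any per-message optimisation.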