Mike Ruebner
2018-Jun-03 01:32 UTC
[Samba] CTDB over WAN Link with LMASTER/RECMASTER Disabled
Hi,

I came across the 'CTDB_CAPABILITY_LMASTER=no' and 'CTDB_CAPABILITY_RECMASTER=no' options in my quest to salvage a rather poorly performing CTDB cluster over Ceph(fs). Unfortunately, the docs do not provide enough information for a clustering noob like myself. Would there be any benefit to disabling those options for a branch office node on a high-latency WAN connection?

Throughput maxes out at 20 Mbit/sec, with latency in the 20 - 30 ms range. I am mostly concerned about SMB read/list performance, which drops significantly for folders with an object count >1000. Share mount is CephFS over Ceph Kraken/Jewel.

Any pointers greatly appreciated!

Mike
Martin Schwenke
2018-Jun-04 05:24 UTC
[Samba] CTDB over WAN Link with LMASTER/RECMASTER Disabled
Hi Mike,

On Sat, 2 Jun 2018 20:32:00 -0500 (CDT), Mike Ruebner via samba <samba at lists.samba.org> wrote:

> I came across the 'CTDB_CAPABILITY_LMASTER=no' and
> 'CTDB_CAPABILITY_RECMASTER=no' options in my quest to salvage a
> rather poorly performing CTDB cluster over Ceph(fs). Unfortunately,
> the docs do not provide enough information for a clustering noob like
> myself. Would there be any benefit to disabling those options for a
> branch office node on a high-latency WAN connection?

> Throughput maxes out at 20 Mbit/sec, with latency in the 20 - 30 ms
> range. I am mostly concerned about SMB read/list performance, which
> drops significantly for folders with an object count >1000. Share
> mount is CephFS over Ceph Kraken/Jewel.

> Any pointers greatly appreciated!

I think that the first thing to do would be to measure CephFS performance over the WAN. If the bottleneck is there then tweaking CTDB options won't help you. If the CephFS performance is good then look at CTDB...

The potential CTDB problem is that volatile databases such as locking.tdb are distributed. In this case the databases are being distributed across the WAN.

Records will still be distributed in this way with CTDB_CAPABILITY_LMASTER=no. The difference is that the branch office node would no longer act as location master for any records, so the advantage would be that files accessed only (or primarily?) in the main office would not need a trip over the WAN to locate records. However, accesses from the branch office would always need to go over the WAN to locate records. If the poor performance is only being seen on the branch office node then I don't think CTDB_CAPABILITY_LMASTER=no would help... but I could be wrong.

I think that with a WAN in the mix you'll always get poor performance when there is contention for file access on either side of the WAN or when lots of small files are being created via the branch office node. Your best option is to limit contention for records. One possibility, depending on what is happening, might be the Samba share option:

  fileid:algorithm = fsname_norootdir

However, you really need to establish whether your problem is contention at the root of a share, and you must understand the implications of that option.

I doubt that many people have experience with CTDB_CAPABILITY_LMASTER=no. I've been a CTDB developer for over 10 years and I haven't seen anyone discuss how useful this option is, nor have I done any related performance analysis.

Note that I have ignored CTDB_CAPABILITY_RECMASTER=no. It should only come into play during database recoveries. If you have a lot of active records (due to high client activity) then having the branch office node coordinate recovery could be very slow. However, this shouldn't affect ongoing performance.

peace & happiness,
martin
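
A rough way to take the "measure CephFS first" step is to time a directory listing directly on the CephFS mount of the branch office node, with Samba and CTDB out of the picture, and compare it with the same listing run on a main office node. The Python sketch below is only an illustration: the function names are made up for this example, the paths are whatever you pass on the command line, and the second helper uses a deliberately pessimistic assumption (one serial WAN round trip per directory entry) that is not a statement about how CephFS, SMB or CTDB actually batch their requests.

```python
import os
import sys
import time


def time_listing(path):
    """List `path` once, stat every entry, and return (entry_count, seconds)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Stat each entry, roughly what a client does when showing a folder.
            entry.stat(follow_symlinks=False)
            count += 1
    return count, time.perf_counter() - start


def serial_round_trip_seconds(entries, rtt_ms, trips_per_entry=1):
    """Pessimistic estimate: every entry costs `trips_per_entry` serial round trips.

    This is an assumption used for a back-of-the-envelope bound only.
    """
    return entries * trips_per_entry * rtt_ms / 1000.0


if __name__ == "__main__":
    # Usage: python list_timing.py <dir> [<dir> ...]
    for path in sys.argv[1:]:
        count, elapsed = time_listing(path)
        print(f"{path}: {count} entries in {elapsed:.3f} s")
```

With the figures from the original post, `serial_round_trip_seconds(1000, 25)` gives 25 seconds for a 1000-entry folder at 25 ms latency under that one-round-trip-per-entry assumption. If the raw CephFS listing on the branch office node is already slow, the bottleneck is below CTDB and the capability options are unlikely to matter.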