That (L/REC) master approach, setting LN to be those for their data, with
no other region enabled for lmaster or recmaster, is WAY faster: 8s for
100x 10MiB files. Several other types of operations in Windows are
now faster too. Would love to discuss more, if you have time. At this point, we
have a possible geo-distributed deployment architecture. Would love to find
a couple more optimizations to provide additional headroom, knowing full
well we have more than 1000 employees that will access different shards of
data (shares).
On Wed, Oct 7, 2020 at 8:22 AM Martin Schwenke <martin at meltin.net>
wrote:
> Hi Bob,
>
> On Tue, 6 Oct 2020 20:56:39 -0400, Robert Buck <robert.buck at som.com>
> wrote:
>
> > Hi Martin, you seem to do a lot of work on CTDB. Let me ask a
> > question...
>
> Yes, I have done a lot of work on CTDB. A bit less lately...
>
> > Is there a way to segment CTDB/Samba to minimize chatter? Specifically,
> > what I have in mind... In recent years advances have been made in
> > distributed SQL databases (ideas which are applicable here) whereby the
> > communication profile between peers is minimized, and synchronization is
> > never necessary except in circumstances where a peer has the data resident
> > in memory and needs to perform an update (requiring an MVCC lock). Through
> > a catalog you can find out who is the chairman for any particular record,
> > and so know who manages locks related to it and who handles
> > contended updates. In this way, communication tends to be segmented, and
> > lock management is localized.
>
> If you check out https://wiki.samba.org/index.php/CTDB_database_design
> you will see that CTDB uses something like a catalog to locate records
> in distributed databases. It uses a modulo scheme (based on active
> nodes) to locate the "location master". Martin Kleppmann's "Designing
> Data-Intensive Applications" (https://dataintensive.net/) (which is
> predated by CTDB) says this isn't a great idea, though mostly from a
> database recovery perspective since a lot of the database has to move... it
> is fair to say that CTDB's database recovery isn't hugely optimised.
> However, in general use I think the distributed database model is sound
> and reasonably efficient.
>
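
To make the modulo scheme concrete, here is a minimal sketch of why recovery moves so much data. It is an illustration only: `lmaster` and `moved_fraction` are hypothetical helper names, and `zlib.crc32` merely stands in for whatever key hash CTDB actually uses.

```python
import zlib
from typing import Iterable, Sequence


def lmaster(key: bytes, active_nodes: Sequence[int]) -> int:
    """Pick a location master by hashing the key modulo the active node count.

    crc32 is a stand-in hash chosen for illustration, not CTDB's real one.
    """
    return active_nodes[zlib.crc32(key) % len(active_nodes)]


def moved_fraction(keys: Iterable[bytes],
                   before: Sequence[int],
                   after: Sequence[int]) -> float:
    """Fraction of keys whose location master differs between two node sets."""
    keys = list(keys)
    moved = sum(1 for k in keys if lmaster(k, before) != lmaster(k, after))
    return moved / len(keys)


# Hypothetical record keys; real keys would come from the database in question.
keys = [f"record-{i}".encode() for i in range(10_000)]

# One of four nodes drops out of the active set.
frac = moved_fraction(keys, [0, 1, 2, 3], [0, 1, 2])
print(f"{frac:.0%} of records get a new location master")
```

With a uniform hash, going from four nodes to three leaves only the keys where `h % 4 == h % 3` in place, so roughly three quarters of the records would be expected to change location master. That is the cost the book is pointing at.
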
> I'd be interested in your perspective in the context of the above.
>
> CTDB also has read-only delegation that can be enabled on distributed
> databases. I think this is used on some databases by default. There is
> also something called "sticky records" which we haven't used much but it
> is a simple approach to minimising record migration that might be
> useful.
>
> Volker Lendecke (from the Samba team) has started some work that
> localises records in the locking.tdb database but I haven't kept up
> with it.
>
> > It seems to us, and we need to measure with Wireshark, that CTDB with Samba
> > forms a full-mesh network, yes? And because of the architecture and
> > communication profile, performance of the system is about 1/100th of what
> > it is when CTDB is turned off. (Please bear in mind we're talking about
> > geo-distributed deployments here, not ones localized to a single region,
> > where latency is not an issue, so we're speaking of distances upwards of
> > 10,000 miles longest leg, and 5000 miles on average.)
> >
> > I've some experience in the area of distributed SQL databases, and it seems
> > that perhaps some of the architectural patterns to optimize communications
> > could apply here?
>
> Yes, CTDB does form a full-mesh network.
>
> However, it uses distributed databases for performance-critical
> volatile databases. Replicated databases are only (currently) used for
> persistent databases and although these perform very badly they aren't
> usually a bottleneck.
>
> > All that said, if you know a way to optimize out a 1:100 performance
> > penalty of using CTDB, please let us know.
>
> Note the comments about contention in
> https://wiki.samba.org/index.php/CTDB_database_design. It mentions
> some log messages to look for so you can start understanding the
> contention.
>
> Clustered Samba (with CTDB) does very badly when there is lots of
> contention for records. There are a few known ways of mitigating this.
>
> Looking at one example, a record containing metadata (including share
> mode data) for the root of a share can become very contended. This can
> be limited via the fileid:algorithm setting fsname_norootdir (see
> https://www.samba.org/samba/docs/current/man-html/vfs_fileid.8.html).
> However, before using this option you need to remember that its goal
> is to break lock coherency in the root of a share, so it has to be
> used very carefully.
>
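
For reference, a minimal sketch of how that setting might look in smb.conf, going by the vfs_fileid(8) page linked above. The share name and path are hypothetical, and the caveat about breaking lock coherency in the share root applies in full.

```
[projects]
    # hypothetical share name and path, for illustration only
    path = /clusterfs/projects
    vfs objects = fileid
    fileid:algorithm = fsname_norootdir
```
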
> Another way of destroying cluster performance is to put Windows
> executables into clustered shares. This can induce
> near-silicon-melting contention in CTDB. Try to find ways of avoiding
> this. I don't remember much about solutions for this. However, the
> "msdfs proxy" option may be of some help to push a share for such data
> to a single node and simply not cluster it.
>
> All that said, I think geographical distribution is going to be a
> source of obvious latency. Please check out the "lmaster capability"
> option in the ctdb.conf(5) manual page. However, I think Ronnie
> Sahlberg originally added this option for situations where there is a
> main cluster at one end of a WAN link and a subsidiary cluster at the
> other end... I don't think it was aimed at generally solving the
> problem of using CTDB in a geographically distributed manner.
>
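
A minimal sketch of what nodes in the non-master regions might carry in ctdb.conf. It assumes the [legacy] section and option names as I understand them from ctdb.conf(5) for the releases current at the time of writing; please check the manual page for your version before relying on it.

```
[legacy]
    # assumed placement: these nodes never become location or recovery master
    lmaster capability = false
    recmaster capability = false
```
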
> Despite all I've said above, CTDB currently has no full-time
> developers. We have ideas for a new CTDB architecture, which has been
> discussed in SambaXP conference talks by Amitay Isaacs and myself in
> recent years. One of the goals here is to structure CTDB more clearly
> to reduce the barrier to entry for new developers. We don't really
> have obvious ideas for database optimisations but we would value any
> ideas.
>
> All input welcome... patches too! :-D
>
> peace & happiness,
> martin
>
>
--
BOB BUCK
SENIOR PLATFORM SOFTWARE ENGINEER
SKIDMORE, OWINGS & MERRILL
7 WORLD TRADE CENTER
250 GREENWICH STREET
NEW YORK, NY 10007
T (212) 298-9624
ROBERT.BUCK at SOM.COM