David Roid
2011-Apr-11 11:29 UTC
[Samba] [CTDB] how does LMASTER know where the record is stored?
Greetings list,

I was looking at the wiki "samba and clustering" and a ctdb.pdf. Admittedly both are quite old (2006 or 2007) and I don't know how things have changed over the years, but I have two questions about LMASTER:

< this is from pdf >
LMASTER fixed
  * LMASTER is based on record key only
  * LMASTER knows where the record is stored
  * new records are stored on LMASTER

Q1. From the output of "ctdb status" I can see that LMASTER is basically configured as the node itself. How, then, does each node know where the record is stored? By broadcast to all nodes, or some other way? And more importantly, when?

Q2. If new records are stored on LMASTER, do these records need to be synced within the cluster? And when?

Excuse me if this comes off as sort of rude; it's just that there are not enough docs on CTDB on the Samba site.

Faithfully
-David
Michael Adam
2011-Apr-13 13:07 UTC
[Samba] [CTDB] how does LMASTER know where the record is stored?
Hi David,

David Roid wrote:
> Greetings list,
>
> I was looking at the wiki "samba and clustering" and a ctdb.pdf, admittedly
> both are quite old (2006 or 2007) and I don't know how things change over
> years, but I just have two questions about LMASTER:

First off, I wrote a small paper on ctdb in 2009 which is still mostly correct today:

http://samba.org/~obnox/presentations/sambaXP-2009/samba-and-ctdb.pdf

It is also linked on http://ctdb.samba.org/documentation.html . But the details about LMASTER have been omitted. Maybe I should write an updated version. :-)

> < this is from pdf >
> LMASTER fixed
>   * LMASTER is based on record key only
>   * LMASTER knows where the record is stored
>   * new records are stored on LMASTER
>
> Q1. From the output of "ctdb status" I can see that LMASTER is basically
> configured as the node itself, then how does each node know where the record
> is stored? By broadcast to all nodes or any other way? And more importantly,
> when?
>
> Q2. If new records are stored on LMASTER, do these records need to be synced
> within the cluster? And when?

Let me explain CTDB's view of tdb records in some detail.

The trick in ctdb (that enables Samba to scale well in the cluster) is that it does _not_ propagate record updates to all nodes in the cluster. There are two essential roles for a node with respect to a record in ctdb:

1. a record's data master (aka DMASTER): This is the node that holds the current and authoritative copy of the record. It is the node that last announced its intention to change that record and was granted permission. The DMASTER role moves around the cluster as different nodes write to the record. Nodes that were previously DMASTER of the record may hold older copies of it. The records contain a special header field, the "record sequence number" (aka RSN), which is incremented whenever the DMASTER role is moved from one node to another.

2. a record's location master (LMASTER): This is the node that knows the data master for the given record. The LMASTER for a record is a fixed node in the cluster (as long as the list of active nodes does not change). It is calculated from the record like this (in the simplest case): A 32-bit hash value is calculated from the record's key. This 32-bit value is taken modulo the number of nodes to yield the LMASTER's node number (if the nodes are numbered without gaps starting at 0).

Hence it is always cheap to contact the LMASTER, and the LMASTER knows how to find the DMASTER.

When a node wants to write to a record, it requests the DMASTER role for that record. It does so by sending an appropriate network request to the record's LMASTER. The LMASTER knows whether the record existed previously, and if so, it requests the DMASTER to transfer the DMASTER role, along with the record's contents, via the LMASTER to the requesting node. If the record did not previously exist, the LMASTER creates an empty initial record and transfers this to the requesting node. This way, the LMASTER always has the previous copy of the record.

Regarding the output of "ctdb status", e.g.:

  Number of nodes:4
  pnn:0 10.0.0.21 OK (THIS NODE)
  pnn:1 10.0.0.22 OK
  pnn:2 10.0.0.23 OK
  pnn:3 10.0.0.20 OK
  Generation:1987363808
  Size:4
  hash:0 lmaster:0
  hash:1 lmaster:1
  hash:2 lmaster:2
  hash:3 lmaster:3
  Recovery mode:NORMAL (0)
  Recovery master:1

Here you see a 4-node cluster with node numbers 0, 1, 2, 3, and you see which node number is LMASTER for a given key's hash value modulo 4 (== Size). Generally, you will see something like "hash:X lmaster:Y". E.g. stop ctdb on node number 0 and look again at ctdb status. You will see:

  Size:3
  hash:0 lmaster:1
  hash:1 lmaster:2
  hash:2 lmaster:3

> Excuse me if this comes off sort of rude, it's just there are not enough
> docs of CTDB on samba site.

No problem, there is also some (potentially) deprecated info on the wiki.samba.org.
But the lmaster bit might be worth explaining in more detail anyways. Do these explanations make things more clear for you?

Cheers - Michael

> Faithfully
> -David
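
The LMASTER calculation described above (a 32-bit hash of the record key, taken modulo the number of entries in the "hash:X lmaster:Y" table) can be sketched as follows. This is only an illustration under stated assumptions, not CTDB's code: the function names build_vnn_map and lmaster_for_key are hypothetical, zlib.crc32 stands in for whatever hash CTDB really uses, and filling the table with the active node numbers in ascending order is inferred from the two "ctdb status" outputs quoted in the thread.

```python
import zlib


def build_vnn_map(active_pnns):
    """Return the 'hash:X lmaster:Y' table as a list: slot index -> node number.

    Assumption: slots are filled with the active node numbers in ascending
    order, which matches the example outputs quoted in the thread.
    """
    return sorted(active_pnns)


def lmaster_for_key(key, vnn_map, hash_fn=zlib.crc32):
    """Pick the LMASTER for a record key.

    The key is hashed to a 32-bit value, which is reduced modulo the table
    size; the table entry at that slot is the LMASTER's node number.
    hash_fn is a stand-in (CRC32 by default), not CTDB's actual hash.
    """
    h = hash_fn(key) & 0xFFFFFFFF  # keep it a 32-bit value
    return vnn_map[h % len(vnn_map)]


if __name__ == "__main__":
    # All four nodes active, then node 0 stopped.
    for active in ([0, 1, 2, 3], [1, 2, 3]):
        vnn_map = build_vnn_map(active)
        print(f"Size:{len(vnn_map)}")
        for slot, node in enumerate(vnn_map):
            print(f"hash:{slot} lmaster:{node}")
        print("LMASTER for key b'example':", lmaster_for_key(b"example", vnn_map))
```

Because every node can run this calculation locally on the key alone, no broadcast is needed to find the LMASTER; only the table changes when the set of active nodes changes.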
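
The DMASTER hand-over described above can likewise be modelled with a small in-memory toy. Everything here is hypothetical and simplified (the Record, Node and Cluster classes, the method names, and the exact moment at which the RSN is bumped are assumptions); real CTDB does this with network requests between daemons. The sketch only shows the bookkeeping: the LMASTER tracks who the DMASTER is, creates an empty record on first use, and the RSN grows each time the role moves.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    data: bytes = b""
    rsn: int = 0  # record sequence number


@dataclass
class Node:
    pnn: int
    store: dict = field(default_factory=dict)       # key -> local Record copy
    dmaster_of: dict = field(default_factory=dict)  # LMASTER role: key -> DMASTER pnn


class Cluster:
    def __init__(self, num_nodes, hash_fn=zlib.crc32):
        self.nodes = [Node(pnn) for pnn in range(num_nodes)]
        self.hash_fn = hash_fn  # stand-in hash, not CTDB's own

    def lmaster(self, key):
        # Fixed per key while the node list is unchanged: hash modulo node count.
        return self.nodes[self.hash_fn(key) % len(self.nodes)]

    def request_dmaster(self, requester_pnn, key):
        """Requester asks the key's LMASTER for the DMASTER role."""
        lm = self.lmaster(key)
        if key not in lm.dmaster_of:
            # Record did not exist: the LMASTER creates an empty initial record.
            lm.store[key] = Record()
            lm.dmaster_of[key] = lm.pnn
        holder = self.nodes[lm.dmaster_of[key]]
        requester = self.nodes[requester_pnn]
        if holder is not requester:
            old = holder.store[key]
            # The role moves, so the RSN is incremented (timing is an assumption).
            moved = Record(old.data, old.rsn + 1)
            # The contents travel via the LMASTER, which keeps the copy it passed on.
            lm.store[key] = Record(moved.data, moved.rsn)
            requester.store[key] = moved
            lm.dmaster_of[key] = requester_pnn
        return requester.store[key]

    def write(self, requester_pnn, key, data):
        # A write first obtains the DMASTER role, then changes the local copy only.
        rec = self.request_dmaster(requester_pnn, key)
        rec.data = data
        return rec


if __name__ == "__main__":
    cluster = Cluster(4)
    cluster.write(0, b"lock/file.txt", b"held by node 0")
    rec = cluster.write(2, b"lock/file.txt", b"held by node 2")
    print("DMASTER copy on node 2:", rec)
    print("older copy left on node 0:", cluster.nodes[0].store[b"lock/file.txt"])
```

Note that the write itself touches only the requesting node's copy; nothing is pushed to the other nodes, which is the point made above about not propagating record updates across the cluster.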