Thomas.Zimolong@bmi.bund.de
2006-May-02 18:25 UTC
[Ocfs2-users] understanding self fencing with ocfs2
Hi list,

Having read the FAQ, I still have a problem understanding the self-fencing thing. The FAQ says:

    Q02 How does OCFS2's cluster services define a quorum?
    A02 ...
        A node has quorum when:
        * it sees an odd number of heartbeating nodes and has network
          connectivity to more than half of them.
        or
        * it sees an even number of heartbeating nodes and has network
          connectivity to at least half of them *and* has connectivity to
          the heartbeating node with the lowest node number.

and

    Q03 What is fencing?
    A03 Fencing is the act of forcefully removing a node from a cluster.
        A node with OCFS2 mounted will fence itself when it realizes that it
        doesn't have quorum in a degraded cluster.
        ...

With a two-node cluster with node numbers 0 and 1, I see the following problem. If node 0 crashes, so that it neither heartbeats nor is reachable via LAN, we have:

- an "odd number of heartbeating nodes" (1, the node with number 1), but
- no "network connectivity to more than half of them" (the only other node isn't reachable anymore)

So, as I see it, no quorum = self-fencing.

As a result, we end up with no node at all. Is this right (and is it meant that way), or is there a special algorithm for a two-node environment?

Our config is:

- two HP DL380 G4
- SLES9 SP2 (no SP3, because it's not supported by EMC PowerPath)
- Linux bmiam112 2.6.5-7.201-bigsmp #1 SMP Thu Aug 25 06:20:45 UTC 2005 i686 i686 i386 GNU/Linux
- all OCFS modules version 1.0.2-SLES
- ocfs2console-0.99.14-0.3
- ocfs2-tools-0.99.14-0.3
- each with two NICs in active-standby (bond0)

Thanks in advance, and sorry if this is kind of a "newbie question"...

Greetings,
Thomas Zimolong

Bundesministerium des Inneren
Referat Z 6 - Funktionsbereich Anwendungsentwicklung
Alt-Moabit 101 D
D-10559 Berlin
Phone 01888 681 2383
Fax 01888 681 5 2383
mailto:thomas.zimolong at bmi.bund.de
http://bmi.bund.de
In a 2-node setup, if node 0 or node 1 crashes, the other node should survive. The one issue many users ran into was that node 1 would fence itself while node 0 was being shut down. That was caused by the order in which services were shut down. We added the ocfs2-init script to handle shutdown sequencing.

However, 1.0.2 is fairly old, and we've made numerous fixes since. Ideally you should be on SP3. In fact, look for SuSE to make a new drop in the coming weeks which will include the "certified" ocfs2 bits.
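One way to reconcile the FAQ wording with the answer above is to read the rule so that a node counts itself both as heartbeating and as "connected". That reading is an assumption, since the FAQ excerpt does not spell it out, and the sketch below only encodes the quoted rule under that assumption. It is not taken from the OCFS2 source, and the function name `has_quorum` and its parameters are made up for illustration.

```python
def has_quorum(this_node, heartbeating, reachable):
    """Evaluate the FAQ's quorum rule from the point of view of this_node.

    heartbeating: node numbers this node currently sees heartbeating
    reachable:    node numbers this node can reach over the network

    Assumption (not stated in the FAQ excerpt): the local node counts
    itself as heartbeating and as connected to itself.
    """
    hb = set(heartbeating) | {this_node}
    # Only connections to nodes that are still heartbeating matter.
    conn = (set(reachable) | {this_node}) & hb
    n = len(hb)
    if n % 2 == 1:
        # Odd count: need connectivity to more than half.
        return 2 * len(conn) > n
    # Even count: need at least half, including the lowest-numbered node.
    return 2 * len(conn) >= n and min(hb) in conn


# Node 0 has crashed: node 1 sees only itself heartbeating.
print(has_quorum(1, heartbeating={1}, reachable=set()))

# Both nodes still heartbeat, but the network link between them is down.
print(has_quorum(0, heartbeating={0, 1}, reachable=set()))
print(has_quorum(1, heartbeating={0, 1}, reachable=set()))
```

Under this reading, the surviving node in the crash case keeps quorum, which matches the statement that the other node should survive. In the split case, where both nodes still heartbeat but cannot reach each other, only the node with the lowest number keeps quorum.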