thr3ads.net - samba - [Samba] Samba in Pacemaker-Cluster: CTDB fails to get recovery lock [Mar 2011]


Uwe Ritzschke

2011-Mar-11 13:13 UTC

[Samba] Samba in Pacemaker-Cluster: CTDB fails to get recovery lock

I'm currently testing fail-over on a two-node active-active cluster 
(nodes dig and dag): both nodes are up, then one is killed manually. 
CTDB on the surviving node should perform a recovery, and everything 
should work again.

What occasionally happens is:

After the pacemaker process on dag is killed (and dag is consequently 
fenced), CTDB on dig tries to get the recovery lock and fails. Since no 
other node is online that could take the recovery lock and finish 
CTDB's recovery, dig's CTDB keeps retrying until it is stopped 
manually.
The only way to get CTDB working again is to restart OCFS2's distributed 
lock manager.
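As far as I understand CTDB 1.0.x, the recovery lock is an fcntl() record lock on a file that lives on the shared cluster filesystem (the file configured via CTDB_RECOVERY_LOCK), so on OCFS2 whether it can be taken presumably depends on the cluster stack's lock manager granting it. The following is a minimal Python sketch for probing such a lock by hand, outside of CTDB; the function names and the reclock path are hypothetical, and this is not CTDB's own code.

```python
import errno
import fcntl
import os
import time


def try_recovery_lock(path):
    """Try once to take an exclusive, non-blocking fcntl lock on `path`.

    Returns an open file descriptor that holds the lock on success, or None
    if another process (possibly on another node) already holds it.
    Closing the returned descriptor releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # lockf() issues a POSIX (fcntl) write lock; LOCK_NB makes the call
        # fail immediately instead of blocking while someone else holds it.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # lock is held elsewhere
        raise
    return fd


def wait_for_recovery_lock(path, timeout=30.0, interval=1.0):
    """Retry try_recovery_lock() until it succeeds or `timeout` seconds pass.

    Unlike the endless retrying described above, this gives up and returns
    None once the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        fd = try_recovery_lock(path)
        if fd is not None:
            return fd
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Run on dig with ctdbd stopped (otherwise ctdbd itself may hold the lock) and pointed at a hypothetical path such as a file on the OCFS2 mount. If the probe still times out after dag has been fenced, that would suggest the lock is stuck below CTDB, which is consistent with restarting the DLM being the only fix.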

Log files and the Pacemaker configuration are attached; any help would be 
greatly appreciated :)

Regards,
Uwe



Our setting:

Two nodes, directly connected via LAN, running openSUSE 11.3 and sharing 
a SAN drive that is connected via two interfaces using multipath.

pacemaker 1.1.2
corosync 1.2.1
cluster-glue 1.0.5-1.4
ctdb 1.0.114-2.20
ocfs2 1.4.3-1.4
multipath 0.4.8-51.3




-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: crm.config
URL: <http://lists.samba.org/pipermail/samba/attachments/20110311/024ac44a/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log.ctdb
URL: <http://lists.samba.org/pipermail/samba/attachments/20110311/024ac44a/attachment-0001.ksh>

Jim McDonough

2011-Mar-14 18:23 UTC


[Samba] Samba in Pacemaker-Cluster: CTDB fails to get recovery lock

On Fri, Mar 11, 2011 at 8:13 AM, Uwe Ritzschke
<uwe.ritzschke.2 at cms.hu-berlin.de> wrote:
> I'm currently testing fail-over with a two-node active-active cluster (with
> node dig and node dag): Both nodes are up, one is manually killed. CTDB on
> the node that's still alive should perform a recovery and everything should
> working again.
>
> What's infrequently happening is:
>
> After killing the pacemaker-process on dag (and dag consequently being
> fenced), dig's CTDB tries to get the recovery lock and fails. As there is no
> other node online to get the recovery lock and thus finishing CTDB's
> recovery, dig's CTDB keeps trying to get the recovery lock until manually
> stopped.
> The only way to get CTDB back to work is to restart OCFS2's distributed lock
> manager.
>
>
> Our setting:
>
> two nodes directly connected via LAN running openSuse 11.3 and sharing a
> SAN-drive that is connected via two interfaces using multipath.
>
> pacemaker 1.1.2
> corosync 1.2.1
> cluster-glue 1.0.5-1.4
> ctdb 1.0.114-2.20
> ocfs2 1.4.3-1.4
> multipath 0.4.8-51.3
>

You might want to try updated packages from the repository:
http://download.opensuse.org/repositories/network:/ha-clustering/openSUSE_11.3/

This would give you newer versions of the HA packages.
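Before and after switching to that repository, it can help to record exactly which versions are installed on each node. A small Python sketch that asks rpm for each package's version and release; the package names "ocfs2-tools" and "multipath-tools" are assumptions standing in for the "ocfs2" and "multipath" entries listed above, and the helper name is hypothetical.

```python
import subprocess

# Assumed openSUSE package names for the components listed in the thread.
HA_PACKAGES = ("pacemaker", "corosync", "cluster-glue", "ctdb",
               "ocfs2-tools", "multipath-tools")


def installed_versions(packages=HA_PACKAGES):
    """Return {package: "VERSION-RELEASE"}, or None where rpm reports nothing."""
    result = {}
    for pkg in packages:
        try:
            proc = subprocess.run(
                ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}\n", pkg],
                capture_output=True, text=True, check=False)
        except FileNotFoundError:
            # rpm itself is not available on this machine
            result[pkg] = None
            continue
        # rpm exits non-zero when the package is not installed
        if proc.returncode == 0:
            # join in case several versions of one package are installed
            result[pkg] = ", ".join(proc.stdout.split())
        else:
            result[pkg] = None
    return result


if __name__ == "__main__":
    for pkg, ver in installed_versions().items():
        print(f"{pkg:16} {ver or 'not installed'}")
```

Running it on both dig and dag before and after the update makes it easy to confirm that the two nodes ended up on matching versions.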


-- 
Jim McDonough
Samba Team
SUSE labs
jmcd at samba dot org
jmcd at themcdonoughs dot org
