Hi Max,

On Fri, 4 Oct 2019 14:01:22 +0000, Max DiOrio
<Max.DiOrio at ieeeglobalspec.com> wrote:

> Looks like this is the actual error:
>
> 2019/10/04 09:51:29.174870 ctdbd[17244]: Recovery has started
> 2019/10/04 09:51:29.174982 ctdbd[17244]: ../ctdb/server/ctdb_server.c:188 ctdb request 2147483554 of type 8 length 48 from node 1 to 0
> 2019/10/04 09:51:29.175021 ctdbd[17244]: Recovery lock configuration inconsistent: recmaster has NULL, this node has /run/gluster/shared_storage/.CTDB-lockfile, shutting down
> 2019/10/04 09:51:29.175045 ctdbd[17244]: Shutdown sequence commencing.
> 2019/10/04 09:51:29.175056 ctdbd[17244]: Set runstate to SHUTDOWN (6)

Yep. CTDB refuses to work if the recovery lock is configured
differently on different nodes, since that is an important
misconfiguration. If you make it the same on all nodes then it will
get past this.

> I'm attaching the full log from this startup.
>
> The other thing that baffles me is that I have most of the legacy
> scripts disabled, yet the startup shows that it's running them all.
> I also have no idea why it's listing the legacy scripts twice here,
> or why the two lists are different.
>
> [[LAColo-Prod] root at hq-6pgluster01 ~]# ctdb event script list legacy
> * 00.ctdb
>   01.reclock
>   05.system
>   06.nfs
> * 10.interface
>   11.natgw
>   11.routing
>   13.per_ip_routing
>   20.multipathd
>   31.clamd
>   40.vsftpd
>   41.httpd
>   49.winbind
>   50.samba
>   60.nfs
>   70.iscsi
>   91.lvs
>
> * 01.reclock
>   05.system
> * 06.nfs
>   11.natgw
>   11.routing
>   13.per_ip_routing
>   20.multipathd
>   31.clamd
>   40.vsftpd
>   41.httpd
>   49.winbind
>   50.samba
> * 60.nfs
>   70.iscsi
>   91.lvs

This is strange. I can explain the above, but I can't explain why all
of the scripts without stars are running.

The first list is the scripts installed with CTDB in
/usr/share/ctdb/events/legacy/. These are enabled via symlinks in
/etc/ctdb/events/legacy/, so for the 2 starred scripts you should see
2 symlinks there.

The second list is "custom" scripts installed directly into
/etc/ctdb/events/legacy/, or perhaps symlinked from some other place.
I don't know how either of these things could have happened.

What does:

  ls -l /etc/ctdb/events/legacy/

show?

Thanks...

peace & happiness,
martin
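
P.S. For completeness, the recovery lock should be set identically in
/etc/ctdb/ctdb.conf on every node. A minimal sketch, assuming the
path from your log is the one you actually want (adjust it for your
cluster):

  # /etc/ctdb/ctdb.conf -- this section must match on all nodes
  [cluster]
      recovery lock = /run/gluster/shared_storage/.CTDB-lockfile

The shipped legacy scripts are enabled and disabled by creating and
removing those symlinks, either by hand or with the ctdb tool. For
example, to enable 00.ctdb (syntax from memory, please check
ctdb(1)):

  # by hand: symlink the installed script into the enabled directory
  ln -s /usr/share/ctdb/events/legacy/00.ctdb \
        /etc/ctdb/events/legacy/00.ctdb

  # or via the ctdb tool
  ctdb event script enable legacy 00.ctdb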
I'll have to check out the script issue on Monday.

You said the lock needs to be the same on all nodes. I can do that,
but this is now in production, and restarting the ctdb service forces
a failover of the IP, which actually causes a failure of a few of our
Kubernetes SQL database pods - they freak out and don't recover if
storage is ripped out from under them.

Is there a way to do this without an IP takeover on each node when
ctdb is restarted?

Thanks. This is slowly starting to make sense.
Hi Max,

On Sat, 5 Oct 2019 10:58:12 +0000, Max DiOrio
<Max.DiOrio at ieeeglobalspec.com> wrote:

> I'll have to check out the script issue on Monday.

> You said the lock needs to be the same on all nodes. I can do that,
> but this is now in production, and restarting the ctdb service
> forces a failover of the IP, which actually causes a failure of a
> few of our Kubernetes SQL database pods - they freak out and don't
> recover if storage is ripped out from under them.

> Is there a way to do this without an IP takeover on each node when
> ctdb is restarted?

There are 2 ways.

The first is the NoIPTakeover tunable (see ctdb-tunables(7)). The
current semantics are that it is evaluated on the recovery master
node. If you set it on all nodes (in /etc/ctdb/ctdb.tunables on any
nodes you take down, or using "ctdb setvar NoIPTakeover 1" on any
nodes that are up... note that it only survives a restart if it is
in ctdb.tunables) then that should do what you want.

However, I think the easiest option is to temporarily empty the
/etc/ctdb/public_addresses file (comment out all lines?) on the
nodes that you don't want taking over addresses when you restart
them. When you finally want takeover to occur, uncomment the
addresses and run "ctdb reloadips <node>" for the <node> in
question. That will cause that node to reload its public addresses
file and trigger a failover. Or just uncomment the addresses and
restart CTDB. ;-)

> Thanks. This is slowly starting to make sense.

I'm glad. If you can think of particular documentation that can be
improved then please point me at it... or send a patch! :-)

peace & happiness,
martin
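
P.S. A concrete sketch of both options, assuming node 1 is the one
being restarted (the node number and the sed commands are
illustrative, not tested on your cluster):

  # Option 1: pin the tunable so it survives a CTDB restart
  echo "NoIPTakeover=1" >> /etc/ctdb/ctdb.tunables

  # ... or set it at runtime on nodes that are already up
  ctdb setvar NoIPTakeover 1

  # Option 2: comment out every line in public_addresses
  # (crude sed, assumes the file contains only address lines)
  sed -i 's/^/#/' /etc/ctdb/public_addresses

  # Later, when a takeover is OK again, uncomment and reload
  sed -i 's/^#//' /etc/ctdb/public_addresses
  ctdb reloadips 1

Remember to set NoIPTakeover back to 0 (the default) afterwards,
since leaving it set will stop that node from ever taking over
addresses.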