Kris Laib
2016-Jan-28 05:38 UTC
[Gluster-users] How to maintain HA using NFS clients if the NFS daemon process gets killed on a gluster node?
Soumya,

CTDB failover works great if the server crashes or the NIC is pulled, but I don't believe there's anything in the CTDB setup that would cause it to realize there is a problem if only the glusterfs process responsible for serving NFS is killed while network connectivity with the other CTDB nodes remains intact. If others are able to kill just the PID of the associated "NFS Server on localhost" process and have CTDB issue a failover, I'd be very interested to know how their setup differs from mine.

Thanks for the nfs-ganesha suggestion. I'm not very familiar with that option and don't have enough time in my timeline to properly test it before moving to production, but I will look into it further as a possible solution down the road, or if my deadline gets extended. The FUSE client may be a good option for us as well, but I can't seem to get speeds higher than 30 MB/s with the Gluster FUSE client (I posted more details on that to this group earlier today, looking for advice there).

-Kris

________________________________________
From: Soumya Koduri <skoduri at redhat.com>
Sent: Wednesday, January 27, 2016 8:15 PM
To: Kris Laib; gluster-users at gluster.org
Subject: Re: [Gluster-users] How to maintain HA using NFS clients if the NFS daemon process gets killed on a gluster node?

On 01/27/2016 09:39 PM, Kris Laib wrote:
> Hi all,
>
> We're getting ready to roll out Gluster using standard NFS from the
> clients, and CTDB and RRDNS to help facilitate HA. I thought we were
> good to go, but recently had an issue where there wasn't enough memory
> on one of the gluster nodes in a test cluster, and the OOM killer took
> out the NFS daemon process. Since there was still IP traffic between
> nodes and the gluster-based local CTDB mount for the lock file was
> intact, CTDB didn't kick in and initiate failover, and all clients
> connected to

For gluster-NFS, CTDB is typically configured to maintain high
availability, and I guess you have done the same. Could you check why
CTDB hasn't initiated IP failover?

An alternative solution is to use nfs-ganesha [1][2], which provides
NFS support for gluster volumes and can be configured to maintain HA
using the gluster CLI.

Thanks,
Soumya

[1] http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
[2] http://gluster.readthedocs.org/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Intergration/
(section "Using Highly Available Active-Active NFS-Ganesha And GlusterFS cli")

> the node where NFS was killed lost their connections. We'll obviously
> fix the lack of memory, but going forward how can we protect against
> clients getting disconnected if the NFS daemon should be stopped for
> any reason?
>
> Our cluster is 3 nodes: 1 is a silent witness node to help with split
> brain, and the other 2 host the volumes, with one brick per node and
> 1x2 replication.
>
> Is there something incorrect about my setup, or is this a known
> downfall of using standard NFS mounts with gluster?
>
> Thanks,
>
> Kris
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
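For context, the nfs-ganesha HA setup referenced in [1][2] comes down to a few gluster CLI commands. This is only a hedged sketch based on those 3.7-era docs; the volume name "myvol" is a placeholder, and exact command names may differ between releases:

```shell
# Assumes nfs-ganesha and the ganesha-ha scripts are installed and
# /etc/ganesha/ganesha-ha.conf is populated on the participating nodes.

# The in-built gluster-NFS must be off before ganesha can bind NFS ports:
gluster volume set all nfs.disable on

# Bring up the nfs-ganesha HA cluster across the configured nodes:
gluster nfs-ganesha enable

# Export a volume ("myvol" is a placeholder) through nfs-ganesha:
gluster volume set myvol ganesha.enable on
```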
Soumya Koduri
2016-Jan-28 06:00 UTC
[Gluster-users] How to maintain HA using NFS clients if the NFS daemon process gets killed on a gluster node?
On 01/28/2016 11:08 AM, Kris Laib wrote:
> Soumya,
>
> CTDB failover works great if the server crashes or the NIC is pulled,
> but I don't believe there's anything in the CTDB setup that would
> cause it to realize there is a problem if only the glusterfs process
> responsible for serving NFS is killed but network connectivity with
> other CTDB nodes remains intact. If others are able to kill just the
> PID for the associated "NFS Server on localhost" process and have CTDB
> issue a failover, I'd be very interested to know how their setup
> differs from mine.

Okay. I have never personally tried out a CTDB setup. But from what I
hear, CTDB can be configured (via an option in ctdbd.conf) to manage a
service in such a way that when the service stops for any reason, CTDB
marks the node unhealthy and initiates failover. CC'ing Niels and a
couple of others who should be able to help you out here.

Thanks,
Soumya
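The option Soumya refers to is, as far as I know, the service-management setting in CTDB's configuration (on RHEL-family systems usually /etc/sysconfig/ctdb rather than ctdbd.conf itself). A hedged sketch of the relevant fragment, assuming the legacy CTDB_MANAGES_NFS mechanism is what's meant:

```shell
# /etc/sysconfig/ctdb (or /etc/default/ctdb on Debian) - fragment only.
# When set, CTDB's 60.nfs event script checks the NFS service on every
# periodic "monitor" event; a failed check marks the node UNHEALTHY,
# which releases its public IPs so a healthy node takes them over.
CTDB_MANAGES_NFS=yes

# Public addresses that float between nodes on failover:
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
```

Note that the stock 60.nfs script assumes kernel NFS (nfs-utils), so with gluster-NFS its checks would likely need adapting.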
Raghavendra Talur
2016-Jan-28 06:22 UTC
[Gluster-users] How to maintain HA using NFS clients if the NFS daemon process gets killed on a gluster node?
On Thu, Jan 28, 2016 at 11:08 AM, Kris Laib <Kris.Laib at nwea.org> wrote:
> Soumya,
>
> CTDB failover works great if the server crashes or the NIC is pulled,
> but I don't believe there's anything in the CTDB setup that would
> cause it to realize there is a problem if only the glusterfs process
> responsible for serving NFS is killed but network connectivity with
> other CTDB nodes remains intact. If others are able to kill just the
> PID for the associated "NFS Server on localhost" process and have CTDB
> issue a failover, I'd be very interested to know how their setup
> differs from mine.

I think you can achieve that with the CTDB_MANAGES_NFS option. Refer to
the last four sections of https://ctdb.samba.org/nfs.html. I have not
personally used this option, and because this is gluster-NFS and not
kernel NFS, you might have to edit the event scripts, such as
/etc/ctdb/events.d/60.nfs.
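Since the stock 60.nfs script targets kernel NFS, a minimal adaptation for gluster-NFS might look like the following. This is only a sketch, not a tested drop-in for /etc/ctdb/events.d/60.nfs: the process pattern "glusterfs.*nfs" is an assumption about how the gluster NFS server appears in the process table.

```shell
#!/bin/sh
# Hypothetical CTDB event hook for gluster-NFS (sketch, not the stock
# 60.nfs). CTDB invokes event scripts with the event name as the first
# argument; a non-zero exit from the "monitor" event makes CTDB mark
# the node unhealthy and fail its public IPs over to a healthy node.

monitor_gluster_nfs() {
    # gluster-NFS is served by a glusterfs process, not kernel nfsd;
    # the pattern below is an assumption about its command line.
    if ! pgrep -f "glusterfs.*nfs" >/dev/null 2>&1; then
        echo "ERROR: gluster NFS server process is not running"
        return 1
    fi
    # Also confirm NFSv3 is registered and answering via the portmapper,
    # so a hung-but-alive process still fails the check.
    if ! rpcinfo -t localhost nfs 3 >/dev/null 2>&1; then
        echo "ERROR: NFSv3 is not responding on localhost"
        return 1
    fi
    return 0
}

case "$1" in
    monitor)
        monitor_gluster_nfs || exit 1
        ;;
esac
```

In a real deployment this logic would replace the kernel-NFS checks inside the distribution's 60.nfs (which are gated on CTDB_MANAGES_NFS), rather than live in a separate script.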