Soumya Koduri
2016-Sep-23 08:33 UTC
[Gluster-users] pacemaker VIP routing latency to gluster node.
On 09/23/2016 02:34 AM, Dung Le wrote:
> Hello,
>
> I have a pretty straightforward configuration as below:
>
> 3 storage nodes running version 3.7.11 with replica 3, using
> native gluster NFS.
> corosync version 1.4.7 and pacemaker version 1.1.12
> I have DNS round-robin on 3 VIPs living on the 3 storage nodes.
>
> *_Here is how I configure my corosync:_*
>
> SN1 with x.x.x.001
> SN2 with x.x.x.002
> SN3 with x.x.x.003
>
> ******************************************************************************************************************
> *_Below is pcs config output:_*
>
> Cluster Name: dfs_cluster
> Corosync Nodes:
>  SN1 SN2 SN3
> Pacemaker Nodes:
>  SN1 SN2 SN3
>
> Resources:
>  Clone: Gluster-clone
>   Meta Attrs: clone-max=3 clone-node-max=3 globally-unique=false
>   Resource: Gluster (class=ocf provider=glusterfs type=glusterd)
>    Operations: start interval=0s timeout=20 (Gluster-start-interval-0s)
>                stop interval=0s timeout=20 (Gluster-stop-interval-0s)
>                monitor interval=10s (Gluster-monitor-interval-10s)
>  Resource: SN1-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=x.x.x.001 cidr_netmask=32
>   Operations: start interval=0s timeout=20s (SN1-ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (SN1-ClusterIP-stop-interval-0s)
>               monitor interval=10s (SN1-ClusterIP-monitor-interval-10s)
>  Resource: SN2-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=x.x.x.002 cidr_netmask=32
>   Operations: start interval=0s timeout=20s (SN2-ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (SN2-ClusterIP-stop-interval-0s)
>               monitor interval=10s (SN2-ClusterIP-monitor-interval-10s)
>  Resource: SN3-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=x.x.x.003 cidr_netmask=32
>   Operations: start interval=0s timeout=20s (SN3-ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (SN3-ClusterIP-stop-interval-0s)
>               monitor interval=10s (SN3-ClusterIP-monitor-interval-10s)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
>   Resource: SN1-ClusterIP
>     Enabled on: SN1 (score:3000) (id:location-SN1-ClusterIP-SN1-3000)
>     Enabled on: SN2 (score:2000) (id:location-SN1-ClusterIP-SN2-2000)
>     Enabled on: SN3 (score:1000) (id:location-SN1-ClusterIP-SN3-1000)
>   Resource: SN2-ClusterIP
>     Enabled on: SN2 (score:3000) (id:location-SN2-ClusterIP-SN2-3000)
>     Enabled on: SN3 (score:2000) (id:location-SN2-ClusterIP-SN3-2000)
>     Enabled on: SN1 (score:1000) (id:location-SN2-ClusterIP-SN1-1000)
>   Resource: SN3-ClusterIP
>     Enabled on: SN3 (score:3000) (id:location-SN3-ClusterIP-SN3-3000)
>     Enabled on: SN1 (score:2000) (id:location-SN3-ClusterIP-SN1-2000)
>     Enabled on: SN2 (score:1000) (id:location-SN3-ClusterIP-SN2-1000)
> Ordering Constraints:
>   start Gluster-clone then start SN1-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN1-ClusterIP-mandatory)
>   start Gluster-clone then start SN2-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN2-ClusterIP-mandatory)
>   start Gluster-clone then start SN3-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN3-ClusterIP-mandatory)
> Colocation Constraints:
>
> Resources Defaults:
>  is-managed: true
>  target-role: Started
>  requires: nothing
>  multiple-active: stop_start
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: cman
>  dc-version: 1.1.11-97629de
>  no-quorum-policy: ignore
>  stonith-enabled: false
>
> ******************************************************************************************************************
> *_pcs status output:_*
>
> Cluster name: dfs_cluster
> Last updated: Thu Sep 22 16:57:35 2016
> Last change: Mon Aug 29 18:02:44 2016
> Stack: cman
> Current DC: SN1 - partition with quorum
> Version: 1.1.11-97629de
> 3 Nodes configured
> 6 Resources configured
>
> Online: [ SN1 SN2 SN3 ]
>
> Full list of resources:
>
>  Clone Set: Gluster-clone [Gluster]
>      Started: [ SN1 SN2 SN3 ]
>  SN1-ClusterIP (ocf::heartbeat:IPaddr2): Started SN1
>  SN2-ClusterIP (ocf::heartbeat:IPaddr2): Started SN2
>  SN3-ClusterIP (ocf::heartbeat:IPaddr2): Started SN3
>
> ******************************************************************************************************************
>
> When I mount the gluster volume, I'm using the VIP name. It will choose
> one of the storage nodes to establish NFS.
>
> *_My issue is:_*
>
> After the gluster volume has been mounted for 1-2 hrs, all the clients
> report that df hangs and returns no output. I checked the kernel messages
> on the client side and saw the following errors:
>
> /Sep 20 05:46:45 xxxxx kernel: nfs: server nfsserver001 not responding, still trying/
> /Sep 20 05:49:45 xxxxx kernel: nfs: server nfsserver001 not responding, still trying/
>
> I tried to mount the gluster volume using the DNS round-robin name on a
> different mountpoint, but the mount was not successful.

Did you check the 'pcs status' output at that time? Maybe the *-ClusterIP*
resources had gone to the Stopped state, making the VIPs unavailable; a
rough set of checks to capture at that moment is sketched at the end of
this mail.

Thanks,
Soumya

> Then I tried to mount the gluster volume using the storage node IP itself
> (not the VIP), and I was able to mount it. Afterward, I flipped all the
> clients to mount the storage node IPs directly, and they have been up for
> more than 12 hrs without any issue.
>
> Any idea what might cause this issue?
>
> Thanks a lot,
>
> ~ Vic Le
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
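If it happens again, a minimal set of checks on the storage node that is
supposed to own the affected VIP could look like the following (standard
pcs / iproute2 / gluster / NFS client commands; <volname> is a placeholder
for your actual volume name):

  # Are all three *-ClusterIP resources still reported as Started?
  pcs status resources

  # Any failed actions recorded for the IPaddr2 or glusterd resources?
  pcs status --full

  # Is the VIP actually plumbed on the node that should own it?
  ip addr show | grep 'x.x.x.00'

  # Is the gluster NFS server process still up on that node?
  gluster volume status <volname> nfs

  # Does the VIP still answer MOUNT/portmap requests from a client?
  showmount -e x.x.x.001
  rpcinfo -p x.x.x.001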
Dung Le
2016-Sep-23 17:14 UTC
[Gluster-users] pacemaker VIP routing latency to gluster node.
Hi Soumya,

> Did you check the 'pcs status' output at that time? Maybe the *-ClusterIP*
> resources had gone to the Stopped state, making the VIPs unavailable.

Yes, I did check 'pcs status', and everything looked good at the time.

I hit the issue again with the VIP mount and the df hang yesterday. On
client 1, df was hung, and I could NOT mount the gluster volume via VIP
x.x.x.001, but I could mount it via VIPs x.x.x.002 and x.x.x.003. On
client 2, I could mount the gluster volume via all three VIPs: x.x.x.001,
x.x.x.002 and x.x.x.003.

Since pacemaker VIP x.x.x.001 is configured for SN1, I went ahead and
stopped the cluster service on SN1 with 'pcs cluster stop'. VIP x.x.x.001
failed over to SN2 as configured, and afterward I could mount the gluster
volume via VIP x.x.x.001 on client 1 (a rough sketch of that sequence is
at the end of this mail).

Any idea?

Thanks,
~ Vic Le
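For reference, the failover test above boiled down to roughly this (volume
and mountpoint names are placeholders; the mount options assume the
NFSv3-over-TCP defaults that gluster NFS uses):

  # On SN1: leave the cluster so the VIP fails over per the location scores
  pcs cluster stop

  # On SN2: confirm x.x.x.001 has moved over
  pcs status | grep ClusterIP
  ip addr show | grep 'x.x.x.001'

  # On client 1: retry the mount through the VIP
  mount -t nfs -o vers=3,proto=tcp x.x.x.001:/<volname> /mnt/<mountpoint>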