Soumya Koduri
2016-Sep-23 08:33 UTC
[Gluster-users] pacemaker VIP routing latency to gluster node.
On 09/23/2016 02:34 AM, Dung Le wrote:
> Hello,
>
> I have a pretty straightforward configuration as below:
>
> 3 storage nodes running version 3.7.11 with replica 3, using
> native gluster NFS.
> corosync version 1.4.7 and pacemaker version 1.1.12
> I have DNS round-robin on 3 VIPs living on the 3 storage nodes.
>
> *_Here is how I configure my corosync:_*
>
> SN1 with x.x.x.001
> SN2 with x.x.x.002
> SN3 with x.x.x.003
>
> ******************************************************************************************************************
> *_Below is pcs config output:_*
>
> Cluster Name: dfs_cluster
> Corosync Nodes:
>  SN1 SN2 SN3
> Pacemaker Nodes:
>  SN1 SN2 SN3
>
> Resources:
>  Clone: Gluster-clone
>   Meta Attrs: clone-max=3 clone-node-max=3 globally-unique=false
>   Resource: Gluster (class=ocf provider=glusterfs type=glusterd)
>    Operations: start interval=0s timeout=20 (Gluster-start-interval-0s)
>                stop interval=0s timeout=20 (Gluster-stop-interval-0s)
>                monitor interval=10s (Gluster-monitor-interval-10s)
>  Resource: SN1-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=x.x.x.001 cidr_netmask=32
>   Operations: start interval=0s timeout=20s (SN1-ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (SN1-ClusterIP-stop-interval-0s)
>               monitor interval=10s (SN1-ClusterIP-monitor-interval-10s)
>  Resource: SN2-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=x.x.x.002 cidr_netmask=32
>   Operations: start interval=0s timeout=20s (SN2-ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (SN2-ClusterIP-stop-interval-0s)
>               monitor interval=10s (SN2-ClusterIP-monitor-interval-10s)
>  Resource: SN3-ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=x.x.x.003 cidr_netmask=32
>   Operations: start interval=0s timeout=20s (SN3-ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (SN3-ClusterIP-stop-interval-0s)
>               monitor interval=10s (SN3-ClusterIP-monitor-interval-10s)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
>   Resource: SN1-ClusterIP
>     Enabled on: SN1 (score:3000) (id:location-SN1-ClusterIP-SN1-3000)
>     Enabled on: SN2 (score:2000) (id:location-SN1-ClusterIP-SN2-2000)
>     Enabled on: SN3 (score:1000) (id:location-SN1-ClusterIP-SN3-1000)
>   Resource: SN2-ClusterIP
>     Enabled on: SN2 (score:3000) (id:location-SN2-ClusterIP-SN2-3000)
>     Enabled on: SN3 (score:2000) (id:location-SN2-ClusterIP-SN3-2000)
>     Enabled on: SN1 (score:1000) (id:location-SN2-ClusterIP-SN1-1000)
>   Resource: SN3-ClusterIP
>     Enabled on: SN3 (score:3000) (id:location-SN3-ClusterIP-SN3-3000)
>     Enabled on: SN1 (score:2000) (id:location-SN3-ClusterIP-SN1-2000)
>     Enabled on: SN2 (score:1000) (id:location-SN3-ClusterIP-SN2-1000)
> Ordering Constraints:
>   start Gluster-clone then start SN1-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN1-ClusterIP-mandatory)
>   start Gluster-clone then start SN2-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN2-ClusterIP-mandatory)
>   start Gluster-clone then start SN3-ClusterIP (kind:Mandatory) (id:order-Gluster-clone-SN3-ClusterIP-mandatory)
> Colocation Constraints:
>
> Resources Defaults:
>  is-managed: true
>  target-role: Started
>  requires: nothing
>  multiple-active: stop_start
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: cman
>  dc-version: 1.1.11-97629de
>  no-quorum-policy: ignore
>  stonith-enabled: false
>
> ******************************************************************************************************************
> *_pcs status output:_*
>
> Cluster name: dfs_cluster
> Last updated: Thu Sep 22 16:57:35 2016
> Last change: Mon Aug 29 18:02:44 2016
> Stack: cman
> Current DC: SN1 - partition with quorum
> Version: 1.1.11-97629de
> 3 Nodes configured
> 6 Resources configured
>
> Online: [ SN1 SN2 SN3 ]
>
> Full list of resources:
>
>  Clone Set: Gluster-clone [Gluster]
>      Started: [ SN1 SN2 SN3 ]
>  SN1-ClusterIP (ocf::heartbeat:IPaddr2): Started SN1
>  SN2-ClusterIP (ocf::heartbeat:IPaddr2): Started SN2
>  SN3-ClusterIP (ocf::heartbeat:IPaddr2): Started SN3
>
> ******************************************************************************************************************
>
> When I mount the gluster volume, I'm using the VIP name. It will choose
> one of the storage nodes to establish NFS.
>
> *_My issue is:_*
>
> After the gluster volume has been mounted for 1-2 hrs, all the clients
> report that df hangs and returns no output. I checked the kernel messages
> on the client side and saw the following errors:
>
> /Sep 20 05:46:45 xxxxx kernel: nfs: server nfsserver001 not responding, still trying/
> /Sep 20 05:49:45 xxxxx kernel: nfs: server nfsserver001 not responding, still trying/
>
> I tried to mount the gluster volume using the DNS round-robin name on a
> different mountpoint, but the mount was not successful.

Did you check the 'pcs status' output at that time? Maybe the *-ClusterIP*
resources had gone to the Stopped state, making the VIPs unavailable; a
rough set of checks to capture at that moment is sketched at the end of
this mail.

Thanks,
Soumya

> Then I tried to mount the gluster volume using the storage node IP itself
> (not the VIP), and I was able to mount it. Afterward, I flipped all the
> clients to mount the storage node IPs directly, and they have been up for
> more than 12 hrs without any issue.
>
> Any idea what might cause this issue?
>
> Thanks a lot,
>
> ~ Vic Le
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
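If it happens again, a minimal set of checks on the storage node that is
supposed to own the affected VIP could look like the following (standard
pcs / iproute2 / gluster / NFS client commands; <volname> is a placeholder
for your actual volume name):

  # Are all three *-ClusterIP resources still reported as Started?
  pcs status resources

  # Any failed actions recorded for the IPaddr2 or glusterd resources?
  pcs status --full

  # Is the VIP actually plumbed on the node that should own it?
  ip addr show | grep 'x.x.x.00'

  # Is the gluster NFS server process still up on that node?
  gluster volume status <volname> nfs

  # Does the VIP still answer MOUNT/portmap requests from a client?
  showmount -e x.x.x.001
  rpcinfo -p x.x.x.001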
Dung Le
2016-Sep-23 17:14 UTC
[Gluster-users] pacemaker VIP routing latency to gluster node.
Hi Soumya,

> Did you check the 'pcs status' output at that time? Maybe the *-ClusterIP*
> resources had gone to the Stopped state, making the VIPs unavailable.

Yes, I did check 'pcs status', and everything looked good at the time.

I hit the issue again with the VIP mount and the df hang yesterday. On
client 1, df was hung, and I could NOT mount the gluster volume via VIP
x.x.x.001, but I could mount it via VIPs x.x.x.002 and x.x.x.003. On
client 2, I could mount the gluster volume via all three VIPs: x.x.x.001,
x.x.x.002 and x.x.x.003.

Since pacemaker VIP x.x.x.001 is configured for SN1, I went ahead and
stopped the cluster service on SN1 with 'pcs cluster stop'. VIP x.x.x.001
failed over to SN2 as configured, and afterward I could mount the gluster
volume via VIP x.x.x.001 on client 1 (a rough sketch of that sequence is
at the end of this mail).

Any idea?

Thanks,
~ Vic Le
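For reference, the failover test above boiled down to roughly this (volume
and mountpoint names are placeholders; the mount options assume the
NFSv3-over-TCP defaults that gluster NFS uses):

  # On SN1: leave the cluster so the VIP fails over per the location scores
  pcs cluster stop

  # On SN2: confirm x.x.x.001 has moved over
  pcs status | grep ClusterIP
  ip addr show | grep 'x.x.x.001'

  # On client 1: retry the mount through the VIP
  mount -t nfs -o vers=3,proto=tcp x.x.x.001:/<volname> /mnt/<mountpoint>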