Tomalak Geret'kal
2017-Dec-07 13:29 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
Hi guys

I'm wondering if anyone here is using the GlusterFS OCF resource agents with Pacemaker on CentOS 7?

    yum install centos-release-gluster
    yum install glusterfs-server glusterfs-resource-agents

The reason I ask is that there seem to be a few problems with them on 3.10, but these problems are so severe that I'm struggling to believe I'm not just doing something wrong.

I created my brick (on a volume previously used for DRBD, thus its name):

    mkfs.xfs /dev/cl/lv_drbd -f
    mkdir -p /gluster/test_brick
    mount -t xfs /dev/cl/lv_drbd /gluster

And then my volume (enabling clients to mount it via NFS):

    systemctl start glusterd
    gluster volume create test_logs replica 2 transport tcp pcmk01-drbd:/gluster/test_brick pcmk02-drbd:/gluster/test_brick
    gluster volume start test_logs
    gluster volume set test_logs nfs.disable off

And here's where the fun starts.

Firstly, we need to work around bug 1233344* (which was closed when 3.7 went end-of-life but still seems valid in 3.10) by pointing the volume resource agent at the correct voldir:

    sed -i 's#voldir="/etc/glusterd/vols/${OCF_RESKEY_volname}"#voldir="/var/lib/glusterd/vols/${OCF_RESKEY_volname}"#' /usr/lib/ocf/resource.d/glusterfs/volume

With that done, I [attempt to] stop GlusterFS so it can be brought under Pacemaker control:

    systemctl stop glusterfsd
    systemctl stop glusterd
    umount /gluster

(I usually have to manually kill glusterfs processes at this point before the unmount works - why does the systemctl stop not do it?)
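For completeness, the manual cleanup I end up doing looks roughly like this (a sketch from memory, not a recommendation):

    # see which gluster processes survived the systemctl stops
    pgrep -af gluster
    # kill any leftover brick/mount processes, then retry the unmount
    pkill glusterfs
    pkill glusterfsd
    umount /gluster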
With the node in standby (just one is online in this example, but another is configured), I then set up the resources:

    pcs node standby
    pcs resource create gluster_data ocf:heartbeat:Filesystem device="/dev/cl/lv_drbd" directory="/gluster" fstype="xfs"
    pcs resource create glusterd ocf:glusterfs:glusterd
    pcs resource create gluster_vol ocf:glusterfs:volume volname="test_logs"
    pcs resource create test_logs ocf:heartbeat:Filesystem \
        device="localhost:/test_logs" directory="/var/log/test" fstype="nfs" \
        options="vers=3,tcp,nolock,context=system_u:object_r:httpd_sys_content_t:s0" \
        op monitor OCF_CHECK_LEVEL="20"
    pcs resource clone glusterd
    pcs resource clone gluster_data
    pcs resource clone gluster_vol ordered=true
    pcs constraint order start gluster_data-clone then start glusterd-clone
    pcs constraint order start glusterd-clone then start gluster_vol-clone
    pcs constraint order start gluster_vol-clone then start test_logs
    pcs constraint colocation add test_logs with FloatingIp INFINITY

(Note the SELinux wrangling - this is because I have a CGI web application which will later need to read files from the /var/log/test mount.)

At this point, even with the node in standby, it's /already/ failing:

    [root@pcmk01 ~]# pcs status
    Cluster name: test_cluster
    Stack: corosync
    Current DC: pcmk01-cr (version 1.1.15-11.el7_3.5-e174ec8) - partition WITHOUT quorum
    Last updated: Thu Dec  7 13:20:41 2017    Last change: Thu Dec  7 13:09:33 2017 by root via crm_attribute on pcmk01-cr

    2 nodes and 13 resources configured

    Online: [ pcmk01-cr ]
    OFFLINE: [ pcmk02-cr ]

    Full list of resources:

     FloatingIp     (ocf::heartbeat:IPaddr2):       Started pcmk01-cr
     test_logs      (ocf::heartbeat:Filesystem):    Stopped
     Clone Set: glusterd-clone [glusterd]
         Stopped: [ pcmk01-cr pcmk02-cr ]
     Clone Set: gluster_data-clone [gluster_data]
         Stopped: [ pcmk01-cr pcmk02-cr ]
     Clone Set: gluster_vol-clone [gluster_vol]
         gluster_vol        (ocf::glusterfs:volume):        FAILED pcmk01-cr (blocked)
         Stopped: [ pcmk02-cr ]

    Failed Actions:
    * gluster_data_start_0 on pcmk01-cr 'not configured' (6): call=72, status=complete, exitreason='DANGER! xfs on /dev/cl/lv_drbd is NOT cluster-aware!',
        last-rc-change='Thu Dec  7 13:09:28 2017', queued=0ms, exec=250ms
    * gluster_vol_stop_0 on pcmk01-cr 'unknown error' (1): call=60, status=Timed Out, exitreason='none',
        last-rc-change='Thu Dec  7 12:55:11 2017', queued=0ms, exec=20004ms

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled

1. The data mount can't be created? Why?
2. Why is there a volume "stop" command being attempted, and why does it fail?
3. Why is any of this happening in standby? I can't have the resources failing before I've even made the node live! I could understand why a gluster_vol start operation would fail when glusterd is (correctly) stopped, but why is there a *stop* operation? And why does that make the resource "blocked"?

Given the above steps, is there something fundamental I'm missing about how these resource agents should be used? How do *you* configure GlusterFS on Pacemaker?

Any advice appreciated.

Best regards

* https://bugzilla.redhat.com/show_bug.cgi?id=1233344
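P.S. One guess on failure 1: the Filesystem agent seems to refuse to run as a clone on anything it doesn't consider a cluster filesystem, which is where the "DANGER! xfs ... is NOT cluster-aware!" message comes from, and it appears to have a force_clones parameter for cases where each node really does mount its own local device. Something like the sketch below might get past that check, though I haven't confirmed it's the right thing to do here:

    # sketch only: tell the cloned Filesystem resource that cloning a
    # non-cluster filesystem is intentional (each node mounts its own LV)
    pcs resource update gluster_data force_clones=true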
Hetz Ben Hamo
2017-Dec-07 13:47 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
> With the node in standby (just one is online in this example, but another
> is configured), I then set up the resources:
>
> pcs node standby
> pcs resource create gluster_data ocf:heartbeat:Filesystem
> device="/dev/cl/lv_drbd" directory="/gluster" fstype="xfs"
> [...]
> pcs constraint colocation add test_logs with FloatingIp INFINITY

Out of curiosity, did you write it or did you find those commands somewhere else?
Tomalak Geret'kal
2017-Dec-07 13:48 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
On 07/12/2017 13:47, Hetz Ben Hamo wrote:
> Out of curiosity, did you write it or did you find those commands
> somewhere else?

I wrote it, using what little experience of pcs I have so far - I haven't actually been able to find any documented steps/instructions. :( Although this blog - http://www.tomvernon.co.uk/blog/2015/01/gluster-activepassive-cluster/ - suggested I was on the right path.

Cheers
Jiffin Tony Thottan
2017-Dec-08 10:17 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
Hi,

Can you please explain what the Pacemaker cluster is being used for here?

Regards,
Jiffin

On Thursday 07 December 2017 06:59 PM, Tomalak Geret'kal wrote:
> Hi guys
>
> I'm wondering if anyone here is using the GlusterFS OCF resource agents
> with Pacemaker on CentOS 7?
> [...]
> Given the above steps, is there something fundamental I'm missing
> about how these resource agents should be used? How do *you* configure
> GlusterFS on Pacemaker?
>
> Any advice appreciated.
>
> Best regards
>
> * https://bugzilla.redhat.com/show_bug.cgi?id=1233344
Tomalak Geret'kal
2017-Dec-08 10:55 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
Hi Jiffin

Pacemaker clusters allow us to distribute services effectively across multiple computers. In my case, I am creating an active-passive cluster for my software, which relies on Apache, MySQL and GlusterFS. Thus, I want GlusterFS to be controlled by Pacemaker so that:

1. A node can be deemed "bad" if GlusterFS is not running (using constraints to prohibit failover to a bad node - see the sketch below)
2. The GlusterFS volume is automatically mounted on whichever node is active
3. Services all go into standby together
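Concretely, the kind of constraint I have in mind for point 1 is just a sketch at this stage (resource names as per my earlier mail):

    # sketch: only allow the floating IP (and everything colocated with it)
    # onto a node where the gluster volume resource instance is running
    pcs constraint colocation add FloatingIp with gluster_vol-clone INFINITY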
Is this not the recommended approach? What else should I do?

Thanks

On 08/12/2017 10:17, Jiffin Tony Thottan wrote:
> Hi,
>
> Can you please explain what the Pacemaker cluster is being used for here?
>
> Regards,
>
> Jiffin
> [...]