Tomalak Geret'kal
2017-Dec-07 13:29 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
Hi guys,

I'm wondering if anyone here is using the GlusterFS OCF resource agents with Pacemaker on CentOS 7?

yum install centos-release-gluster
yum install glusterfs-server glusterfs-resource-agents

The reason I ask is that there seem to be a few problems with them on 3.10, but these problems are so severe that I'm struggling to believe I'm not just doing something wrong.

I created my brick (on a volume previously used for DRBD, thus its name):

mkfs.xfs -f /dev/cl/lv_drbd
mkdir -p /gluster/test_brick
mount -t xfs /dev/cl/lv_drbd /gluster

And then my volume (enabling clients to mount it via NFS):

systemctl start glusterd
gluster volume create test_logs replica 2 transport tcp pcmk01-drbd:/gluster/test_brick pcmk02-drbd:/gluster/test_brick
gluster volume start test_logs
gluster volume set test_logs nfs.disable off

And here's where the fun starts.

Firstly, we need to work around bug 1233344* (which was closed when 3.7 went end-of-life but still seems valid in 3.10):

sed -i 's#voldir="/etc/glusterd/vols/${OCF_RESKEY_volname}"#voldir="/var/lib/glusterd/vols/${OCF_RESKEY_volname}"#' /usr/lib/ocf/resource.d/glusterfs/volume

With that done, I [attempt to] stop GlusterFS so it can be brought under Pacemaker control:

systemctl stop glusterfsd
systemctl stop glusterd
umount /gluster

(I usually have to manually kill glusterfs processes at this point before the unmount works - why does the systemctl stop not do it?)
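For reference, this is roughly how I hunt down the stragglers by hand before the unmount will go through - nothing Gluster-specific, just stock psmisc/procps tools, so treat it as an illustration rather than a fix for the underlying question:

fuser -vm /gluster      # show every process with files open under the brick mount
pgrep -af gluster       # list any gluster daemons/clients still running
pkill glusterfsd        # brick processes
pkill glusterfs         # client/NFS/self-heal processes
umount /gluster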
With the node in standby (just one is online in this example, but another is configured), I then set up the resources:

pcs node standby
pcs resource create gluster_data ocf:heartbeat:Filesystem device="/dev/cl/lv_drbd" directory="/gluster" fstype="xfs"
pcs resource create glusterd ocf:glusterfs:glusterd
pcs resource create gluster_vol ocf:glusterfs:volume volname="test_logs"
pcs resource create test_logs ocf:heartbeat:Filesystem \
    device="localhost:/test_logs" directory="/var/log/test" fstype="nfs" \
    options="vers=3,tcp,nolock,context=system_u:object_r:httpd_sys_content_t:s0" \
    op monitor OCF_CHECK_LEVEL="20"
pcs resource clone glusterd
pcs resource clone gluster_data
pcs resource clone gluster_vol ordered=true
pcs constraint order start gluster_data-clone then start glusterd-clone
pcs constraint order start glusterd-clone then start gluster_vol-clone
pcs constraint order start gluster_vol-clone then start test_logs
pcs constraint colocation add test_logs with FloatingIp INFINITY

(Note the SELinux wrangling - this is because I have a CGI web application which will later need to read files from the /var/log/test mount.)

At this point, even with the node in standby, it's /already/ failing:

[root@pcmk01 ~]# pcs status
Cluster name: test_cluster
Stack: corosync
Current DC: pcmk01-cr (version 1.1.15-11.el7_3.5-e174ec8) - partition WITHOUT quorum
Last updated: Thu Dec  7 13:20:41 2017          Last change: Thu Dec  7 13:09:33 2017 by root via crm_attribute on pcmk01-cr

2 nodes and 13 resources configured

Online: [ pcmk01-cr ]
OFFLINE: [ pcmk02-cr ]

Full list of resources:

 FloatingIp     (ocf::heartbeat:IPaddr2):       Started pcmk01-cr
 test_logs      (ocf::heartbeat:Filesystem):    Stopped
 Clone Set: glusterd-clone [glusterd]
     Stopped: [ pcmk01-cr pcmk02-cr ]
 Clone Set: gluster_data-clone [gluster_data]
     Stopped: [ pcmk01-cr pcmk02-cr ]
 Clone Set: gluster_vol-clone [gluster_vol]
     gluster_vol        (ocf::glusterfs:volume):        FAILED pcmk01-cr (blocked)
     Stopped: [ pcmk02-cr ]

Failed Actions:
* gluster_data_start_0 on pcmk01-cr 'not configured' (6): call=72, status=complete, exitreason='DANGER! xfs on /dev/cl/lv_drbd is NOT cluster-aware!',
    last-rc-change='Thu Dec  7 13:09:28 2017', queued=0ms, exec=250ms
* gluster_vol_stop_0 on pcmk01-cr 'unknown error' (1): call=60, status=Timed Out, exitreason='none',
    last-rc-change='Thu Dec  7 12:55:11 2017', queued=0ms, exec=20004ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

1. The data mount can't be created? Why?
2. Why is there a volume "stop" command being attempted, and why does it fail?
3. Why is any of this happening in standby? I can't have the resources failing before I've even made the node live! I could understand why a gluster_vol start operation would fail when glusterd is (correctly) stopped, but why is there a *stop* operation? And why does that make the resource "blocked"?

Given the above steps, is there something fundamental I'm missing about how these resource agents should be used? How do *you* configure GlusterFS on Pacemaker?

Any advice appreciated.

Best regards

* https://bugzilla.redhat.com/show_bug.cgi?id=1233344
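(A side note for anyone hitting the same 'DANGER! xfs on /dev/cl/lv_drbd is NOT cluster-aware!' failure: that message comes from the heartbeat Filesystem agent refusing to run a plain local filesystem as a clone. Assuming a stock resource-agents build, one way to check whether the installed agent exposes an override for that guard - and to apply it only if it does - would be:)

pcs resource describe ocf:heartbeat:Filesystem    # look for a force_clones parameter
# illustrative only - run this only if force_clones is listed above
pcs resource update gluster_data force_clones=true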
Hetz Ben Hamo
2017-Dec-07 13:47 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
> With the node in standby (just one is online in this example, but another
> is configured), I then set up the resources:
>
> [...]

Out of curiosity, did you write these yourself, or did you find the commands somewhere else?
Tomalak Geret'kal
2017-Dec-07 13:48 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
On 07/12/2017 13:47, Hetz Ben Hamo wrote:
> Out of curiosity, did you write these yourself, or did you find the commands somewhere else?

I wrote it, using what little experience of pcs I have so far - I haven't actually been able to find any documented steps/instructions. :(

Although this blog - http://www.tomvernon.co.uk/blog/2015/01/gluster-activepassive-cluster/ - suggested I was on the right path.

Cheers
Jiffin Tony Thottan
2017-Dec-08 10:17 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
Hi,

Can you please explain what the Pacemaker cluster is being used for here?

Regards,
Jiffin

On Thursday 07 December 2017 06:59 PM, Tomalak Geret'kal wrote:
> Hi guys
> [...]
Tomalak Geret'kal
2017-Dec-08 10:55 UTC
[Gluster-users] GlusterFS, Pacemaker, OCF resource agents on CentOS 7
Hi Jiffin,

Pacemaker clusters allow us to distribute services effectively across multiple computers. In my case, I am creating an active-passive cluster for my software, and my software relies on Apache, MySQL and GlusterFS. I therefore want GlusterFS to be controlled by Pacemaker so that:

1. A node can be deemed "bad" if GlusterFS is not running (using constraints to prohibit failover to a bad node - see the sketch below)
2. The GlusterFS volume is automatically mounted on whichever node is active
3. All the services go into standby together

Is this not the recommended approach? What else should I do?
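For point 1, what I have in mind is roughly the following (untested, purely to illustrate the intent, using the resource names from my earlier mail):

# only allow the floating IP (and everything colocated with it) on a node
# whose glusterd clone instance is running
pcs constraint colocation add FloatingIp with glusterd-clone INFINITY
# and keep the client mount with the IP, as before
pcs constraint colocation add test_logs with FloatingIp INFINITY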
Thanks

On 08/12/2017 10:17, Jiffin Tony Thottan wrote:
> Hi,
>
> Can you please explain what the Pacemaker cluster is being used for here?
>
> Regards,
> Jiffin
> [...]