thr3ads.net - Gluster users - [Gluster-users] Self-heal doesn't appear to be happening [Mar 2015]


Jonathan Heese

2015-Mar-15 18:16 UTC

[Gluster-users] Self-heal doesn't appear to be happening

Hello all,


I have a 2 node 2 brick replicate gluster volume that I'm having trouble
making fault tolerant (a seemingly basic feature!) under CentOS 6.6 using EPEL
packages.


Both nodes are as close to identical hardware and software as possible, and
I'm running the following packages:

glusterfs-rdma-3.6.2-1.el6.x86_64
glusterfs-fuse-3.6.2-1.el6.x86_64
glusterfs-libs-3.6.2-1.el6.x86_64
glusterfs-cli-3.6.2-1.el6.x86_64
glusterfs-api-3.6.2-1.el6.x86_64
glusterfs-server-3.6.2-1.el6.x86_64
glusterfs-3.6.2-1.el6.x86_64


They both have dual-port Mellanox 20Gbps InfiniBand cards with a straight (i.e.
"crossover") cable and opensm to facilitate the RDMA transport between
them.


Here are some data dumps to set the stage (and yes, the output of these commands
looks the same on both nodes):


[root@duchess ~]# gluster volume info

Volume Name: gluster_disk
Type: Replicate
Volume ID: b1279e22-8589-407b-8671-3760f42e93e4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: duke-ib:/bricks/brick1
Brick2: duchess-ib:/bricks/brick1


[root@duchess ~]# gluster volume status
Status of volume: gluster_disk
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick duke-ib:/bricks/brick1                            49153   Y       9594
Brick duchess-ib:/bricks/brick1                         49153   Y       9583
NFS Server on localhost                                 2049    Y       9590
Self-heal Daemon on localhost                           N/A     Y       9597
NFS Server on 10.10.10.1                                2049    Y       9607
Self-heal Daemon on 10.10.10.1                          N/A     Y       9614

Task Status of Volume gluster_disk
------------------------------------------------------------------------------
There are no active volume tasks


[root@duchess ~]# gluster peer status
Number of Peers: 1

Hostname: 10.10.10.1
Uuid: aca56ec5-94bb-4bb0-8a9e-b3d134bbfe7b
State: Peer in Cluster (Connected)


So before putting any real data on these guys (the data will eventually be a
handful of large image files backing an iSCSI target via tgtd for ESXi
datastores), I wanted to simulate the failure of one of the nodes. So I stopped
glusterfsd and glusterd on duchess, waited about 5 minutes, then started them
back up again, tail'ing /var/log/glusterfs/* and /var/log/messages. I'm
not sure exactly what I'm looking for, but the logs quieted down after just
a minute or so of restarting the daemons. I didn't see much indicating that
self-healing was going on.


Every now and then (and seemingly more often than not), when I run "gluster
volume heal gluster_disk info", I get no output from the command, and the
following dumps into my /var/log/messages:


Mar 15 13:59:16 duchess kernel: glfsheal[10365]: segfault at 7ff56068d020 ip
00007ff54f366d80 sp 00007ff54e22adf8 error 6 in
libmthca-rdmav2.so[7ff54f365000+7000]
Mar 15 13:59:17 duchess abrtd: Directory
'ccpp-2015-03-15-13:59:16-10359' creation detected
Mar 15 13:59:17 duchess abrt[10368]: Saved core dump of pid 10359
(/usr/sbin/glfsheal) to /var/spool/abrt/ccpp-2015-03-15-13:59:16-10359
(225595392 bytes)
Mar 15 13:59:25 duchess abrtd: Package 'glusterfs-server' isn't
signed with proper key
Mar 15 13:59:25 duchess abrtd: 'post-create' on
'/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359' exited with 1
Mar 15 13:59:25 duchess abrtd: Deleting problem directory
'/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359'


Other times, when I'm lucky, I get messages from the "heal info"
command indicating that datastore1.img (the file that I intentionally changed
while duchess was offline) is in need of healing:


[root@duke ~]# gluster volume heal gluster_disk info
Brick duke.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1

Brick duchess.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1


But watching df on the bricks and tailing glustershd.log doesn't seem to
indicate that anything is actually happening -- and df indicates that the brick
on duke *is* different in file size from the brick on duchess. It's been over
an hour now, and I'm not confident that the self-heal functionality is even
working at all... Nor do I know how to do anything about it!
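
One way to keep an eye on this without re-reading the output by hand is to parse it. The following is only a sketch: it assumes the "heal info" text has exactly the shape shown above (a "Brick ..." line, entry lines, then "Number of entries: N"), and the function name parse_heal_info is made up for illustration.

```python
def parse_heal_info(text):
    """Map each brick to the list of entry lines reported under it.

    Assumes the layout shown in the post: a "Brick <host>:<path>" line,
    zero or more entry lines, then a "Number of entries: N" line.
    """
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = []
        elif not line or line.startswith("Number of entries"):
            continue
        elif current is not None:
            bricks[current].append(line)
    return bricks


# Sample text copied from the output above.
SAMPLE = """\
Brick duke.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1

Brick duchess.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1
"""

if __name__ == "__main__":
    for brick, entries in parse_heal_info(SAMPLE).items():
        print(brick, len(entries), entries)
```

Running this periodically against the real command output and watching for the entry lists to become empty would at least show whether the pending entries ever clear.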


Also, I find it a little bit troubling that I'm using the aliases (in
/etc/hosts on both servers) duke-ib and duchess-ib for the gluster node
configuration, but the "heal info" command refers to my nodes with
their internal FQDNs, which resolve to their 1Gbps interface IPs... That
doesn't mean that they're trying to communicate over those interfaces
(the volume is configured with "transport rdma", as you can see
above), does it?


Can anyone throw out any ideas on how I can:

1. Determine whether this is intentional behavior (or a bug?),

2. Determine whether my data has been properly resync'd across the bricks,
and

3. Make it work correctly if not.


Thanks in advance!


Regards,

Jon Heese

Jonathan Heese

2015-Mar-15 18:23 UTC

[Gluster-users] Self-heal doesn't appear to be happening

A couple more notes:


I should explicitly mention that I did make a very minor change (touched a file)
inside datastore1.img while duchess was offline, so I know that there were at
least a few bits that would have needed to be resynced across the cluster after
it rejoined.


Also, the reason I'm so paranoid about this is that this is actually the
second time I've configured this particular cluster, because the first time
it all went down in flames (along with about 6 months of work on some VMs) after
I rebooted the Gluster boxes, in series, and apparently didn't wait long
enough between reboots (or confirm that the self-heal process completed first). 
So now I'm doing my due diligence on this.


Thanks again.


Regards,

Jon Heese



Joe Julian

2015-Mar-15 19:39 UTC

[Gluster-users] Self-heal doesn't appear to be happening

On 03/15/2015 11:16 AM, Jonathan Heese wrote:
> Hello all,
>
>
> I have a 2 node 2 brick replicate gluster volume that I'm having 
> trouble making fault tolerant (a seemingly basic feature!) under 
> CentOS 6.6 using EPEL packages.
>
>
> Both nodes are as close to identical hardware and software as 
> possible, and I'm running the following packages:
>
> glusterfs-rdma-3.6.2-1.el6.x86_64
> glusterfs-fuse-3.6.2-1.el6.x86_64
> glusterfs-libs-3.6.2-1.el6.x86_64
> glusterfs-cli-3.6.2-1.el6.x86_64
> glusterfs-api-3.6.2-1.el6.x86_64
> glusterfs-server-3.6.2-1.el6.x86_64
> glusterfs-3.6.2-1.el6.x86_64
>

3.6.2 is not considered production stable. Based on your expressed
concern, you should probably be running 3.5.3.

>
> They both have dual-port Mellanox 20Gbps InfiniBand cards with a
> straight (i.e. "crossover") cable and opensm to facilitate the RDMA
> transport between them.
>
>
> Here are some data dumps to set the stage (and yes, the output of 
> these commands looks the same on both nodes):
>
>
> [root@duchess ~]# gluster volume info
>
> Volume Name: gluster_disk
> Type: Replicate
> Volume ID: b1279e22-8589-407b-8671-3760f42e93e4
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: rdma
> Bricks:
> Brick1: duke-ib:/bricks/brick1
> Brick2: duchess-ib:/bricks/brick1
>
>
> [root@duchess ~]# gluster volume status
> Status of volume: gluster_disk
> Gluster process                                         Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick duke-ib:/bricks/brick1                            49153   Y       9594
> Brick duchess-ib:/bricks/brick1                         49153   Y       9583
> NFS Server on localhost                                 2049    Y       9590
> Self-heal Daemon on localhost                           N/A     Y       9597
> NFS Server on 10.10.10.1                                2049    Y       9607
> Self-heal Daemon on 10.10.10.1                          N/A     Y       9614
>
> Task Status of Volume gluster_disk
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> [root@duchess ~]# gluster peer status
> Number of Peers: 1
>
> Hostname: 10.10.10.1
> Uuid: aca56ec5-94bb-4bb0-8a9e-b3d134bbfe7b
> State: Peer in Cluster (Connected)
>
>
> So before putting any real data on these guys (the data will 
> eventually be a handful of large image files backing an iSCSI target 
> via tgtd for ESXi datastores), I wanted to simulate the failure of one 
> of the nodes. So I stopped glusterfsd and glusterd on duchess, waited 
> about 5 minutes, then started them back up again, tail'ing 
> /var/log/glusterfs/* and /var/log/messages. I'm not sure exactly what 
> I'm looking for, but the logs quieted down after just a minute or so 
> of restarting the daemons. I didn't see much indicating that 
> self-healing was going on.
>
>
> Every now and then (and seemingly more often than not), when I run 
> "gluster volume heal gluster_disk info", I get no output from the
> command, and the following dumps into my /var/log/messages:
>
>
> Mar 15 13:59:16 duchess kernel: glfsheal[10365]: segfault at 
> 7ff56068d020 ip 00007ff54f366d80 sp 00007ff54e22adf8 error 6 in 
> libmthca-rdmav2.so[7ff54f365000+7000]

This is a segfault in the Mellanox driver. Please report it to the driver
developers.
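
When reporting a crash like this, the offset of the faulting instruction inside the library is usually more useful than the raw address. A minimal sketch of the arithmetic follows, assuming the kernel line has the format shown in the quoted log and that the bracketed value is the base of the mapping; the helper name fault_offset is hypothetical.

```python
import re

# Pattern for the kernel "segfault at ..." line as it appears in the quoted log.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(log_line):
    """Return (object name, instruction pointer minus mapping base)."""
    match = SEGFAULT_RE.search(log_line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    return match.group("obj"), ip - base


# The log line from the original post, joined onto one line.
LINE = (
    "glfsheal[10365]: segfault at 7ff56068d020 ip 00007ff54f366d80 "
    "sp 00007ff54e22adf8 error 6 in libmthca-rdmav2.so[7ff54f365000+7000]"
)

if __name__ == "__main__":
    obj, offset = fault_offset(LINE)
    print(obj, hex(offset))  # libmthca-rdmav2.so 0x1d80
```

If the mapping shown starts at the beginning of the library file, that offset can be handed to a tool such as addr2line together with the library and its debug symbols, which may give the developers a function name to start from.
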
> Mar 15 13:59:17 duchess abrtd: Directory 
> 'ccpp-2015-03-15-13:59:16-10359' creation detected
> Mar 15 13:59:17 duchess abrt[10368]: Saved core dump of pid 10359 
> (/usr/sbin/glfsheal) to /var/spool/abrt/ccpp-2015-03-15-13:59:16-10359 
> (225595392 bytes)
> Mar 15 13:59:25 duchess abrtd: Package 'glusterfs-server' isn't signed
> with proper key
> Mar 15 13:59:25 duchess abrtd: 'post-create' on 
> '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359' exited with 1
> Mar 15 13:59:25 duchess abrtd: Deleting problem directory 
> '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359'
>
> Other times, when I'm lucky, I get messages from the "heal info"
> command indicating that datastore1.img (the file that I intentionally
> changed while duchess was offline) is in need of healing:
>
>
> [root@duke ~]# gluster volume heal gluster_disk info
> Brick duke.jonheese.local:/bricks/brick1/
> /datastore1.img - Possibly undergoing heal
>
> Number of entries: 1
>
> Brick duchess.jonheese.local:/bricks/brick1/
> /datastore1.img - Possibly undergoing heal
>
> Number of entries: 1
>
>
> But watching df on the bricks and tailing glustershd.log doesn't seem 
> to indicate that anything is actually happening -- and df indicates 
> that brick on duke *is* different in file size from the brick on 
> duchess. It's been over an hour now, and I'm not confident that the
> selfheal functionality is even working at all... Nor do I know how to 
> do anything about it!

File sizes are not necessarily any indication. If the changes you made
were nulls, the change may be sparse. du --apparent-size is a slightly
better indicator. Comparing hashes would be even better.
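
As a rough sketch of that comparison, the following could be run on each server against its own copy of the file on the brick (for example /bricks/brick1/datastore1.img), comparing the printed values by hand. The function names are made up for illustration, and the allocated-size figure assumes Linux, where st_blocks counts 512-byte units.

```python
import hashlib
import os
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def size_report(path):
    """Return the apparent size and the space actually allocated on disk.

    A sparse file can have a large apparent size with few allocated blocks,
    which is why allocated space alone says little about the content.
    """
    st = os.stat(path)
    return {"apparent": st.st_size, "allocated": st.st_blocks * 512}


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    print(size_report(target))
    print(sha256_of(target))
```

Matching digests on both bricks would indicate the copies are identical regardless of how many blocks each one occupies.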

The extended attributes on the file itself, on the bricks, can tell you
the heal state. Look at "getfattr -m . -d -e hex $file". The trusted.afr
attributes, if non-zero, show pending changes destined for the other
server.
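
To read those values, here is a small sketch that decodes one hex string as printed by getfattr. It assumes the usual AFR changelog layout of three 32-bit big-endian counters (data, metadata, entry); the sample value and the function name decode_afr_changelog are hypothetical and not taken from this cluster.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into its pending-operation counters.

    Assumption: the value is 12 bytes holding three big-endian 32-bit
    counters in the order data, metadata, entry.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # Hypothetical value: two pending data operations, nothing else.
    print(decode_afr_changelog("0x000000020000000000000000"))
    # All zeros would mean nothing is pending for the other brick.
    print(decode_afr_changelog("0x000000000000000000000000"))
```

A non-zero "data" counter on one brick's copy would suggest that brick still holds writes the other has not received.
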
>
> Also, I find it a little bit troubling that I'm using the aliases (in
> /etc/hosts on both servers) duke-ib and duchess-ib for the gluster
> node configuration, but the "heal info" command refers to my nodes
> with their internal FQDNs, which resolve to their 1Gbps interface
> IPs... That doesn't mean that they're trying to communicate over those
> interfaces (the volume is configured with "transport rdma", as you can
> see above), does it?

I'd call that a bug. It should report the hostnames as they're listed in
the volume info.

>
> Can anyone throw out any ideas on how I can:
>
> 1. Determine whether this is intentional behavior (or a bug?),
>
> 2. Determine whether my data has been properly resync'd across the 
> bricks, and
>
> 3. Make it work correctly if not.
>
>
> Thanks in advance!
>
>
> Regards,
>
> Jon Heese