Pranith Kumar Karampuri
2016-Jan-25 02:27 UTC
[Gluster-users] [Gluster-devel] heal hanging
It seems like there is a lot of finodelk/inodelk traffic. I wonder why
that is. I think the next step is to collect a statedump of the brick
which is taking a lot of CPU, using "gluster volume statedump <volname>".

Pranith

On 01/22/2016 08:36 AM, Glomski, Patrick wrote:
> Pranith, attached are stack traces collected every second for 20
> seconds from the high-%cpu glusterfsd process.
>
> Patrick
>
> On Thu, Jan 21, 2016 at 9:46 PM, Glomski, Patrick
> <patrick.glomski at corvidtec.com> wrote:
>
>> Last entry for get_real_filename on any of the bricks was when we
>> turned off the samba gfapi vfs plugin earlier today:
>>
>> /var/log/glusterfs/bricks/data-brick01a-homegfs.log:[2016-01-21
>> 15:13:00.008239] E [server-rpc-fops.c:768:server_getxattr_cbk]
>> 0-homegfs-server: 105: GETXATTR /wks_backup
>> (40e582d6-b0c7-4099-ba88-9168a3c32ca6)
>> (glusterfs.get_real_filename:desktop.ini) ==> (Permission denied)
>>
>> We'll get back to you with those traces when %cpu spikes again. As
>> with most sporadic problems, as soon as you want something out of
>> it, the issue becomes harder to reproduce.
>>
>> On Thu, Jan 21, 2016 at 9:21 PM, Pranith Kumar Karampuri
>> <pkarampu at redhat.com> wrote:
>>
>>> On 01/22/2016 07:25 AM, Glomski, Patrick wrote:
>>>> Unfortunately, all samba mounts to the gluster volume through
>>>> the gfapi vfs plugin have been disabled for the last 6 hours
>>>> or so and the frequency of %cpu spikes has increased. We had
>>>> switched to sharing a fuse mount through samba, but I just
>>>> disabled that as well. There are no samba shares of this
>>>> volume now. The spikes now happen every thirty minutes or so.
>>>> We've resorted to just rebooting the machine with high load
>>>> for the present.
>>>
>>> Could you see if the logs of the following type are not at all
>>> coming?
>>> [2016-01-21 15:13:00.005736] E
>>> [server-rpc-fops.c:768:server_getxattr_cbk] 0-homegfs-server:
>>> 110: GETXATTR /wks_backup (40e582d6-b0c7-4099-ba88-9168a3c32ca6)
>>> (glusterfs.get_real_filename:desktop.ini) ==> (Permission denied)
>>>
>>> These are operations that failed. Operations that succeed are
>>> the ones that will scan the directory. But I don't have a way
>>> to find them other than using tcpdumps.
>>>
>>> At the moment I have 2 theories:
>>> 1) these get_real_filename calls
>>> 2) [2016-01-21 16:10:38.017828] E
>>> [server-helpers.c:46:gid_resolve] 0-gid-cache: getpwuid_r(494)
>>> failed
>>>
>>> "
>>> Yessir they are. Normally, sssd would look to the local cache
>>> file in /var/lib/sss/db/ first, to get any group or userid
>>> information, then go out to the domain controller. I put the
>>> options that we are using on our GFS volumes below. Thanks for
>>> your help.
>>>
>>> We had been running sssd with sssd_nss and sssd_be sub-processes
>>> on these systems for a long time, under the GFS 3.5.2 code, and
>>> had not run into the problem that David described with the high
>>> cpu usage on sssd_nss.
>>> "
>>> That was Tom Young's email 1.5 years back when we debugged it.
>>> But the process which was consuming a lot of cpu then was
>>> sssd_nss, so I am not sure if it is the same issue. Let us debug
>>> to see that '1)' doesn't happen. The gstack traces I asked for
>>> should also help.
>>>
>>> Pranith
>>>
>>>> On Thu, Jan 21, 2016 at 8:49 PM, Pranith Kumar Karampuri
>>>> <pkarampu at redhat.com> wrote:
>>>>
>>>>> On 01/22/2016 07:13 AM, Glomski, Patrick wrote:
>>>>>> We use the samba glusterfs virtual filesystem (the current
>>>>>> version provided on download.gluster.org), but no windows
>>>>>> clients connecting directly.
>>>>>
>>>>> Hmm.. Is there a way to disable using this and check if the
>>>>> CPU% still increases? What getxattr of
>>>>> "glusterfs.get_real_filename <filename>" does is scan the
>>>>> entire directory looking for strcasecmp(<filename>,
>>>>> <scanned-filename>).
>>>>> If anything matches, then it will return the
>>>>> <scanned-filename>. But the problem is that the scan is
>>>>> costly. So I wonder if this is the reason for the CPU spikes.
>>>>>
>>>>> Pranith
>>>>>
>>>>>> On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri
>>>>>> <pkarampu at redhat.com> wrote:
>>>>>>
>>>>>>> Do you have any windows clients? I see a lot of getxattr
>>>>>>> calls for "glusterfs.get_real_filename" which lead to full
>>>>>>> readdirs of the directories on the brick.
>>>>>>>
>>>>>>> Pranith
>>>>>>>
>>>>>>> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
>>>>>>>> Pranith, could this kind of behavior be self-inflicted by
>>>>>>>> us deleting files directly from the bricks? We have done
>>>>>>>> that in the past to clean up issues where gluster wouldn't
>>>>>>>> allow us to delete from the mount.
>>>>>>>>
>>>>>>>> If so, is it feasible to clean them up by running a search
>>>>>>>> on the .glusterfs directories directly and removing files
>>>>>>>> with a reference count of 1 that are non-zero size (or
>>>>>>>> directly checking the xattrs to be sure that it's not a
>>>>>>>> DHT link)?
>>>>>>>>
>>>>>>>> find /data/brick01a/homegfs/.glusterfs -type f -not -empty
>>>>>>>> -links -2 -exec rm -f "{}" \;
>>>>>>>>
>>>>>>>> Is there anything I'm inherently missing with that
>>>>>>>> approach that will further corrupt the system?
>>>>>>>>
>>>>>>>> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick
>>>>>>>> <patrick.glomski at corvidtec.com> wrote:
>>>>>>>>
>>>>>>>>> Load spiked again: ~1200%cpu on gfs02a for glusterfsd.
>>>>>>>>> Crawl has been running on one of the bricks on gfs02b for
>>>>>>>>> 25 min or so and users cannot access the volume.
>>>>>>>>>
>>>>>>>>> I re-listed the xattrop directories as well as a 'top'
>>>>>>>>> entry and heal statistics. Then I restarted the gluster
>>>>>>>>> services on gfs02a.
>>>>>>>>>
>>>>>>>>> =================== top ===================
>>>>>>>>> PID  USER PR NI VIRT  RES  SHR  S %CPU   %MEM TIME+     COMMAND
>>>>>>>>> 8969 root 20 0  2815m 204m 3588 S 1181.0 0.6  591:06.93 glusterfsd
>>>>>>>>>
>>>>>>>>> =================== xattrop ===================
>>>>>>>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>>>>>>>>> xattrop-41f19453-91e4-437c-afa9-3b25614de210
>>>>>>>>> xattrop-9b815879-2f4d-402b-867c-a6d65087788c
>>>>>>>>>
>>>>>>>>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>>>>>>>>> xattrop-70131855-3cfb-49af-abce-9d23f57fb393
>>>>>>>>> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6
>>>>>>>>>
>>>>>>>>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>>>>>>>>> e6e47ed9-309b-42a7-8c44-28c29b9a20f8
>>>>>>>>> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125
>>>>>>>>> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
>>>>>>>>> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0
>>>>>>>>>
>>>>>>>>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>>>>>>>>> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
>>>>>>>>> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413
>>>>>>>>>
>>>>>>>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>>>>>>>>> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531
>>>>>>>>>
>>>>>>>>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>>>>>>>>> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70
>>>>>>>>>
>>>>>>>>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>>>>>>>>> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8
>>>>>>>>> c9ce22ed-6d8b-471b-a111-b39e57f0b512
>>>>>>>>> 94fa1d60-45ad-4341-b69c-315936b51e8d
>>>>>>>>> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7
>>>>>>>>>
>>>>>>>>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>>>>>>>>> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d
>>>>>>>>>
>>>>>>>>> =================== heal stats ===================
>>>>>>>>>
>>>>>>>>> homegfs [b0-gfsib01a] : Starting time of crawl : Thu Jan 21 12:36:45 2016
>>>>>>>>> homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:45 2016
>>>>>>>>> homegfs [b0-gfsib01a] : Type of crawl: INDEX
>>>>>>>>> homegfs [b0-gfsib01a] : No. of entries healed : 0
>>>>>>>>> homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b0-gfsib01a] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b1-gfsib01b] : Starting time of crawl : Thu Jan 21 12:36:19 2016
>>>>>>>>> homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:19 2016
>>>>>>>>> homegfs [b1-gfsib01b] : Type of crawl: INDEX
>>>>>>>>> homegfs [b1-gfsib01b] : No. of entries healed : 0
>>>>>>>>> homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b1-gfsib01b] : No. of heal failed entries : 1
>>>>>>>>>
>>>>>>>>> homegfs [b2-gfsib01a] : Starting time of crawl : Thu Jan 21 12:36:48 2016
>>>>>>>>> homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:48 2016
>>>>>>>>> homegfs [b2-gfsib01a] : Type of crawl: INDEX
>>>>>>>>> homegfs [b2-gfsib01a] : No. of entries healed : 0
>>>>>>>>> homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b2-gfsib01a] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b3-gfsib01b] : Starting time of crawl : Thu Jan 21 12:36:47 2016
>>>>>>>>> homegfs [b3-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:47 2016
>>>>>>>>> homegfs [b3-gfsib01b] : Type of crawl: INDEX
>>>>>>>>> homegfs [b3-gfsib01b] : No. of entries healed : 0
>>>>>>>>> homegfs [b3-gfsib01b] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b3-gfsib01b] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b4-gfsib02a] : Starting time of crawl : Thu Jan 21 12:36:06 2016
>>>>>>>>> homegfs [b4-gfsib02a] : Ending time of crawl : Thu Jan 21 12:36:06 2016
>>>>>>>>> homegfs [b4-gfsib02a] : Type of crawl: INDEX
>>>>>>>>> homegfs [b4-gfsib02a] : No. of entries healed : 0
>>>>>>>>> homegfs [b4-gfsib02a] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b4-gfsib02a] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b5-gfsib02b] : Starting time of crawl : Thu Jan 21 12:13:40 2016
>>>>>>>>> homegfs [b5-gfsib02b] : *** Crawl is in progress ***
>>>>>>>>> homegfs [b5-gfsib02b] : Type of crawl: INDEX
>>>>>>>>> homegfs [b5-gfsib02b] : No. of entries healed : 0
>>>>>>>>> homegfs [b5-gfsib02b] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b5-gfsib02b] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b6-gfsib02a] : Starting time of crawl : Thu Jan 21 12:36:58 2016
>>>>>>>>> homegfs [b6-gfsib02a] : Ending time of crawl : Thu Jan 21 12:36:58 2016
>>>>>>>>> homegfs [b6-gfsib02a] : Type of crawl: INDEX
>>>>>>>>> homegfs [b6-gfsib02a] : No. of entries healed : 0
>>>>>>>>> homegfs [b6-gfsib02a] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b6-gfsib02a] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b7-gfsib02b] : Starting time of crawl : Thu Jan 21 12:36:50 2016
>>>>>>>>> homegfs [b7-gfsib02b] : Ending time of crawl : Thu Jan 21 12:36:50 2016
>>>>>>>>> homegfs [b7-gfsib02b] : Type of crawl: INDEX
>>>>>>>>> homegfs [b7-gfsib02b] : No. of entries healed : 0
>>>>>>>>> homegfs [b7-gfsib02b] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b7-gfsib02b] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> =======================================================================================
>>>>>>>>> I waited a few minutes for the heals to finish and ran the
>>>>>>>>> heal statistics and info again. One file is in split-brain.
>>>>>>>>> Aside from the split-brain, the load on all systems is down
>>>>>>>>> now and they are behaving normally. glustershd.log is
>>>>>>>>> attached. What is going on???
>>>>>>>>>
>>>>>>>>> Thu Jan 21 12:53:50 EST 2016
>>>>>>>>>
>>>>>>>>> =================== homegfs ===================
>>>>>>>>>
>>>>>>>>> homegfs [b0-gfsib01a] : Starting time of crawl : Thu Jan 21 12:53:02 2016
>>>>>>>>> homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21 12:53:02 2016
>>>>>>>>> homegfs [b0-gfsib01a] : Type of crawl: INDEX
>>>>>>>>> homegfs [b0-gfsib01a] : No. of entries healed : 0
>>>>>>>>> homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b0-gfsib01a] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b1-gfsib01b] : Starting time of crawl : Thu Jan 21 12:53:38 2016
>>>>>>>>> homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21 12:53:38 2016
>>>>>>>>> homegfs [b1-gfsib01b] : Type of crawl: INDEX
>>>>>>>>> homegfs [b1-gfsib01b] : No. of entries healed : 0
>>>>>>>>> homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b1-gfsib01b] : No. of heal failed entries : 1
>>>>>>>>>
>>>>>>>>> homegfs [b2-gfsib01a] : Starting time of crawl : Thu Jan 21 12:53:04 2016
>>>>>>>>> homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21 12:53:04 2016
>>>>>>>>> homegfs [b2-gfsib01a] : Type of crawl: INDEX
>>>>>>>>> homegfs [b2-gfsib01a] : No. of entries healed : 0
>>>>>>>>> homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b2-gfsib01a] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b3-gfsib01b] : Starting time of crawl : Thu Jan 21 12:53:04 2016
>>>>>>>>> homegfs [b3-gfsib01b] : Ending time of crawl : Thu Jan 21 12:53:04 2016
>>>>>>>>> homegfs [b3-gfsib01b] : Type of crawl: INDEX
>>>>>>>>> homegfs [b3-gfsib01b] : No. of entries healed : 0
>>>>>>>>> homegfs [b3-gfsib01b] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b3-gfsib01b] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b4-gfsib02a] : Starting time of crawl : Thu Jan 21 12:53:33 2016
>>>>>>>>> homegfs [b4-gfsib02a] : Ending time of crawl : Thu Jan 21 12:53:33 2016
>>>>>>>>> homegfs [b4-gfsib02a] : Type of crawl: INDEX
>>>>>>>>> homegfs [b4-gfsib02a] : No. of entries healed : 0
>>>>>>>>> homegfs [b4-gfsib02a] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b4-gfsib02a] : No. of heal failed entries : 1
>>>>>>>>>
>>>>>>>>> homegfs [b5-gfsib02b] : Starting time of crawl : Thu Jan 21 12:53:14 2016
>>>>>>>>> homegfs [b5-gfsib02b] : Ending time of crawl : Thu Jan 21 12:53:15 2016
>>>>>>>>> homegfs [b5-gfsib02b] : Type of crawl: INDEX
>>>>>>>>> homegfs [b5-gfsib02b] : No. of entries healed : 0
>>>>>>>>> homegfs [b5-gfsib02b] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b5-gfsib02b] : No. of heal failed entries : 3
>>>>>>>>>
>>>>>>>>> homegfs [b6-gfsib02a] : Starting time of crawl : Thu Jan 21 12:53:04 2016
>>>>>>>>> homegfs [b6-gfsib02a] : Ending time of crawl : Thu Jan 21 12:53:04 2016
>>>>>>>>> homegfs [b6-gfsib02a] : Type of crawl: INDEX
>>>>>>>>> homegfs [b6-gfsib02a] : No. of entries healed : 0
>>>>>>>>> homegfs [b6-gfsib02a] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b6-gfsib02a] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> homegfs [b7-gfsib02b] : Starting time of crawl : Thu Jan 21 12:53:09 2016
>>>>>>>>> homegfs [b7-gfsib02b] : Ending time of crawl : Thu Jan 21 12:53:09 2016
>>>>>>>>> homegfs [b7-gfsib02b] : Type of crawl: INDEX
>>>>>>>>> homegfs [b7-gfsib02b] : No. of entries healed : 0
>>>>>>>>> homegfs [b7-gfsib02b] : No. of entries in split-brain: 0
>>>>>>>>> homegfs [b7-gfsib02b] : No. of heal failed entries : 0
>>>>>>>>>
>>>>>>>>> *** gluster bug in 'gluster volume heal homegfs statistics' ***
>>>>>>>>> *** Use 'gluster volume heal homegfs info' until bug is fixed ***
>>>>>>>>>
>>>>>>>>> Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/
>>>>>>>>> Number of entries: 0
>>>>>>>>>
>>>>>>>>> Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/
>>>>>>>>> Number of entries: 0
>>>>>>>>>
>>>>>>>>> Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/
>>>>>>>>> Number of entries: 0
>>>>>>>>>
>>>>>>>>> Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/
>>>>>>>>> Number of entries: 0
>>>>>>>>>
>>>>>>>>> Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/
>>>>>>>>> /users/bangell/.gconfd - Is in split-brain
>>>>>>>>> Number of entries: 1
>>>>>>>>>
>>>>>>>>> Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/
>>>>>>>>> /users/bangell/.gconfd - Is in split-brain
>>>>>>>>> /users/bangell/.gconfd/saved_state
>>>>>>>>> Number of entries: 2
>>>>>>>>>
>>>>>>>>> Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/
>>>>>>>>> Number of entries: 0
>>>>>>>>>
>>>>>>>>> Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/
>>>>>>>>> Number of entries: 0
>>>>>>>>>
>>>>>>>>> On Thu, Jan 21, 2016 at 11:10 AM, Pranith Kumar Karampuri
>>>>>>>>> <pkarampu at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 01/21/2016 09:26 PM, Glomski, Patrick wrote:
>>>>>>>>>>> I should mention that the problem is not currently
>>>>>>>>>>> occurring and there are no heals (output appended). By
>>>>>>>>>>> restarting the gluster services, we can stop the crawl,
>>>>>>>>>>> which lowers the load for a while. Subsequent crawls
>>>>>>>>>>> seem to finish properly. For what it's worth,
>>>>>>>>>>> files/folders that show up in the 'volume info' output
>>>>>>>>>>> during a hung crawl don't seem to be anything out of
>>>>>>>>>>> the ordinary.
>>>>>>>>>>>
>>>>>>>>>>> Over the past four days, the typical time before the
>>>>>>>>>>> problem recurs after suppressing it in this manner is
>>>>>>>>>>> an hour. Last night when we reached out to you was the
>>>>>>>>>>> last time it happened and the load has been low since
>>>>>>>>>>> (a relief). David believes that recursively listing the
>>>>>>>>>>> files (ls -alR or similar) from a client mount can
>>>>>>>>>>> force the issue to happen, but obviously I'd rather not
>>>>>>>>>>> unless we have some precise thing we're looking for.
>>>>>>>>>>> Let me know if you'd like me to attempt to drive the
>>>>>>>>>>> system unstable like that and what I should look for.
>>>>>>>>>>> As it's a production system, I'd rather not leave it in
>>>>>>>>>>> this state for long.
>>>>>>>>>>
>>>>>>>>>> Will it be possible to send glustershd and mount logs of
>>>>>>>>>> the past 4 days?
>>>>>>>>>> I would like to see if this is because of directory
>>>>>>>>>> self-heal going wild (Ravi is working on a throttling
>>>>>>>>>> feature for 3.8, which will allow us to put brakes on
>>>>>>>>>> self-heal traffic).
>>>>>>>>>>
>>>>>>>>>> Pranith
>>>>>>>>>>
>>>>>>>>>>> [root at gfs01a xattrop]# gluster volume heal homegfs info
>>>>>>>>>>> Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/
>>>>>>>>>>> Number of entries: 0
>>>>>>>>>>>
>>>>>>>>>>> Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/
>>>>>>>>>>> Number of entries: 0
>>>>>>>>>>>
>>>>>>>>>>> Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/
>>>>>>>>>>> Number of entries: 0
>>>>>>>>>>>
>>>>>>>>>>> Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/
>>>>>>>>>>> Number of entries: 0
>>>>>>>>>>>
>>>>>>>>>>> Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/
>>>>>>>>>>> Number of entries: 0
>>>>>>>>>>>
>>>>>>>>>>> Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/
>>>>>>>>>>> Number of entries: 0
>>>>>>>>>>>
>>>>>>>>>>> Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/
>>>>>>>>>>> Number of entries: 0
>>>>>>>>>>>
>>>>>>>>>>> Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/
>>>>>>>>>>> Number of entries: 0
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 21, 2016 at 10:40 AM, Pranith Kumar
>>>>>>>>>>> Karampuri <pkarampu at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 01/21/2016 08:25 PM, Glomski, Patrick wrote:
>>>>>>>>>>>>> Hello, Pranith. The typical behavior is that the %cpu
>>>>>>>>>>>>> on a glusterfsd process jumps to the number of
>>>>>>>>>>>>> processor cores available (800% or 1200%, depending
>>>>>>>>>>>>> on the pair of nodes involved) and the load average
>>>>>>>>>>>>> on the machine goes very high (~20). The volume's
>>>>>>>>>>>>> heal statistics output shows that it is crawling one
>>>>>>>>>>>>> of the bricks and trying to heal, but this crawl
>>>>>>>>>>>>> hangs and never seems to finish.
>>>>>>>>>>>>> The number of files in the xattrop directory varies
>>>>>>>>>>>>> over time, so I ran a wc -l as you requested
>>>>>>>>>>>>> periodically for some time and then started including
>>>>>>>>>>>>> a datestamped list of the files that were in the
>>>>>>>>>>>>> xattrop directory on each brick to see which were
>>>>>>>>>>>>> persistent. All bricks had files in the xattrop
>>>>>>>>>>>>> folder, so all results are attached.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, this info is helpful. I don't see a lot of
>>>>>>>>>>>> files. Could you give the output of "gluster volume
>>>>>>>>>>>> heal <volname> info"? Is there any directory in there
>>>>>>>>>>>> which is LARGE?
>>>>>>>>>>>>
>>>>>>>>>>>> Pranith
>>>>>>>>>>>>
>>>>>>>>>>>>> Please let me know if there is anything else I can
>>>>>>>>>>>>> provide.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Patrick
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar
>>>>>>>>>>>>> Karampuri <pkarampu at redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> hey,
>>>>>>>>>>>>>> Which process is consuming so much cpu? I went
>>>>>>>>>>>>>> through the logs you gave me. I see that the
>>>>>>>>>>>>>> following files are in gfid mismatch state:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,
>>>>>>>>>>>>>> <1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,
>>>>>>>>>>>>>> <ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you give me the output of "ls
>>>>>>>>>>>>>> <brick-path>/indices/xattrop | wc -l" on all the
>>>>>>>>>>>>>> bricks which are acting this way? This will tell us
>>>>>>>>>>>>>> the number of pending self-heals on the system.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 01/20/2016 09:26 PM, David Robinson wrote:
>>>>>>>>>>>>>>> resending with parsed logs...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am having issues with 3.6.6 where the load will
>>>>>>>>>>>>>>>> spike up to 800% for one of the glusterfsd
>>>>>>>>>>>>>>>> processes and the users can no longer access the
>>>>>>>>>>>>>>>> system.
>>>>>>>>>>>>>>>> If I reboot the node, the heal will finish
>>>>>>>>>>>>>>>> normally after a few minutes and the system will
>>>>>>>>>>>>>>>> be responsive, but a few hours later the issue
>>>>>>>>>>>>>>>> will start again. It looks like it is hanging in a
>>>>>>>>>>>>>>>> heal and spinning up the load on one of the
>>>>>>>>>>>>>>>> bricks. The heal gets stuck and says it is
>>>>>>>>>>>>>>>> crawling and never returns. After a few minutes of
>>>>>>>>>>>>>>>> the heal saying it is crawling, the load spikes up
>>>>>>>>>>>>>>>> and the mounts become unresponsive. Any
>>>>>>>>>>>>>>>> suggestions on how to fix this? It has us stopped
>>>>>>>>>>>>>>>> cold, as the users can no longer access the
>>>>>>>>>>>>>>>> systems when the load spikes... Logs attached.
>>>>>>>>>>>>>>>> System setup info is:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [root at gfs01a ~]# gluster volume info homegfs
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Volume Name: homegfs
>>>>>>>>>>>>>>>> Type: Distributed-Replicate
>>>>>>>>>>>>>>>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>>>>>>>>>>>>>>>> Status: Started
>>>>>>>>>>>>>>>> Number of Bricks: 4 x 2 = 8
>>>>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>>>>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>>>>>>>>>>>>>>>> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>>>>>>>>>>>>>>>> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>>>>>>>>>>>>>>>> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>>>>>>>>>>>>>>>> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
>>>>>>>>>>>>>>>> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
>>>>>>>>>>>>>>>> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
>>>>>>>>>>>>>>>> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
>>>>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>>>>> performance.io-thread-count: 32
>>>>>>>>>>>>>>>> performance.cache-size: 128MB
>>>>>>>>>>>>>>>> performance.write-behind-window-size: 128MB
>>>>>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>>>>>> network.ping-timeout: 42
>>>>>>>>>>>>>>>> storage.owner-gid: 100
>>>>>>>>>>>>>>>> geo-replication.indexing: off
>>>>>>>>>>>>>>>> geo-replication.ignore-pid-check: on
>>>>>>>>>>>>>>>> changelog.changelog: off
>>>>>>>>>>>>>>>> changelog.fsync-interval: 3
>>>>>>>>>>>>>>>> changelog.rollover-time: 15
>>>>>>>>>>>>>>>> server.manage-gids: on
>>>>>>>>>>>>>>>> diagnostics.client-log-level: WARNING
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [root at gfs01a ~]# rpm -qa | grep gluster
>>>>>>>>>>>>>>>> gluster-nagios-common-0.1.1-0.el6.noarch
>>>>>>>>>>>>>>>> glusterfs-fuse-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-debuginfo-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-libs-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-geo-replication-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-api-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-devel-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-api-devel-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-cli-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-rdma-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> samba-vfs-glusterfs-4.1.11-2.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-server-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>> glusterfs-extra-xlators-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Gluster-devel mailing list
>>>>>>>>>>>>>>> Gluster-devel at gluster.org
>>>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160125/80f29e03/attachment.html>
It is doing it again... statedump from gfs02a is attached...

------ Original Message ------
From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
To: "Glomski, Patrick" <patrick.glomski at corvidtec.com>
Cc: "David Robinson" <drobinson at corvidtec.com>; "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster Devel" <gluster-devel at gluster.org>
Sent: 1/24/2016 9:27:02 PM
Subject: Re: [Gluster-users] [Gluster-devel] heal hanging
of heal failed entries : 0 >>>>>>>>>> >>>>>>>>>>homegfs [b3-gfsib01b] : Starting time of crawl : Thu Jan >>>>>>>>>>21 12:53:04 2016 >>>>>>>>>>homegfs [b3-gfsib01b] : Ending time of crawl : Thu Jan >>>>>>>>>>21 12:53:04 2016 >>>>>>>>>>homegfs [b3-gfsib01b] : Type of crawl: INDEX >>>>>>>>>>homegfs [b3-gfsib01b] : No. of entries healed : 0 >>>>>>>>>>homegfs [b3-gfsib01b] : No. of entries in split-brain: 0 >>>>>>>>>>homegfs [b3-gfsib01b] : No. of heal failed entries : 0 >>>>>>>>>> >>>>>>>>>>homegfs [b4-gfsib02a] : Starting time of crawl : Thu Jan >>>>>>>>>>21 12:53:33 2016 >>>>>>>>>>homegfs [b4-gfsib02a] : Ending time of crawl : Thu Jan >>>>>>>>>>21 12:53:33 2016 >>>>>>>>>>homegfs [b4-gfsib02a] : Type of crawl: INDEX >>>>>>>>>>homegfs [b4-gfsib02a] : No. of entries healed : 0 >>>>>>>>>>homegfs [b4-gfsib02a] : No. of entries in split-brain: 0 >>>>>>>>>>homegfs [b4-gfsib02a] : No. of heal failed entries : 1 >>>>>>>>>> >>>>>>>>>>homegfs [b5-gfsib02b] : Starting time of crawl : Thu Jan >>>>>>>>>>21 12:53:14 2016 >>>>>>>>>>homegfs [b5-gfsib02b] : Ending time of crawl : Thu Jan >>>>>>>>>>21 12:53:15 2016 >>>>>>>>>>homegfs [b5-gfsib02b] : Type of crawl: INDEX >>>>>>>>>>homegfs [b5-gfsib02b] : No. of entries healed : 0 >>>>>>>>>>homegfs [b5-gfsib02b] : No. of entries in split-brain: 0 >>>>>>>>>>homegfs [b5-gfsib02b] : No. of heal failed entries : 3 >>>>>>>>>> >>>>>>>>>>homegfs [b6-gfsib02a] : Starting time of crawl : Thu Jan >>>>>>>>>>21 12:53:04 2016 >>>>>>>>>>homegfs [b6-gfsib02a] : Ending time of crawl : Thu Jan >>>>>>>>>>21 12:53:04 2016 >>>>>>>>>>homegfs [b6-gfsib02a] : Type of crawl: INDEX >>>>>>>>>>homegfs [b6-gfsib02a] : No. of entries healed : 0 >>>>>>>>>>homegfs [b6-gfsib02a] : No. of entries in split-brain: 0 >>>>>>>>>>homegfs [b6-gfsib02a] : No. 
of heal failed entries : 0 >>>>>>>>>> >>>>>>>>>>homegfs [b7-gfsib02b] : Starting time of crawl : Thu Jan >>>>>>>>>>21 12:53:09 2016 >>>>>>>>>>homegfs [b7-gfsib02b] : Ending time of crawl : Thu Jan >>>>>>>>>>21 12:53:09 2016 >>>>>>>>>>homegfs [b7-gfsib02b] : Type of crawl: INDEX >>>>>>>>>>homegfs [b7-gfsib02b] : No. of entries healed : 0 >>>>>>>>>>homegfs [b7-gfsib02b] : No. of entries in split-brain: 0 >>>>>>>>>>homegfs [b7-gfsib02b] : No. of heal failed entries : 0 >>>>>>>>>> >>>>>>>>>>*** gluster bug in 'gluster volume heal homegfs statistics' >>>>>>>>>>*** >>>>>>>>>>*** Use 'gluster volume heal homegfs info' until bug is fixed >>>>>>>>>>*** >>>>>>>>>> >>>>>>>>>>Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/ >>>>>>>>>>Number of entries: 0 >>>>>>>>>> >>>>>>>>>>Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/ >>>>>>>>>>Number of entries: 0 >>>>>>>>>> >>>>>>>>>>Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/ >>>>>>>>>>Number of entries: 0 >>>>>>>>>> >>>>>>>>>>Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/ >>>>>>>>>>Number of entries: 0 >>>>>>>>>> >>>>>>>>>>Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/ >>>>>>>>>>/users/bangell/.gconfd - Is in split-brain >>>>>>>>>> >>>>>>>>>>Number of entries: 1 >>>>>>>>>> >>>>>>>>>>Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/ >>>>>>>>>>/users/bangell/.gconfd - Is in split-brain >>>>>>>>>> >>>>>>>>>>/users/bangell/.gconfd/saved_state >>>>>>>>>>Number of entries: 2 >>>>>>>>>> >>>>>>>>>>Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/ >>>>>>>>>>Number of entries: 0 >>>>>>>>>> >>>>>>>>>>Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/ >>>>>>>>>>Number of entries: 0 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>On Thu, Jan 21, 2016 at 11:10 AM, Pranith Kumar Karampuri >>>>>>>>>><pkarampu at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>On 01/21/2016 09:26 PM, Glomski, Patrick wrote: >>>>>>>>>>>>I should mention that the problem is not currently occurring >>>>>>>>>>>>and there are no heals 
>>>>>>>>>>>>(output appended). By restarting the gluster services, we can stop the crawl, which lowers the load for a while. Subsequent crawls seem to finish properly. For what it's worth, files/folders that show up in the 'volume info' output during a hung crawl don't seem to be anything out of the ordinary.
>>>>>>>>>>>>
>>>>>>>>>>>>Over the past four days, the typical time before the problem recurs after suppressing it in this manner is an hour. Last night when we reached out to you was the last time it happened and the load has been low since (a relief). David believes that recursively listing the files (ls -alR or similar) from a client mount can force the issue to happen, but obviously I'd rather not unless we have some precise thing we're looking for. Let me know if you'd like me to attempt to drive the system unstable like that and what I should look for. As it's a production system, I'd rather not leave it in this state for long.
>>>>>>>>>>>
>>>>>>>>>>>Will it be possible to send glustershd and mount logs of the past 4 days? I would like to see if this is because of directory self-heal going wild (Ravi is working on a throttling feature for 3.8, which will allow us to put brakes on self-heal traffic).
>>>>>>>>>>>
>>>>>>>>>>>Pranith
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>[root at gfs01a xattrop]# gluster volume heal homegfs info
>>>>>>>>>>>>Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/
>>>>>>>>>>>>Number of entries: 0
>>>>>>>>>>>>
>>>>>>>>>>>>Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/
>>>>>>>>>>>>Number of entries: 0
>>>>>>>>>>>>
>>>>>>>>>>>>Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/
>>>>>>>>>>>>Number of entries: 0
>>>>>>>>>>>>
>>>>>>>>>>>>Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/
>>>>>>>>>>>>Number of entries: 0
>>>>>>>>>>>>
>>>>>>>>>>>>Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/
>>>>>>>>>>>>Number of entries: 0
>>>>>>>>>>>>
>>>>>>>>>>>>Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/
>>>>>>>>>>>>Number of entries: 0
>>>>>>>>>>>>
>>>>>>>>>>>>Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/
>>>>>>>>>>>>Number of entries: 0
>>>>>>>>>>>>
>>>>>>>>>>>>Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/
>>>>>>>>>>>>Number of entries: 0
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>On Thu, Jan 21, 2016 at 10:40 AM, Pranith Kumar Karampuri
>>>>>>>>>>>><pkarampu at redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>On 01/21/2016 08:25 PM, Glomski, Patrick wrote:
>>>>>>>>>>>>>>Hello, Pranith. The typical behavior is that the %cpu on a glusterfsd process jumps to the number of processor cores available (800% or 1200%, depending on the pair of nodes involved) and the load average on the machine goes very high (~20). The volume's heal statistics output shows that it is crawling one of the bricks and trying to heal, but this crawl hangs and never seems to finish.
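[Editor's note] The per-second stack traces discussed in this thread can be captured with a small loop along these lines (a sketch, not from the thread itself; it assumes gdb's `gstack` wrapper is installed, and `sample_stacks`, `STACK_CMD`, and `INTERVAL` are names invented here):

```shell
#!/bin/bash
# Sketch: sample the stack of a spinning glusterfsd once per second for
# 20 seconds. STACK_CMD defaults to gstack (from gdb) but can be
# overridden; sample_stacks is a helper name invented for this sketch.
sample_stacks() {
    pid=$1
    count=${2:-20}
    out="glusterfsd-$pid.stacks"
    : > "$out"                              # truncate any previous capture
    i=0
    while [ "$i" -lt "$count" ]; do
        date +%T >> "$out"                  # timestamp each sample
        ${STACK_CMD:-gstack} "$pid" >> "$out" 2>&1
        sleep "${INTERVAL:-1}"
        i=$((i + 1))
    done
}

# Usage (on a brick server): sample_stacks "$(pidof -s glusterfsd)" 20
```

Comparing consecutive samples shows whether the process is stuck in one call chain (a hang) or churning through lock/heal paths (a spin).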
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>The number of files in the xattrop directory varies over time, so I ran a wc -l as you requested periodically for some time and then started including a datestamped list of the files that were in the xattrop directory on each brick to see which were persistent. All bricks had files in the xattrop folder, so all results are attached.
>>>>>>>>>>>>>Thanks, this info is helpful. I don't see a lot of files. Could you give the output of "gluster volume heal <volname> info"? Is there any directory in there which is LARGE?
>>>>>>>>>>>>>
>>>>>>>>>>>>>Pranith
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Please let me know if there is anything else I can provide.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Patrick
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar Karampuri
>>>>>>>>>>>>>><pkarampu at redhat.com> wrote:
>>>>>>>>>>>>>>>hey,
>>>>>>>>>>>>>>>      Which process is consuming so much cpu? I went through the logs you gave me. I see that the following files are in gfid mismatch state:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>><066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,
>>>>>>>>>>>>>>><1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,
>>>>>>>>>>>>>>><ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg>,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Could you give me the output of "ls <brick-path>/indices/xattrop | wc -l" on all the bricks which are acting this way? This will tell us the number of pending self-heals on the system.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Pranith
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>On 01/20/2016 09:26 PM, David Robinson wrote:
>>>>>>>>>>>>>>>>resending with parsed logs...
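[Editor's note] The pending-heal check Pranith asks for above can be scripted across all bricks on a server. A minimal sketch, assuming the brick layout shown in this thread (the index directory lives under `.glusterfs/indices/xattrop`, as the listings earlier show; `count_xattrop` is a helper name invented here):

```shell
#!/bin/bash
# Sketch: report the number of self-heal index entries per brick, i.e.
# "ls <brick-path>/indices/xattrop | wc -l" from the thread, run for
# every brick. count_xattrop is a helper name invented for this sketch.
count_xattrop() {
    # Prints the entry count for one brick root; 0 if the dir is missing.
    ls "$1/.glusterfs/indices/xattrop" 2>/dev/null | wc -l | tr -d ' '
}

for brick in /data/brick01a/homegfs /data/brick02a/homegfs \
             /data/brick01b/homegfs /data/brick02b/homegfs; do
    printf '%s: %s\n' "$brick" "$(count_xattrop "$brick")"
done
```

Run from cron with a datestamp, this produces exactly the kind of periodic record described above for spotting persistent entries.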
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>I am having issues with 3.6.6 where the load will spike up to 800% for one of the glusterfsd processes and the users can no longer access the system. If I reboot the node, the heal will finish normally after a few minutes and the system will be responsive, but a few hours later the issue will start again. It looks like it is hanging in a heal and spinning up the load on one of the bricks. The heal gets stuck and says it is crawling and never returns. After a few minutes of the heal saying it is crawling, the load spikes up and the mounts become unresponsive.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>Any suggestions on how to fix this? It has us stopped cold, as the users can no longer access the systems when the load spikes... Logs attached.
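[Editor's note] For files reported as being in split-brain earlier in the thread (e.g. /users/bangell/.gconfd), the per-brick AFR changelog xattrs show which replica holds pending operations; they can be read with `getfattr -d -m trusted.afr -e hex <brick-path>/<file>` on each brick. A hedged sketch for decoding the 24-hex-digit value into its three big-endian 32-bit counters (the usual data/metadata/entry AFR layout; `decode_afr` is a name invented here):

```shell
#!/bin/bash
# Sketch: decode a trusted.afr.* changelog value, as printed by
#   getfattr -d -m trusted.afr -e hex <brick-path>/<file>
# into its pending data/metadata/entry counters. Assumes the standard
# AFR layout of three big-endian 32-bit counters; decode_afr is a
# helper name invented for this sketch.
decode_afr() {
    hex=${1#0x}                              # strip the 0x prefix
    printf 'data=%d metadata=%d entry=%d\n' \
        "0x${hex:0:8}" "0x${hex:8:8}" "0x${hex:16:8}"
}

# Example: decode_afr 0x000000020000000100000000
```

Roughly speaking, when both replicas carry non-zero counters blaming each other for the same file, self-heal reports it as split-brain, as in the heal info output above.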
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>System setup info is:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>[root at gfs01a ~]# gluster volume info homegfs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>Volume Name: homegfs
>>>>>>>>>>>>>>>>>>Type: Distributed-Replicate
>>>>>>>>>>>>>>>>>>Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>>>>>>>>>>>>>>>>>>Status: Started
>>>>>>>>>>>>>>>>>>Number of Bricks: 4 x 2 = 8
>>>>>>>>>>>>>>>>>>Transport-type: tcp
>>>>>>>>>>>>>>>>>>Bricks:
>>>>>>>>>>>>>>>>>>Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>>>>>>>>>>>>>>>>>>Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>>>>>>>>>>>>>>>>>>Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>>>>>>>>>>>>>>>>>>Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>>>>>>>>>>>>>>>>>>Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
>>>>>>>>>>>>>>>>>>Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
>>>>>>>>>>>>>>>>>>Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
>>>>>>>>>>>>>>>>>>Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
>>>>>>>>>>>>>>>>>>Options Reconfigured:
>>>>>>>>>>>>>>>>>>performance.io-thread-count: 32
>>>>>>>>>>>>>>>>>>performance.cache-size: 128MB
>>>>>>>>>>>>>>>>>>performance.write-behind-window-size: 128MB
>>>>>>>>>>>>>>>>>>server.allow-insecure: on
>>>>>>>>>>>>>>>>>>network.ping-timeout: 42
>>>>>>>>>>>>>>>>>>storage.owner-gid: 100
>>>>>>>>>>>>>>>>>>geo-replication.indexing: off
>>>>>>>>>>>>>>>>>>geo-replication.ignore-pid-check: on
>>>>>>>>>>>>>>>>>>changelog.changelog: off
>>>>>>>>>>>>>>>>>>changelog.fsync-interval: 3
>>>>>>>>>>>>>>>>>>changelog.rollover-time: 15
>>>>>>>>>>>>>>>>>>server.manage-gids: on
>>>>>>>>>>>>>>>>>>diagnostics.client-log-level: WARNING
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>[root at gfs01a ~]# rpm -qa | grep gluster
>>>>>>>>>>>>>>>>>>gluster-nagios-common-0.1.1-0.el6.noarch
>>>>>>>>>>>>>>>>>>glusterfs-fuse-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-debuginfo-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-libs-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-geo-replication-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-api-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-devel-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-api-devel-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-cli-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-rdma-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>samba-vfs-glusterfs-4.1.11-2.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-server-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>>>glusterfs-extra-xlators-3.6.6-1.el6.x86_64
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>>>>>>Gluster-devel mailing list
>>>>>>>>>>>>>>>>Gluster-devel at gluster.org
>>>>>>>>>>>>>>>>http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>>>>>Gluster-users mailing list
>>>>>>>>>>>>>>>Gluster-users at gluster.org
>>>>>>>>>>>>>>>http://www.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160125/4018f0a2/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data-brick02a-homegfs.4066.dump.1453742225.gz
Type: application/x-gzip
Size: 1138050 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160125/4018f0a2/attachment-0002.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data-brick01a-homegfs.4061.dump.1453742224.gz
Type: application/x-gzip
Size: 640151 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160125/4018f0a2/attachment-0003.gz>