thr3ads.net - Gluster users - [Gluster-users] server hangs with gluster-3.7.4 [Oct 2015]

Alexander Zubkov
2015-Oct-30 21:29 UTC
[Gluster-users] server hangs with gluster-3.7.4

Hello!
I have searched for similar problems but have not found any relevant or
helpful information.
I have a server with gluster-3.7.4 installed. It is currently configured
as 3 pairs of 2 replicas on 6 local hard drives. In the current setup it
writes log files from several network sources into a directory structure
like process/server/month/hour.log. Periodically another process
traverses the tree and compresses relatively old files. The problem is
that the server hangs at irregular intervals: it can work for 2 weeks
without a problem, or it can hang several days in a row. It looks like
it becomes IO-hung, because the kernel still responds to pings and
accepts connections on the ssh port, but I cannot log in or do anything
else. I do not know how to debug this properly. Can somebody help?
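
To make the workload concrete, here is a minimal sketch of what such a compression pass could look like. It is an illustration only, not the actual job running on this server: the function name, the age threshold and the mount point in the usage comment are assumptions, and -mmin requires GNU or BSD find.

```shell
#!/bin/sh
# Hypothetical sketch: gzip every *.log file under ROOT that has not been
# modified for more than MIN_AGE_MINUTES minutes.
# Usage: compress_old_logs ROOT MIN_AGE_MINUTES
compress_old_logs() {
    root=$1
    min_age=$2
    # -mmin +N matches files last modified more than N minutes ago;
    # "-exec ... {} +" batches many files into each gzip invocation.
    find "$root" -type f -name '*.log' -mmin "+$min_age" -exec gzip {} +
}

# Example (assumed mount point, files older than one day):
#   compress_old_logs /mnt/gv0 1440
```

Run against the mounted volume, a pass like this reads and rewrites every old file through the gluster client while the log writers keep appending to the newer files.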
Originally the bricks were located on an ext4 filesystem. I tried
switching to xfs, but it did not help. I have set up netconsole logging
from the kernel; the log file is attached. Here is some additional
information:

# gluster --version
glusterfs 3.7.4 built on Sep 19 2015 11:44:12
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

# gluster volume info gv0

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: b3167dd1-dbc1-48dd-8c8e-ca56a37f78a8
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: log0:/data/brick/b/gv0
Brick2: log0:/data/brick/a/gv0
Brick3: log0:/data/brick/d/gv0
Brick4: log0:/data/brick/c/gv0
Brick5: log0:/data/brick/e/gv0
Brick6: log0:/data/brick/f/gv0
Options Reconfigured:
performance.readdir-ahead: on
cluster.self-heal-daemon: enable

# gluster volume status gv0
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick log0:/data/brick/b/gv0                49163     0          Y       4385
Brick log0:/data/brick/a/gv0                49166     0          Y       4407
Brick log0:/data/brick/d/gv0                49164     0          Y       4397
Brick log0:/data/brick/c/gv0                49167     0          Y       4418
Brick log0:/data/brick/e/gv0                49165     0          Y       4379
Brick log0:/data/brick/f/gv0                49168     0          Y       4391
NFS Server on localhost                     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       4358

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

# uname -a
Linux log0 4.1.4-hardened #1 SMP Fri Aug 14 10:32:50 MSK 2015 x86_64 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz GenuineIntel GNU/Linux
(This is a gentoo-hardened kernel.)

# cat /proc/mounts | grep /data/brick
/dev/sda4 /data/brick/a xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/sdb4 /data/brick/b xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/sdc4 /data/brick/c xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/sdd4 /data/brick/d xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/sde4 /data/brick/e xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/sdf4 /data/brick/f xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/sdg4 /data/brick/g xfs rw,noatime,attr2,inode64,noquota 0 0

I can provide additional information if needed.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logserver.log
Type: text/x-log
Size: 59545 bytes
Desc: not available
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20151031/30c2c82d/attachment.bin>