Hi Dj,

It seems that your brick processes are offline, or all brick processes have crashed. Could you paste the output of the "gluster volume status" and "gluster volume info" commands and attach the core file? Ccing a dht-team member.

Thanks,
~Gaurav

----- Original Message -----
From: "Dj Merrill" <gluster at deej.net>
To: gluster-users at gluster.org
Sent: Sunday, February 21, 2016 10:37:02 PM
Subject: [Gluster-users] glusterfs client crashes

Several weeks ago we started seeing some odd behaviour on our Gluster client systems. Things would work fine for several days, then the client could no longer access the Gluster filesystems, giving the error:

ls: cannot access /mnt/hpc: Transport endpoint is not connected

We were running version 3.7.6, and that version had been working fine for a few months before the above started happening. Thinking that an OS or kernel update might be the cause, we upgraded to 3.7.8 when it came out in the hope that the issue would be addressed, but we are still having the issue.

All client machines are running CentOS 7.2 with the latest updates, and the problem is happening on several machines. Not every Gluster client machine has had the problem, but enough different machines (both Intel and AMD CPUs, different system manufacturers, etc.) to suggest a generic issue rather than one affecting only specific types of machines.

The log included below from /var/log/glusterfs seems to show a crash of the glusterfs process, if I am interpreting it correctly. At the top you can see an entry made on the 17th, then no further entries until the crash today on the 21st. We would greatly appreciate any help in tracking down the cause and a possible fix. The only way to temporarily "fix" a machine seems to be a reboot, which lets it work properly for a few days before the issue happens again (a random number of days, no pattern).
[2016-02-17 23:56:39.685754] I [MSGID: 109036] [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 0-gv0-dht: Setting layout of /tmp/ktreraya/gms-scr/tmp/123277 with [Subvol_name: gv0-replicate-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
pending frames:
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(1) op(GETXATTR)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-02-21 08:10:40
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.8
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7ff56ddcd042]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7ff56dde950d]
/lib64/libc.so.6(+0x35670)[0x7ff56c4bb670]
/lib64/libc.so.6(gsignal+0x37)[0x7ff56c4bb5f7]
/lib64/libc.so.6(abort+0x148)[0x7ff56c4bcce8]
/lib64/libc.so.6(+0x75317)[0x7ff56c4fb317]
/lib64/libc.so.6(+0x7d023)[0x7ff56c503023]
/usr/lib64/glusterfs/3.7.8/xlator/protocol/client.so(client_local_wipe+0x39)[0x7ff5600a46b9]
/usr/lib64/glusterfs/3.7.8/xlator/protocol/client.so(client3_3_getxattr_cbk+0x182)[0x7ff5600a7f62]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7ff56db9ba20]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7ff56db9bcdf]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff56db97823]
/usr/lib64/glusterfs/3.7.8/rpc-transport/socket.so(+0x6636)[0x7ff5627a8636]
/usr/lib64/glusterfs/3.7.8/rpc-transport/socket.so(+0x9294)[0x7ff5627ab294]
/lib64/libglusterfs.so.0(+0x878ea)[0x7ff56de2e8ea]
/lib64/libpthread.so.0(+0x7dc5)[0x7ff56cc35dc5]
/lib64/libc.so.6(clone+0x6d)[0x7ff56c57c28d]

Thank you,

-Dj
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
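[For anyone hitting the same crash: a hedged sketch of how the information Gaurav asks for can be gathered. The core-file location is an assumption; it depends on kernel.core_pattern, and CentOS 7 may route cores through abrt. Symbols require the glusterfs-debuginfo package.]

```shell
# Capture the cluster state (run on one of the server nodes):
gluster volume status > /tmp/gluster-status.txt
gluster volume info   > /tmp/gluster-info.txt

# Locate the newest core file on the crashing client.
# /var/crash is an assumed location; check 'cat /proc/sys/kernel/core_pattern'.
core=$(ls -t /var/crash/core.* 2>/dev/null | head -n 1)

# Pull a full backtrace from the core (needs gdb and glusterfs-debuginfo):
if [ -n "$core" ]; then
    gdb -batch -ex 'thread apply all bt' /usr/sbin/glusterfs "$core" \
        > /tmp/glusterfs-bt.txt
fi
```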
On 2/21/2016 1:27 PM, Gaurav Garg wrote:
> It seems that your brick processes are offline, or all brick processes have crashed. Could you paste the output of the "gluster volume status" and "gluster volume info" commands and attach the core file?

Very interesting. They were reporting both bricks offline, but the processes on both servers were still running. Restarting glusterfsd on one of the servers brought them both back online. I am going to have to take a closer look at the logs on the servers.

Even after bringing the bricks back up, the client is still reporting "Transport endpoint is not connected". Is there anything other than a reboot that will change this state on the client?

# gluster volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glusterfs1:/export/brick1/sdb1        49152     0          Y       15073
Brick glusterfs2:/export/brick1/sdb1        49152     0          Y       14068
Self-heal Daemon on localhost               N/A       N/A        Y       14063
Self-heal Daemon on glusterfs1              N/A       N/A        Y       7732

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: gv1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glusterfs1:/export/brick2/sdb2        49154     0          Y       15089
Brick glusterfs2:/export/brick2/sdb2        49157     0          Y       14073
Self-heal Daemon on localhost               N/A       N/A        Y       14063
Self-heal Daemon on glusterfs1              N/A       N/A        Y       7732

Task Status of Volume gv1
------------------------------------------------------------------------------
There are no active volume tasks

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: 1d31ea3c-a240-49fe-a68d-4218ac051b6d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glusterfs1:/export/brick1/sdb1
Brick2: glusterfs2:/export/brick1/sdb1
Options Reconfigured:
performance.cache-max-file-size: 750MB
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.quota-timeout: 30
features.quota: off
performance.io-thread-count: 16
performance.write-behind-window-size: 1GB
performance.cache-size: 1GB
nfs.volume-access: read-only
nfs.disable: on
cluster.self-heal-daemon: enable

Volume Name: gv1
Type: Replicate
Volume ID: 7127b90b-e208-4aea-a920-4db195295d7a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: glusterfs1:/export/brick2/sdb2
Brick2: glusterfs2:/export/brick2/sdb2
Options Reconfigured:
performance.cache-size: 1GB
performance.write-behind-window-size: 1GB
nfs.disable: on
nfs.volume-access: read-only
performance.cache-max-file-size: 750MB
cluster.self-heal-daemon: enable

-Dj
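[On the "Transport endpoint is not connected" question: that error means the glusterfs FUSE client process for that mount has died, so the stale mount has to be torn down and re-created; a full reboot should not be required. A sketch, assuming the /mnt/hpc mount point from this thread and that gv0 is the volume mounted there (the server:/volume pair is an assumption — check /etc/fstab or 'mount' output for the real one):]

```shell
# Lazy-unmount the dead FUSE mount (plain umount may fail with EBUSY):
umount -l /mnt/hpc

# Remount; mount.glusterfs spawns a fresh glusterfs client process:
mount -t glusterfs glusterfs1:/gv0 /mnt/hpc

# Confirm the mount is attached again:
mount | grep /mnt/hpc
```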