Hi,

I have experienced what looks like a very similar crash. Gluster 3.7.6 on
CentOS 7. No errors on the bricks or on other clients mounted at the time.
Relatively high load at the time.

Remounting the filesystem brought it back online.

pending frames:
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-02-22 10:28:45
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.6
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f83387f7012]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f83388134dd]
/lib64/libc.so.6(+0x35670)[0x7f8336ee5670]
/lib64/libc.so.6(gsignal+0x37)[0x7f8336ee55f7]
/lib64/libc.so.6(abort+0x148)[0x7f8336ee6ce8]
/lib64/libc.so.6(+0x75317)[0x7f8336f25317]
/lib64/libc.so.6(+0x7cfe1)[0x7f8336f2cfe1]
/lib64/libglusterfs.so.0(loc_wipe+0x27)[0x7f83387f4d47]
/usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_local_wipe+0x11)[0x7f8329c8e5f1]
/usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_stat_cbk+0x10c)[0x7f8329c8f4fc]
/lib64/libglusterfs.so.0(default_stat_cbk+0xac)[0x7f83387fcc5c]
/usr/lib64/glusterfs/3.7.6/xlator/cluster/distribute.so(dht_file_attr_cbk+0x149)[0x7f832ab2a409]
/usr/lib64/glusterfs/3.7.6/xlator/protocol/client.so(client3_3_stat_cbk+0x3c6)[0x7f832ad6d266]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f83385c5b80]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7f83385c5e3f]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f83385c1983]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0x9506)[0x7f832d261506]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0xc3f4)[0x7f832d2643f4]
/lib64/libglusterfs.so.0(+0x878ea)[0x7f83388588ea]
/lib64/libpthread.so.0(+0x7dc5)[0x7f833765fdc5]
/lib64/libc.so.6(clone+0x6d)[0x7f8336fa621d]

Kind regards,
Fredrik Widlund

On Tue, Feb 23, 2016 at 1:00 PM, <gluster-users-request at gluster.org> wrote:

> Date: Mon, 22 Feb 2016 15:08:47 -0500
> From: Dj Merrill <gluster at deej.net>
> To: Gaurav Garg <ggarg at redhat.com>
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] glusterfs client crashes
> Message-ID: <56CB6ACF.5080408 at deej.net>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> On 2/21/2016 2:23 PM, Dj Merrill wrote:
> > Very interesting. They were reporting both bricks offline, but the
> > processes on both servers were still running. Restarting glusterfsd on
> > one of the servers brought them both back online.
>
> I realize I wasn't clear in my comments yesterday and would like to
> elaborate on this a bit further. The "very interesting" comment was
> sparked because when we were running 3.7.6, the bricks were not
> reporting as offline when a client was having an issue, so this is new
> behaviour now that we are running 3.7.8 (or a different issue entirely).
>
> The other point that I was not clear on is that we may have one client
> reporting the "Transport endpoint is not connected" error, but the other
> 40+ clients all continue to work properly. This is the case with both
> 3.7.6 and 3.7.8.
>
> Curious, how can the other clients continue to work fine if both Gluster
> 3.7.8 servers are reporting the bricks as offline?
>
> What does "offline" mean in this context?
>
> Re: the server logs, here is what I've found so far listed on both
> gluster servers (glusterfs1 and glusterfs2):
>
> [2016-02-21 08:06:02.785788] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2016-02-21 18:48:20.677010] W [socket.c:588:__socket_rwv] 0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No data available)
> [2016-02-21 18:48:20.677096] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
> [2016-02-21 18:48:31.148564] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
> [2016-02-21 18:48:40.941715] W [socket.c:588:__socket_rwv] 0-glusterfs: readv on (sanitized IP of glusterfs2):24007 failed (No data available)
> [2016-02-21 18:48:51.184424] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2016-02-21 18:48:51.972068] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
> [2016-02-21 18:48:51.980210] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
> [2016-02-21 18:48:51.985211] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
> [2016-02-21 18:48:51.995002] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
> [2016-02-21 18:48:53.006079] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2016-02-21 18:48:53.018104] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2016-02-21 18:48:53.024060] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2016-02-21 18:48:53.035170] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2016-02-21 18:48:53.045637] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-gv0-client-1: changing port to 49152 (from 0)
> [2016-02-21 18:48:53.051991] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-02-21 18:48:53.052439] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/export/brick1/sdb1'.
> [2016-02-21 18:48:53.052486] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
> [2016-02-21 18:48:53.052668] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
> [2016-02-21 18:48:31.148706] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
> [2016-02-21 18:49:12.271865] W [socket.c:588:__socket_rwv] 0-glusterfs: readv on (sanitized IP of glusterfs2):24007 failed (No data available)
> [2016-02-21 18:49:15.637745] W [socket.c:588:__socket_rwv] 0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No data available)
> [2016-02-21 18:49:15.637824] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
> [2016-02-21 18:49:24.198431] E [socket.c:2278:socket_connect_finish] 0-glusterfs: connection to (sanitized IP of glusterfs2):24007 failed (Connection refused)
> [2016-02-21 18:49:26.204811] E [socket.c:2278:socket_connect_finish] 0-gv0-client-1: connection to (sanitized IP of glusterfs2):24007 failed (Connection refused)
> [2016-02-21 18:49:38.366559] I [MSGID: 108031] [afr-common.c:1883:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting local read_child gv0-client-0
> [2016-02-21 18:50:54.605535] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2016-02-21 18:50:54.605639] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
ccing md-cache team member for this issue.

Thanks,
~Gaurav

----- Original Message -----
From: "Fredrik Widlund" <fredrik.widlund at gmail.com>
To: gluster at deej.net
Cc: gluster-users at gluster.org
Sent: Tuesday, February 23, 2016 5:51:37 PM
Subject: Re: [Gluster-users] glusterfs client crashes

Hi,

I have experienced what looks like a very similar crash. Gluster 3.7.6 on
CentOS 7. No errors on the bricks or on other clients mounted at the time.
Relatively high load at the time.

Remounting the filesystem brought it back online.

[crash backtrace and quoted logs snipped; identical to the message above]

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
We came across a glibc bug that could have caused this kind of corruption. While looking into possible causes, we found an issue (https://bugzilla.redhat.com/show_bug.cgi?id=1305406) that is fixed in glibc-2.17-121.el7. The bug report gives the following test to determine whether the installed glibc is buggy; running it on our local setup produced:

----------------
# objdump -r -d /lib64/libc.so.6 | grep -C 20 _int_free | grep -C 10 cmpxchg | head -21 | grep -A 3 cmpxchg | tail -1 | (grep '%r' && echo "Your libc is likely buggy." || echo "Your libc looks OK.")
   7cc36:       48 85 c9                test   %rcx,%rcx
Your libc is likely buggy.
----------------

Could you check whether the above command on your setup gives the same "Your libc is likely buggy." output?

Thanks to Nithya, Krutika and Pranith for working on this.

----- Original Message -----
> From: "Fredrik Widlund" <fredrik.widlund at gmail.com>
> To: gluster at deej.net
> Cc: gluster-users at gluster.org
> Sent: Tuesday, February 23, 2016 5:51:37 PM
> Subject: Re: [Gluster-users] glusterfs client crashes
>
> Hi,
>
> I have experienced what looks like a very similar crash. Gluster 3.7.6 on
> CentOS 7. No errors on the bricks or on other clients mounted at the time.
> Relatively high load at the time.
>
> Remounting the filesystem brought it back online.
>
> [crash backtrace and quoted logs snipped; identical to the message above]
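Besides the objdump probe quoted above, affected hosts can also be spotted by comparing the installed glibc build against the fixed build named in the bug report. A minimal sketch of that comparison (the `sort -V` approximation of RPM version ordering and the sample version strings are assumptions; `rpm`'s own comparison logic is authoritative):

```shell
#!/bin/sh
# Sketch: compare a glibc VERSION-RELEASE string against the build that
# carries the fix for RHBZ 1305406 (glibc-2.17-121.el7).
# NOTE: sort -V only approximates RPM's version comparison.
glibc_has_fix() {
    fixed="2.17-121.el7"
    current="$1"    # e.g. "$(rpm -q --qf '%{VERSION}-%{RELEASE}' glibc)"
    # Under version sort the older build sorts first; if the fixed build
    # sorts first (or the two are equal), the installed build is new enough.
    lowest=$(printf '%s\n%s\n' "$fixed" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}

if glibc_has_fix "2.17-106.el7"; then
    echo "glibc includes the fix"
else
    echo "glibc predates the fix: likely buggy"
fi
```

On a real system the hard-coded sample string would be replaced by the output of the `rpm -q` query shown in the comment.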