Also, could you print local->fop please? -Krutika On Fri, Aug 5, 2016 at 10:46 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:> Were the images being renamed (specifically to a pathname that already > exists) while they were being written to? > > -Krutika > > On Thu, Aug 4, 2016 at 1:14 PM, Mahdi Adnan <mahdi.adnan at outlook.com> > wrote: > >> Hi, >> >> Kindly check the following link for all 7 bricks logs; >> >> https://db.tt/YP5qTGXk >> >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Thu, 4 Aug 2016 13:00:43 +0530 >> >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> Could you also attach the brick logs please? >> >> -Krutika >> >> On Thu, Aug 4, 2016 at 12:48 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> appreciate your help, >> >> (gdb) frame 2 >> #2 0x00007f195deb1787 in shard_common_inode_write_do >> (frame=0x7f19699f1164, this=0x7f195802ac10) at shard.c:3716 >> 3716 anon_fd = fd_anonymous >> (local->inode_list[i]); >> (gdb) p local->inode_list[0] >> $4 = (inode_t *) 0x7f195c532b18 >> (gdb) p local->inode_list[1] >> $5 = (inode_t *) 0x0 >> (gdb) >> >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Thu, 4 Aug 2016 12:43:10 +0530 >> >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> OK. >> Could you also print the values of the following variables from the >> original core: >> i. i >> ii. local->inode_list[0] >> iii. local->inode_list[1] >> >> -Krutika >> >> On Wed, Aug 3, 2016 at 9:01 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> Hi, >> >> Unfortunately no, but i can setup a test bench and see if it gets the >> same results. >> >> -- >> >> Respectfully >> *Mahdi A. 
Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Wed, 3 Aug 2016 20:59:50 +0530 >> >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> Do you have a test case that consistently recreates this problem? >> >> -Krutika >> >> On Wed, Aug 3, 2016 at 8:32 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> Hi, >> >> So i have updated to 3.7.14 and i still have the same issue with NFS. >> based on what i have provided so far from logs and dumps do you think >> it's an NFS issue ? should i switch to nfs-ganesha ? >> the problem is, the current setup is used in a production environment, >> and switching the mount point of +50 VMs from native nfs to nfs-ganesha is >> not going to be smooth and without downtime, so i really appreciate your >> thoughts on this matter. >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: mahdi.adnan at outlook.com >> To: kdhananj at redhat.com >> Date: Tue, 2 Aug 2016 08:44:16 +0300 >> >> CC: gluster-users at gluster.org >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> >> Hi, >> >> The NFS just crashed again, latest bt; >> >> (gdb) bt >> #0 0x00007f0b71a9f210 in pthread_spin_lock () from /lib64/libpthread.so.0 >> #1 0x00007f0b72c6fcd5 in fd_anonymous (inode=0x0) at fd.c:804 >> #2 0x00007f0b64ca5787 in shard_common_inode_write_do >> (frame=0x7f0b707c062c, this=0x7f0b6002ac10) at shard.c:3716 >> #3 0x00007f0b64ca5a53 in shard_common_inode_write_post_lookup_shards_handler >> (frame=<optimized out>, this=<optimized out>) at shard.c:3769 >> #4 0x00007f0b64c9eff5 in shard_common_lookup_shards_cbk >> (frame=0x7f0b707c062c, cookie=<optimized out>, this=0x7f0b6002ac10, >> op_ret=0, >> op_errno=<optimized out>, inode=<optimized out>, buf=0x7f0b51407640, >> xdata=0x7f0b72f57648, postparent=0x7f0b514076b0) at shard.c:1601 >> #5 0x00007f0b64efe141 in dht_lookup_cbk 
(frame=0x7f0b7075fcdc, >> cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, >> inode=0x7f0b5f1d1f58, >> stbuf=0x7f0b51407640, xattr=0x7f0b72f57648, >> postparent=0x7f0b514076b0) at dht-common.c:2174 >> #6 0x00007f0b651871f3 in afr_lookup_done (frame=frame at entry=0x7f0b7079a4c8, >> this=this at entry=0x7f0b60023ba0) at afr-common.c:1825 >> #7 0x00007f0b65187b84 in afr_lookup_metadata_heal_check >> (frame=frame at entry=0x7f0b7079a4c8, this=0x7f0b60023ba0, this at entry >> =0xca0bd88259f5a800) >> at afr-common.c:2068 >> #8 0x00007f0b6518834f in afr_lookup_entry_heal (frame=frame at entry >> =0x7f0b7079a4c8, this=0xca0bd88259f5a800, this at entry=0x7f0b60023ba0) at >> afr-common.c:2157 >> #9 0x00007f0b6518867d in afr_lookup_cbk (frame=0x7f0b7079a4c8, >> cookie=<optimized out>, this=0x7f0b60023ba0, op_ret=<optimized out>, >> op_errno=<optimized out>, inode=<optimized out>, buf=0x7f0b564e9940, >> xdata=0x7f0b72f708c8, postparent=0x7f0b564e99b0) at afr-common.c:2205 >> #10 0x00007f0b653d6e42 in client3_3_lookup_cbk (req=<optimized out>, >> iov=<optimized out>, count=<optimized out>, myframe=0x7f0b7076354c) >> at client-rpc-fops.c:2981 >> #11 0x00007f0b72a00a30 in rpc_clnt_handle_reply (clnt=clnt at entry >> =0x7f0b603393c0, pollin=pollin at entry=0x7f0b50c1c2d0) at rpc-clnt.c:764 >> #12 0x00007f0b72a00cef in rpc_clnt_notify (trans=<optimized out>, >> mydata=0x7f0b603393f0, event=<optimized out>, data=0x7f0b50c1c2d0) at >> rpc-clnt.c:925 >> #13 0x00007f0b729fc7c3 in rpc_transport_notify (this=this at entry >> =0x7f0b60349040, event=event at entry=RPC_TRANSPORT_MSG_RECEIVED, >> data=data at entry=0x7f0b50c1c2d0) >> at rpc-transport.c:546 >> #14 0x00007f0b678c39a4 in socket_event_poll_in (this=this at entry >> =0x7f0b60349040) at socket.c:2353 >> #15 0x00007f0b678c65e4 in socket_event_handler (fd=fd at entry=29, >> idx=idx at entry=17, data=0x7f0b60349040, poll_in=1, poll_out=0, >> poll_err=0) at socket.c:2466 >> #16 0x00007f0b72ca0f7a in 
event_dispatch_epoll_handler >> (event=0x7f0b564e9e80, event_pool=0x7f0b7349bf20) at event-epoll.c:575 >> #17 event_dispatch_epoll_worker (data=0x7f0b60152d40) at event-epoll.c:678 >> #18 0x00007f0b71a9adc5 in start_thread () from /lib64/libpthread.so.0 >> #19 0x00007f0b713dfced in clone () from /lib64/libc.so.6 >> >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> ------------------------------ >> From: mahdi.adnan at outlook.com >> To: kdhananj at redhat.com >> Date: Mon, 1 Aug 2016 16:31:50 +0300 >> CC: gluster-users at gluster.org >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> >> Many thanks, >> >> here's the results; >> >> >> (gdb) p cur_block >> $15 = 4088 >> (gdb) p last_block >> $16 = 4088 >> (gdb) p local->first_block >> $17 = 4087 >> (gdb) p odirect >> $18 = _gf_false >> (gdb) p fd->flags >> $19 = 2 >> (gdb) p local->call_count >> $20 = 2 >> >> >> If you need more core dumps, i have several files i can upload. >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Mon, 1 Aug 2016 18:39:27 +0530 >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> Sorry I didn't make myself clear. The reason I asked YOU to do it is >> because i tried it on my system and im not getting the backtrace (it's all >> question marks). >> >> Attach the core to gdb. >> At the gdb prompt, go to frame 2 by typing >> (gdb) f 2 >> >> There, for each of the variables i asked you to get the values of, type p >> followed by the variable name. >> For instance, to get the value of the variable 'odirect', do this: >> >> (gdb) p odirect >> >> and gdb will print its value for you in response. >> >> -Krutika >> >> On Mon, Aug 1, 2016 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> Hi, >> >> How to get the results of the below variables ? i cant get the results >> from gdb. 
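[Editorial aside: the interactive steps Krutika describes can also be collected in one shot with a gdb command file, which avoids typing each `p` by hand. Filenames below are hypothetical; substitute the actual glusterfs binary and core paths.]

```gdb
# vars.gdb -- run as: gdb /usr/sbin/glusterfs /path/to/core -batch -x vars.gdb
frame 2
print cur_block
print last_block
print local->first_block
print odirect
print fd->flags
print local->call_count
print local->inode_list[0]
print local->inode_list[1]
print local->fop
```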
>> >> >> -- >> >> Respectfully >> *Mahdi A. Mahdi* >> >> >> >> ------------------------------ >> From: kdhananj at redhat.com >> Date: Mon, 1 Aug 2016 15:51:38 +0530 >> Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash >> To: mahdi.adnan at outlook.com >> CC: gluster-users at gluster.org >> >> >> Could you also print and share the values of the following variables from >> the backtrace please: >> >> i. cur_block >> ii. last_block >> iii. local->first_block >> iv. odirect >> v. fd->flags >> vi. local->call_count >> >> -Krutika >> >> On Sat, Jul 30, 2016 at 5:04 PM, Mahdi Adnan <mahdi.adnan at outlook.com> >> wrote: >> >> Hi, >> >> I really appreciate if someone can help me fix my nfs crash, its >> happening a lot and it's causing lots of issues to my VMs; >> the problem is every few hours the native nfs crash and the volume become >> unavailable from the affected node unless i restart glusterd. >> the volume is used by vmware esxi as a datastore for it's VMs with the >> following options; >> >> >> OS: CentOS 7.2 >> Gluster: 3.7.13 >> >> Volume Name: vlm01 >> Type: Distributed-Replicate >> Volume ID: eacd8248-dca3-4530-9aed-7714a5a114f2 >> Status: Started >> Number of Bricks: 7 x 3 = 21 >> Transport-type: tcp >> Bricks: >> Brick1: gfs01:/bricks/b01/vlm01 >> Brick2: gfs02:/bricks/b01/vlm01 >> Brick3: gfs03:/bricks/b01/vlm01 >> Brick4: gfs01:/bricks/b02/vlm01 >> Brick5: gfs02:/bricks/b02/vlm01 >> Brick6: gfs03:/bricks/b02/vlm01 >> Brick7: gfs01:/bricks/b03/vlm01 >> Brick8: gfs02:/bricks/b03/vlm01 >> Brick9: gfs03:/bricks/b03/vlm01 >> Brick10: gfs01:/bricks/b04/vlm01 >> Brick11: gfs02:/bricks/b04/vlm01 >> Brick12: gfs03:/bricks/b04/vlm01 >> Brick13: gfs01:/bricks/b05/vlm01 >> Brick14: gfs02:/bricks/b05/vlm01 >> Brick15: gfs03:/bricks/b05/vlm01 >> Brick16: gfs01:/bricks/b06/vlm01 >> Brick17: gfs02:/bricks/b06/vlm01 >> Brick18: gfs03:/bricks/b06/vlm01 >> Brick19: gfs01:/bricks/b07/vlm01 >> Brick20: gfs02:/bricks/b07/vlm01 >> Brick21: gfs03:/bricks/b07/vlm01 >> Options 
Reconfigured: >> performance.readdir-ahead: off >> performance.quick-read: off >> performance.read-ahead: off >> performance.io-cache: off >> performance.stat-prefetch: off >> cluster.eager-lock: enable >> network.remote-dio: enable >> cluster.quorum-type: auto >> cluster.server-quorum-type: server >> performance.strict-write-ordering: on >> performance.write-behind: off >> cluster.data-self-heal-algorithm: full >> cluster.self-heal-window-size: 128 >> features.shard-block-size: 16MB >> features.shard: on >> auth.allow: 192.168.221.50,192.168.221.51,192.168.221.52,192.168.221.56, >> 192.168.208.130,192.168.208.131,192.168.208.132,192.168.208. >> 89,192.168.208.85,192.168.208.208.86 >> network.ping-timeout: 10 >> >> >> latest bt; >> >> >> (gdb) bt >> #0 0x00007f196acab210 in pthread_spin_lock () from /lib64/libpthread.so.0 >> #1 0x00007f196be7bcd5 in fd_anonymous (inode=0x0) at fd.c:804 >> #2 0x00007f195deb1787 in shard_common_inode_write_do >> (frame=0x7f19699f1164, this=0x7f195802ac10) at shard.c:3716 >> #3 0x00007f195deb1a53 in shard_common_inode_write_post_lookup_shards_handler >> (frame=<optimized out>, this=<optimized out>) at shard.c:3769 >> #4 0x00007f195deaaff5 in shard_common_lookup_shards_cbk >> (frame=0x7f19699f1164, cookie=<optimized out>, this=0x7f195802ac10, >> op_ret=0, >> op_errno=<optimized out>, inode=<optimized out>, buf=0x7f194970bc40, >> xdata=0x7f196c15451c, postparent=0x7f194970bcb0) at shard.c:1601 >> #5 0x00007f195e10a141 in dht_lookup_cbk (frame=0x7f196998e7d4, >> cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, >> inode=0x7f195c532b18, >> stbuf=0x7f194970bc40, xattr=0x7f196c15451c, >> postparent=0x7f194970bcb0) at dht-common.c:2174 >> #6 0x00007f195e3931f3 in afr_lookup_done (frame=frame at entry=0x7f196997f8a4, >> this=this at entry=0x7f1958022a20) at afr-common.c:1825 >> #7 0x00007f195e393b84 in afr_lookup_metadata_heal_check >> (frame=frame at entry=0x7f196997f8a4, this=0x7f1958022a20, this at entry >> 
=0xe3a929e0b67fa500) >> at afr-common.c:2068 >> #8 0x00007f195e39434f in afr_lookup_entry_heal (frame=frame at entry >> =0x7f196997f8a4, this=0xe3a929e0b67fa500, this at entry=0x7f1958022a20) at >> afr-common.c:2157 >> #9 0x00007f195e39467d in afr_lookup_cbk (frame=0x7f196997f8a4, >> cookie=<optimized out>, this=0x7f1958022a20, op_ret=<optimized out>, >> op_errno=<optimized out>, inode=<optimized out>, buf=0x7f195effa940, >> xdata=0x7f196c1853b0, postparent=0x7f195effa9b0) at afr-common.c:2205 >> #10 0x00007f195e5e2e42 in client3_3_lookup_cbk (req=<optimized out>, >> iov=<optimized out>, count=<optimized out>, myframe=0x7f196999952c) >> at client-rpc-fops.c:2981 >> #11 0x00007f196bc0ca30 in rpc_clnt_handle_reply (clnt=clnt at entry >> =0x7f19583adaf0, pollin=pollin at entry=0x7f195907f930) at rpc-clnt.c:764 >> #12 0x00007f196bc0ccef in rpc_clnt_notify (trans=<optimized out>, >> mydata=0x7f19583adb20, event=<optimized out>, data=0x7f195907f930) at >> rpc-clnt.c:925 >> #13 0x00007f196bc087c3 in rpc_transport_notify (this=this at entry >> =0x7f19583bd770, event=event at entry=RPC_TRANSPORT_MSG_RECEIVED, >> data=data at entry=0x7f195907f930) >> at rpc-transport.c:546 >> #14 0x00007f1960acf9a4 in socket_event_poll_in (this=this at entry >> =0x7f19583bd770) at socket.c:2353 >> #15 0x00007f1960ad25e4 in socket_event_handler (fd=fd at entry=25, >> idx=idx at entry=14, data=0x7f19583bd770, poll_in=1, poll_out=0, >> poll_err=0) at socket.c:2466 >> #16 0x00007f196beacf7a in event_dispatch_epoll_handler >> (event=0x7f195effae80, event_pool=0x7f196dbf5f20) at event-epoll.c:575 >> #17 event_dispatch_epoll_worker (data=0x7f196dc41e10) at event-epoll.c:678 >> #18 0x00007f196aca6dc5 in start_thread () from /lib64/libpthread.so.0 >> #19 0x00007f196a5ebced in clone () from /lib64/libc.so.6 >> >> >> >> >> nfs logs and the core dump can be found in the dropbox link below; >> https://db.tt/rZrC9d7f >> >> >> thanks in advance. >> >> Respectfully >> *Mahdi A. 
Mahdi* >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> http://www.gluster.org/mailman/listinfo/gluster-users
Hi,

Yes, i got some messages regarding an existing file name in the renaming process while the VMs are online. and here's the output;

(gdb) frame 2
#2  0x00007f195deb1787 in shard_common_inode_write_do (frame=0x7f19699f1164, this=0x7f195802ac10) at shard.c:3716
3716            anon_fd = fd_anonymous (local->inode_list[i]);
(gdb) p local->fop
$1 = GF_FOP_WRITE
(gdb)

--

Respectfully
Mahdi A. Mahdi

From: kdhananj at redhat.com
Date: Fri, 5 Aug 2016 10:48:36 +0530
Subject: Re: [Gluster-users] Gluster 3.7.13 NFS Crash
To: mahdi.adnan at outlook.com
CC: gluster-users at gluster.org

Also, could you print local->fop please?

-Krutika