Pranith Kumar Karampuri
2019-Jun-06 07:02 UTC
[Gluster-users] write request hung in write-behind
On Tue, Jun 4, 2019 at 7:36 AM Xie Changlong <zgrep at 139.com> wrote:

> To me, all 'df' commands on specific (not all) nfs clients hung forever.
> The temporary workaround is to disable performance.nfs.write-behind and
> cluster.eager-lock.
>
> I'll try to get more info back if I encounter this problem again.
>

If you observe this issue again, take successive statedumps of the processes
(at least a minute apart) and run
https://github.com/gluster/glusterfs/blob/master/extras/identify-hangs.sh
on them, which will give the information about the hangs.

> From: Raghavendra Gowdappa <rgowdapp at redhat.com>
> Date: 2019/06/04 (Tue) 09:55
> To: Xie Changlong <zgrep at 139.com>; Ravishankar Narayanankutty
> <ranaraya at redhat.com>; Karampuri, Pranith <pkarampu at redhat.com>
> Cc: gluster-users <gluster-users at gluster.org>
> Subject: Re: Re: write request hung in write-behind
>
> On Mon, Jun 3, 2019 at 1:11 PM Xie Changlong <zgrep at 139.com> wrote:
>
>> Firstly, I correct myself: the write request is followed by 771 (not 1545)
>> FLUSH requests. I've attached the gnfs dump file; there are 774 pending
>> call-stacks in total, 771 of them pending on write-behind, and the deepest
>> call-stack is in afr.
>>
>
> +Ravishankar Narayanankutty <ranaraya at redhat.com> +Karampuri, Pranith
> <pkarampu at redhat.com>
>
> Are you sure these were not call-stacks of in-progress ops? One way of
> confirming that would be to take statedumps periodically (say 3 min apart).
> Hung call stacks will be common to all the statedumps.
>
>> [global.callpool.stack.771]
>> stack=0x7f517f557f60
>> uid=0
>> gid=0
>> pid=0
>> unique=0
>> lk-owner=
>> op=stack
>> type=0
>> cnt=3
>>
>> [global.callpool.stack.771.frame.1]
>> frame=0x7f517f655880
>> ref_count=0
>> translator=cl35vol01-replicate-7
>> complete=0
>> parent=cl35vol01-dht
>> wind_from=dht_writev
>> wind_to=subvol->fops->writev
>> unwind_to=dht_writev_cbk
>>
>> [global.callpool.stack.771.frame.2]
>> frame=0x7f518ed90340
>> ref_count=1
>> translator=cl35vol01-dht
>> complete=0
>> parent=cl35vol01-write-behind
>> wind_from=wb_fulfill_head
>> wind_to=FIRST_CHILD (frame->this)->fops->writev
>> unwind_to=wb_fulfill_cbk
>>
>> [global.callpool.stack.771.frame.3]
>> frame=0x7f516d3baf10
>> ref_count=1
>> translator=cl35vol01-write-behind
>> complete=0
>>
>> [global.callpool.stack.772]
>> stack=0x7f51607a5a20
>> uid=0
>> gid=0
>> pid=0
>> unique=0
>> lk-owner=a0715b77517f0000
>> op=stack
>> type=0
>> cnt=1
>>
>> [global.callpool.stack.772.frame.1]
>> frame=0x7f516ca2d1b0
>> ref_count=0
>> translator=cl35vol01-replicate-7
>> complete=0
>>
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | wc -l
>> 774
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep complete | wc -l
>> 774
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep -E "complete=0" | wc -l
>> 774
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | grep write-behind | wc -l
>> 771
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | grep replicate-7 | wc -l
>> 2
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | grep glusterfs | wc -l
>> 1
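If it helps, here is a rough sketch of the statedump workflow suggested above: collect a few statedumps of the gnfs process and feed them to identify-hangs.sh. The PID and the one-minute interval are placeholders, and the exact arguments the script expects should be checked against its usage message; glusterfs processes write a glusterdump.<pid>.dump.<timestamp> file on SIGUSR1, by default under /var/run/gluster.

# PID of the gnfs/glusterfs process to inspect (placeholder value).
GNFS_PID=20106

# Take three statedumps, at least a minute apart. Each SIGUSR1 produces a
# glusterdump.<pid>.dump.<timestamp> file in the statedump directory
# (/var/run/gluster by default).
for i in 1 2 3; do
    kill -USR1 "$GNFS_PID"
    sleep 60
done

# Run the helper from a glusterfs source checkout on the collected dumps;
# check the script itself for the exact arguments it expects.
bash ./extras/identify-hangs.sh /var/run/gluster
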
>> From: Raghavendra Gowdappa <rgowdapp at redhat.com>
>> Date: 2019/06/03 (Mon) 14:46
>> To: Xie Changlong <zgrep at 139.com>
>> Cc: gluster-users <gluster-users at gluster.org>
>> Subject: Re: write request hung in write-behind
>>
>> On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong <zgrep at 139.com> wrote:
>>
>>> Hi all,
>>>
>>> Testing gluster 3.8.4-54.15 gnfs, I saw a write request hung in
>>> write-behind followed by 1545 FLUSH requests. I found a similar
>>> bugfix, https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but I'm
>>> not sure it's the right one.
>>>
>>> [xlator.performance.write-behind.wb_inode]
>>> path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg
>>> inode=0x7f51775b71a0
>>> window_conf=1073741824
>>> window_current=293822
>>> transit-size=293822
>>> dontsync=0
>>>
>>> [.WRITE]
>>> request-ptr=0x7f516eec2060
>>> refcount=1
>>> wound=yes
>>> generation-number=1
>>> req->op_ret=293822
>>> req->op_errno=0
>>> sync-attempts=1
>>> sync-in-progress=yes
>>>
>>
>> Note that the sync is still in progress. This means write-behind has
>> wound the write request to its children and is yet to receive the response
>> (unless there is a bug in the accounting of sync-in-progress). So it's
>> likely that there are call stacks into children of write-behind which are
>> not complete yet. Are you sure the deepest hung call-stack is in
>> write-behind? Can you check for frames with "complete=0"?
>>
>>> size=293822
>>> offset=1048576
>>> lied=-1
>>> append=0
>>> fulfilled=0
>>> go=-1
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f517c2badf0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=-1
>>> req->op_errno=116
>>> sync-attempts=0
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f5173e9f7b0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=0
>>> req->op_errno=0
>>> sync-attempts=0
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f51640b8ca0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=0
>>> req->op_errno=0
>>> sync-attempts=0
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f516f3979d0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=0
>>> req->op_errno=0
>>> sync-attempts=0
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f516f6ac8d0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=0
>>> req->op_errno=0
>>> sync-attempts=0
>>>
>>> Any comments would be appreciated!
>>>
>>> Thanks
>>> -Xie
>>>
>>>

--
Pranith
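For a quick manual check along the lines suggested in the thread, something like the following can be run against two successive statedumps. The file names are placeholders following the glusterdump.<pid>.dump.<timestamp> pattern seen above, and comparing stack addresses across dumps is only a rough heuristic for spotting stacks that stay pending (likely hung) rather than merely in progress.

# Two successive statedumps of the same process (placeholder names).
DUMP1=glusterdump.20106.dump.1559038081
DUMP2=glusterdump.20106.dump.1559038261

# Count frames that are still incomplete, per translator.
awk -F= '/^translator=/ {t=$2} /^complete=0/ {n[t]++}
         END {for (x in n) print n[x], x}' "$DUMP1" | sort -rn

# Call stacks whose addresses are pending in both dumps are the likely
# hangs; stacks belonging to ops that are merely in progress come and go.
for d in "$DUMP1" "$DUMP2"; do
    grep '^stack=' "$d" | sort -u > "$d.stacks"
done
comm -12 "$DUMP1.stacks" "$DUMP2.stacks"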