Raghavendra Gowdappa
2019-Jun-04 01:55 UTC
[Gluster-users] write request hung in write-behind
On Mon, Jun 3, 2019 at 1:11 PM Xie Changlong <zgrep at 139.com> wrote:

> First, let me correct myself: the write request is followed by 771 (not
> 1545) FLUSH requests. I've attached the gnfs dump file; there are 774
> pending call-stacks in total, 771 of them pending in write-behind, and
> the deepest call-stack is in afr.

+Ravishankar Narayanankutty <ranaraya at redhat.com>
+Karampuri, Pranith <pkarampu at redhat.com>

Are you sure these were not call-stacks of in-progress ops? One way of
confirming that would be to take statedumps periodically (say 3 minutes
apart). Hung call stacks will be common to all the statedumps.
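Something along these lines should do for collecting and comparing them; the
pid (20106) and volume name (cl35vol01) are just lifted from your paste, and
/var/run/gluster as the statedump directory is an assumption (the usual
default), so adjust to your setup:

    GNFS_PID=20106                # pid from the glusterdump.20106.* filename
    DUMPDIR=/var/run/gluster      # default statedump directory (assumption)

    # take three statedumps, roughly 3 minutes apart; sending SIGUSR1 makes a
    # glusterfs process write a statedump ("gluster volume statedump cl35vol01
    # nfs" should also work for the gnfs server)
    for i in 1 2 3; do
        kill -USR1 "$GNFS_PID"
        sleep 180
    done

    # stack addresses that appear in all three dumps are the likely hung ones
    # (assumes only these three dumps match the glob below)
    for f in "$DUMPDIR"/glusterdump."$GNFS_PID".dump.*; do
        grep "^stack=" "$f" | sort -u
    done | sort | uniq -c | awk '$1 == 3 {print $2}'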
> [global.callpool.stack.771]
> stack=0x7f517f557f60
> uid=0
> gid=0
> pid=0
> unique=0
> lk-owner=
> op=stack
> type=0
> cnt=3
>
> [global.callpool.stack.771.frame.1]
> frame=0x7f517f655880
> ref_count=0
> translator=cl35vol01-replicate-7
> complete=0
> parent=cl35vol01-dht
> wind_from=dht_writev
> wind_to=subvol->fops->writev
> unwind_to=dht_writev_cbk
>
> [global.callpool.stack.771.frame.2]
> frame=0x7f518ed90340
> ref_count=1
> translator=cl35vol01-dht
> complete=0
> parent=cl35vol01-write-behind
> wind_from=wb_fulfill_head
> wind_to=FIRST_CHILD (frame->this)->fops->writev
> unwind_to=wb_fulfill_cbk
>
> [global.callpool.stack.771.frame.3]
> frame=0x7f516d3baf10
> ref_count=1
> translator=cl35vol01-write-behind
> complete=0
>
> [global.callpool.stack.772]
> stack=0x7f51607a5a20
> uid=0
> gid=0
> pid=0
> unique=0
> lk-owner=a0715b77517f0000
> op=stack
> type=0
> cnt=1
>
> [global.callpool.stack.772.frame.1]
> frame=0x7f516ca2d1b0
> ref_count=0
> translator=cl35vol01-replicate-7
> complete=0
>
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
> glusterdump.20106.dump.1559038081 | grep translator | wc -l
> 774
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
> glusterdump.20106.dump.1559038081 | grep complete | wc -l
> 774
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
> glusterdump.20106.dump.1559038081 | grep -E "complete=0" | wc -l
> 774
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
> glusterdump.20106.dump.1559038081 | grep translator | grep write-behind
> | wc -l
> 771
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
> glusterdump.20106.dump.1559038081 | grep translator | grep replicate-7
> | wc -l
> 2
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
> glusterdump.20106.dump.1559038081 | grep translator | grep glusterfs
> | wc -l
> 1
>
> From: Raghavendra Gowdappa <rgowdapp at redhat.com>
> Date: 2019/06/03 (Mon) 14:46
> To: Xie Changlong <zgrep at 139.com>;
> Cc: gluster-users <gluster-users at gluster.org>;
> Subject: Re: write request hung in write-behind
>
> On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong <zgrep at 139.com> wrote:
>
>> Hi all
>>
>> While testing gluster 3.8.4-54.15 gnfs, I saw a write request hung in
>> write-behind, followed by 1545 FLUSH requests. I found a similar bugfix,
>> https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but I'm not sure
>> if it's the right one.
>>
>> [xlator.performance.write-behind.wb_inode]
>> path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg
>> inode=0x7f51775b71a0
>> window_conf=1073741824
>> window_current=293822
>> transit-size=293822
>> dontsync=0
>>
>> [.WRITE]
>> request-ptr=0x7f516eec2060
>> refcount=1
>> wound=yes
>> generation-number=1
>> req->op_ret=293822
>> req->op_errno=0
>> sync-attempts=1
>> sync-in-progress=yes
>>
>
> Note that the sync is still in progress. This means write-behind has
> wound the write-request to its children and is yet to receive the
> response (unless there is a bug in the accounting of sync-in-progress).
> So, it's likely that there are call-stacks into the children of
> write-behind which are not complete yet. Are you sure the deepest hung
> call-stack is in write-behind? Can you check for frames with
> "complete=0"?

(A quick way of doing that check is sketched at the end of this mail.)

>> size=293822
>> offset=1048576
>> lied=-1
>> append=0
>> fulfilled=0
>> go=-1
>>
>> [.FLUSH]
>> request-ptr=0x7f517c2badf0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=-1
>> req->op_errno=116
>> sync-attempts=0
>>
>> [.FLUSH]
>> request-ptr=0x7f5173e9f7b0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=0
>> req->op_errno=0
>> sync-attempts=0
>>
>> [.FLUSH]
>> request-ptr=0x7f51640b8ca0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=0
>> req->op_errno=0
>> sync-attempts=0
>>
>> [.FLUSH]
>> request-ptr=0x7f516f3979d0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=0
>> req->op_errno=0
>> sync-attempts=0
>>
>> [.FLUSH]
>> request-ptr=0x7f516f6ac8d0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=0
>> req->op_errno=0
>> sync-attempts=0
>>
>> Any comments would be appreciated!
>>
>> Thanks
>> -Xie
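For the "complete=0" check suggested above, a quick pass over the statedump
with awk along these lines should be enough; the filename is the one from
your earlier mail:

    # count incomplete frames per translator, looking at every frame of every
    # stack (not just frame.1); relies on translator= preceding complete=
    # within each frame block, as in the dump above
    awk -F= '/^translator=/ {t=$2}
             /^complete=0/  {c[t]++}
             END {for (x in c) print c[x], x}' \
        glusterdump.20106.dump.1559038081 | sort -rn

The translators at the top of that list are where the incomplete frames are
parked, which should tell us whether the stacks are really stuck in
write-behind or somewhere below it.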