Hi,

As per the attached glusterdump/stackdump, this appears to be a known issue
(https://bugzilla.redhat.com/show_bug.cgi?id=1372211) that is already fixed
by the patch https://review.gluster.org/#/c/15380/.

The issue occurs in this case. Assume a file is opened with fd1 and fd2:

1. Some WRITE ops to fd1 hit an error and were added back to the 'todo'
   queue because of that error.
2. fd2 is closed, so a FLUSH op is sent to write-behind.
3. The FLUSH cannot be unwound because it is not a legal waiter for those
   failed writes (as the function __wb_request_waiting_on() determines),
   and the failed WRITEs also cannot complete while fd1 is still open.
   fd2 is stuck in the close syscall.

The statedump also shows that the FLUSH op's fd is not the same as the
WRITE op's fd. Kindly upgrade the package to 3.10.1 and share the result.

Thanks
Mohit Agrawal

On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi <atumball at redhat.com> wrote:

>> Hi Alvin,
>>
>> Thanks for the dump output. It helped a bit.
>>
>> For now, I recommend turning off the open-behind and read-ahead
>> performance translators to get rid of this situation, as I noticed hung
>> FLUSH operations from these translators.
>
> Looks like I gave the wrong advice by looking at the snippet below:
>
> [global.callpool.stack.61]
> stack=0x7f6c6f628f04
> uid=48
> gid=48
> pid=11077
> unique=10048797
> lk-owner=a73ae5bdb5fcd0d2
> op=FLUSH
> type=1
> cnt=5
>
> [global.callpool.stack.61.frame.1]
> frame=0x7f6c6f793d88
> ref_count=0
> translator=edocs-production-write-behind
> complete=0
> parent=edocs-production-read-ahead
> wind_from=ra_flush
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=ra_flush_cbk
>
> [global.callpool.stack.61.frame.2]
> frame=0x7f6c6f796c90
> ref_count=1
> translator=edocs-production-read-ahead
> complete=0
> parent=edocs-production-open-behind
> wind_from=default_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=default_flush_cbk
>
> [global.callpool.stack.61.frame.3]
> frame=0x7f6c6f79b724
> ref_count=1
> translator=edocs-production-open-behind
> complete=0
> parent=edocs-production
> wind_from=io_stats_flush
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=io_stats_flush_cbk
>
> [global.callpool.stack.61.frame.4]
> frame=0x7f6c6f79b474
> ref_count=1
> translator=edocs-production
> complete=0
> parent=fuse
> wind_from=fuse_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=fuse_err_cbk
>
> [global.callpool.stack.61.frame.5]
> frame=0x7f6c6f796684
> ref_count=1
> translator=fuse
> complete=0
>
> Most probably the issue is with write-behind's flush, so please turn off
> write-behind and test. If you no longer have any hung httpd processes,
> please let us know.
>
> -Amar
>
>> -Amar
>>
>> On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr <alvin at netvel.net> wrote:
>>
>>> We are running gluster 3.8.9-1 on CentOS 7.3.1611 for the servers and
>>> 3.7.11-2 on CentOS 6.8 for the clients.
>>>
>>> We are seeing httpd processes hang in fuse_request_send or sync_page.
>>>
>>> These calls are from PHP 5.3.3-48 scripts.
>>>
>>> I am attaching a tgz file that contains the process dump from
>>> glusterfsd and the hung pids, along with the offending pids' stacks
>>> from /proc/{pid}/stack.
>>>
>>> This has been a low-level annoyance for a while, but it has become a
>>> much bigger issue because the number of hung processes went from a few
>>> a week to a few hundred a day.
>>>
>>> --
>>> Alvin Starr           ||   voice: (905)513-7688
>>> Netvel Inc.           ||   Cell:  (416)806-0133
>>> alvin at netvel.net   ||
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Amar Tumballi (amarts)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170331/35479477/attachment.html>
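[Editor's note] The triage pattern used in the message above — finding call
frames in a statedump that never unwound (complete=0) — can be done
mechanically. A minimal sketch, run against a tiny synthetic sample modelled
on the quoted snippet rather than a real statedump:

```shell
# Build a small sample file shaped like the statedump quoted above.
dump=$(mktemp)
cat > "$dump" <<'EOF'
[global.callpool.stack.61]
op=FLUSH
cnt=5

[global.callpool.stack.61.frame.1]
translator=edocs-production-write-behind
complete=0
parent=edocs-production-read-ahead
EOF

# List every frame that never unwound (complete=0), with the two
# preceding lines so the frame header and translator name are visible.
grep -B2 '^complete=0' "$dump"

rm -f "$dump"
```

Pointed at a real dump (e.g. from /var/run/gluster/), the same grep quickly
surfaces which translator a hung operation is parked in.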
Hi,

As you have mentioned the client and server versions in the thread, the
package versions differ between the clients and the servers. We would
recommend upgrading both servers and clients to 3.10.1. If it is not
possible to upgrade both, then at a minimum the clients need to be
upgraded.

Thanks
Mohit Agrawal

On Fri, Mar 31, 2017 at 2:27 PM, Mohit Agrawal <moagrawa at redhat.com> wrote:
> [...]
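[Editor's note] The workaround suggested earlier in the thread and the
version check behind the upgrade advice come down to a few commands. This
is a sketch, not from the original mails: the volume name
`edocs-production` is inferred from the translator names in the statedump,
and the RPM package names assume the CentOS installs described in the
thread — adjust both for your environment.

```shell
# Workaround: disable the write-behind translator on the affected volume
# (open-behind and read-ahead can be toggled the same way).
gluster volume set edocs-production performance.write-behind off

# After upgrading, confirm client and server package versions match:
rpm -q glusterfs glusterfs-fuse     # on each client
rpm -q glusterfs glusterfs-server   # on each server
```

These commands need a live Gluster cluster; the volume option takes effect
on clients when they refetch the volume graph.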
Thanks for the help. That seems to have fixed it. We were seeing hangs
clocking up at a rate of a few hundred a day, and for the last week there
have been none.

On 03/31/2017 05:54 AM, Mohit Agrawal wrote:
> [...]
--
Alvin Starr           ||   voice: (905)513-7688
Netvel Inc.           ||   Cell:  (416)806-0133
alvin at netvel.net   ||