On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi <atumball at redhat.com> wrote:

> Hi Alvin,
>
> Thanks for the dump output. It helped a bit.
>
> For now, recommend turning off open-behind and read-ahead performance
> translators for you to get rid of this situation, as I noticed hung FLUSH
> operations from these translators.

Looks like I gave the wrong advice by looking at the snippet below:

> [global.callpool.stack.61]
> stack=0x7f6c6f628f04
> uid=48
> gid=48
> pid=11077
> unique=10048797
> lk-owner=a73ae5bdb5fcd0d2
> op=FLUSH
> type=1
> cnt=5
>
> [global.callpool.stack.61.frame.1]
> frame=0x7f6c6f793d88
> ref_count=0
> translator=edocs-production-write-behind
> complete=0
> parent=edocs-production-read-ahead
> wind_from=ra_flush
> wind_to=FIRST_CHILD (this)->fops->flush
> unwind_to=ra_flush_cbk
>
> [global.callpool.stack.61.frame.2]
> frame=0x7f6c6f796c90
> ref_count=1
> translator=edocs-production-read-ahead
> complete=0
> parent=edocs-production-open-behind
> wind_from=default_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=default_flush_cbk
>
> [global.callpool.stack.61.frame.3]
> frame=0x7f6c6f79b724
> ref_count=1
> translator=edocs-production-open-behind
> complete=0
> parent=edocs-production
> wind_from=io_stats_flush
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=io_stats_flush_cbk
>
> [global.callpool.stack.61.frame.4]
> frame=0x7f6c6f79b474
> ref_count=1
> translator=edocs-production
> complete=0
> parent=fuse
> wind_from=fuse_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=fuse_err_cbk
>
> [global.callpool.stack.61.frame.5]
> frame=0x7f6c6f796684
> ref_count=1
> translator=fuse
> complete=0

Most probably the issue is with write-behind's flush. So please turn off write-behind and test. If you don't have any hung httpd processes after that, please let us know.

-Amar

> On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr <alvin at netvel.net> wrote:
>
>> We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and on
>> the clients 3.7.11-2 on Centos 6.8
>>
>> We are seeing httpd processes hang in fuse_request_send or sync_page.
>>
>> These calls are from PHP 5.3.3-48 scripts
>>
>> I am attaching a tgz file that contains the process dump from glusterfsd
>> and the hung pids along with the offending pid's stacks from
>> /proc/{pid}/stack.
>>
>> This has been a low level annoyance for a while but it has become a much
>> bigger issue because the number of hung processes went from a few a week to
>> a few hundred a day.
>>
>> --
>> Alvin Starr || voice: (905)513-7688
>> Netvel Inc. || Cell: (416)806-0133
>> alvin at netvel.net ||
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users

--
Amar Tumballi (amarts)
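For anyone reading a similar statedump, the reasoning above (find the innermost translator whose frame is still incomplete) can be sketched in a few lines of Python. This is only an illustration, not a Gluster tool: the function names `parse_stacks` and `innermost_pending` are made up here, the embedded dump is an abbreviated copy of the stack quoted above, and the rule "the innermost frame is the one whose translator is no other frame's parent" is an assumption read off this particular dump.

```python
import re

# Abbreviated copy of the call stack quoted above (only the fields used here).
STATEDUMP = """\
[global.callpool.stack.61]
op=FLUSH
cnt=5

[global.callpool.stack.61.frame.1]
translator=edocs-production-write-behind
complete=0
parent=edocs-production-read-ahead

[global.callpool.stack.61.frame.2]
translator=edocs-production-read-ahead
complete=0
parent=edocs-production-open-behind

[global.callpool.stack.61.frame.3]
translator=edocs-production-open-behind
complete=0
parent=edocs-production

[global.callpool.stack.61.frame.4]
translator=edocs-production
complete=0
parent=fuse

[global.callpool.stack.61.frame.5]
translator=fuse
complete=0
"""

# Matches "[global.callpool.stack.N]" and "[global.callpool.stack.N.frame.M]".
SECTION = re.compile(r"^\[global\.callpool\.stack\.(\d+)(?:\.frame\.(\d+))?\]$")


def parse_stacks(text):
    """Return {stack_id: {"fields": {...}, "frames": {frame_no: {...}}}}."""
    stacks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SECTION.match(line)
        if match:
            stack = stacks.setdefault(int(match.group(1)), {"fields": {}, "frames": {}})
            if match.group(2) is None:
                current = stack["fields"]
            else:
                current = stack["frames"].setdefault(int(match.group(2)), {})
            continue
        if line.startswith("["):
            current = None  # some other statedump section: skip its fields
            continue
        if current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stacks


def innermost_pending(frames):
    """Return the translator of the incomplete frame that no other frame names as parent."""
    parents = {frame.get("parent") for frame in frames.values()}
    for number in sorted(frames):
        frame = frames[number]
        if frame.get("complete") == "0" and frame.get("translator") not in parents:
            return frame.get("translator")
    return None


if __name__ == "__main__":
    for stack_id, stack in sorted(parse_stacks(STATEDUMP).items()):
        print(stack_id, stack["fields"].get("op"), innermost_pending(stack["frames"]))
```

On the stack above this prints `61 FLUSH edocs-production-write-behind`, which is why write-behind, rather than open-behind or read-ahead, is the suspect. The translator can then be switched off for testing with `gluster volume set <VOLNAME> performance.write-behind off`, where `<VOLNAME>` is the name of the affected volume.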
Thanks Amar, I'll consider your recommendations. But why is performance totally different on two nodes? Will data be written to both nodes at the same time?

From: Amar Tumballi <atumball at redhat.com>
Sent: Friday, March 31, 2017 3:14 PM
To: Alvin Starr <alvin at netvel.net>
Cc: gluster-users at gluster.org List
Subject: Re: [Gluster-users] hanging httpd processes.
Sorry, I replied under the wrong subject; please disregard my previous message.

From: Yong Zhang <hiscal at outlook.com>
Sent: Saturday, April 1, 2017 1:49 AM
To: Amar Tumballi <atumball at redhat.com>; Alvin Starr <alvin at netvel.net>
Cc: gluster-users at gluster.org List
Subject: Re: [Gluster-users] hanging httpd processes.