Pranith Kumar Karampuri
2014-Nov-22 17:36 UTC
[Gluster-users] glusterfs and glusterfsd process utilization extremely high
On 11/22/2014 11:04 PM, Pranith Kumar Karampuri wrote:
>
> On 11/22/2014 10:40 PM, Pranith Kumar Karampuri wrote:
>>
>> On 11/22/2014 10:29 PM, Kyle Harris wrote:
>>> Hello,
>>>
>>> I have an issue with a 3 node replicated cluster. My issue started
>>> after a reboot a while back. The top command would show the glusterfs
>>> and glusterfsd processes eating up almost all the resources on
>>> all three nodes of the cluster. So much so that it would not run
>>> the web sites that are hosted on it. The httpd processes would
>>> begin to hang. I finally decided to tear down the cluster and
>>> rebuild it from the ground up. I did so and then copied all the
>>> data back which took all night due to the amount of data. All was
>>> well during that entire copy process back to the cluster with no
>>> resource spikes.
> Assuming you go back to 3.5.2
> Execute the following commands:
> # gluster volume set <volname> cluster.entry-self-heal off
>
> This should prevent httpd hangs.
>
> If you still find that the CPU usage is very high, execute the
> following command:
> # gluster volume set <volname> cluster.self-heal-daemon off
>
> This disables self-healing. But you should probably periodically heal
> so that the data is healed by enabling self-heal-daemon using the
> following command:
> # gluster volume set <volname> cluster.self-heal-daemon on
>
> Once "gluster volume heal <volname> info" shows zero entries, then
> healing is complete.
>
> We took some steps to improve this in 3.6. But readdir in EXT4 is not
> working correctly so that is probably giving problems here. Let's wait
> for Vijay to merge the patch I mentioned, then things should be fine.

Sorry for the inconvenience caused. We found the issue after the release
was made :-(.

Pranith
>
> Pranith
>>>
>>> I should note that this cluster is home to many Apache/PHP based web
>>> sites. The problem starts again, however, the minute I point traffic
>>> back to the sites on the cluster. Before pointing traffic to it,
>>> all is fine but as soon as the traffic begins to hit it, the
>>> utilization again begins to spike. Note that all the sites run just
>>> fine when hosted from a standard EXT4 partition. I noticed another
>>> thread labeled "glusterfsd process thrashing CPU" where Pranith asks
>>> if the user has directories with lots of files and I do.
>>>
>>> Here are some other details of my cluster:
>>> - OS: CentOS 6.6 with all updates on all 3 nodes as of 11-22-2014
>>> - All 3 nodes have 8 cores with 16 GB of RAM
>>> - Nodes are all formatted with EXT4
>>> - All three nodes also have the file systems mounted on them for
>>>   use with Apache. I have experimented with both NFS and Fuse mounts
>>>   and it doesn't seem to make a difference which I use for this
>>>   particular problem. I am currently using Fuse.
>>> - Approximately 135 GB of data. Some deep directories with many
>>>   small files.
>>> - No optimization or changes have been made to the cluster . . . it
>>>   is running with default options
>>> - Gluster version 3.6.1-1 installed from RPMs
>>> - Note the issue originally occurred on version 3.5.2 but I updated
>>>   before rebuilding it in hopes that would fix it (it didn't)
>>>
>>> Can anyone give me guidance on how to tackle this problem? I am
>>> hoping perhaps Pranith can give some details as to why the question
>>> about many files and how to proceed given my situation. I know
>>> others have commented about having many small files with regard to
>>> performance but when the processors are not spiked, performance has
>>> been acceptable. Any help would be greatly appreciated.
>>>
>> Kyle,
>> 3.6.1 and EXT4 have a problem because of 64-bit offsets. The Afr-v2
>> implementation introduced this problem. We thought the following
>> patch was merged but it wasn't :-( http://review.gluster.com/8201.
>> Please don't use 3.6.1 with EXT4
>>
>> Vijay,
>> Please merge http://review.gluster.com/8201
>>
>> Pranith
>>> --
>>> Kyle
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
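The workaround in the message above can be laid out as a small dry-run helper. This is only a sketch: it prints the commands quoted in the thread, in the order they were suggested, and runs nothing against a cluster. The function name `print_heal_workaround` and the volume name `myvol` are hypothetical and not from the thread.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the gluster commands quoted in the
# thread for one volume. "print_heal_workaround" and "myvol" are
# illustrative assumptions, not names from the original messages.
print_heal_workaround() {
    volname="$1"
    if [ -z "$volname" ]; then
        echo "usage: print_heal_workaround <volname>" >&2
        return 1
    fi
    # Step 1: turn off entry self-heal (suggested to prevent httpd hangs).
    echo "gluster volume set $volname cluster.entry-self-heal off"
    # Step 2: only if CPU usage is still very high, disable the daemon.
    echo "gluster volume set $volname cluster.self-heal-daemon off"
    # Step 3: periodically re-enable the daemon so the data gets healed.
    echo "gluster volume set $volname cluster.self-heal-daemon on"
    # Step 4: healing is complete once this command shows zero entries.
    echo "gluster volume heal $volname info"
}

print_heal_workaround myvol
```

Printing rather than executing keeps the sketch safe to read through first; the printed lines can then be run by hand as root on one of the nodes.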
Kyle Harris
2014-Nov-22 18:20 UTC
[Gluster-users] glusterfs and glusterfsd process utilization extremely high
Hi Pranith,

Thank you very much for the quick reply and the information. I am in the
process now of recreating the cluster using XFS. This all brings up a few
questions:

- I assume the change from EXT4 to XFS will correct the problem with
  readdir (in other words, the issue is not present in XFS)?
- Do you have any idea when the patch for this might be out? My reason
  for asking is that I have another cluster that has been updated to 3.6
  and is running on EXT4 but does not yet have an issue. This concerns
  me, so I am hoping the patch will be out soon.
- What exactly does cluster.entry-self-heal do? I can't seem to find a
  description of it.
- I assume from your posts that the reason the cluster is fine until
  traffic hits it is that the self-heal does not happen until traffic
  causes the files to be read. Is that how it works?

Thank you again for the fast response and the great product!

----
Kyle
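For the other cluster mentioned above (updated to 3.6 and running on EXT4), one quick check is which filesystem actually backs each brick directory. The following is a minimal sketch that assumes GNU `df` (for the `-T` option); the helper name `fs_type_of` is hypothetical, and `/` is only a placeholder for a real brick path.

```shell
#!/bin/sh
# Sketch: report the filesystem type backing a directory such as a brick path.
# "fs_type_of" is a hypothetical helper; "/" is a placeholder default path.
fs_type_of() {
    # -P keeps each filesystem on a single line; -T adds the type column
    # (GNU df). The type is the second field of the second output line.
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

brick_dir="${1:-/}"
fstype="$(fs_type_of "$brick_dir")"
echo "$brick_dir is on $fstype"
if [ "$fstype" = "ext4" ]; then
    # Reminder taken from the thread, not a diagnosis of this machine.
    echo "ext4 detected: the thread advises against using 3.6.1 with EXT4"
fi
```

Run it once per brick directory on each node, passing the brick path as the first argument.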