Philip Poten
2012-Jun-11 20:00 UTC
[Gluster-users] Memory leak with glusterfs NFS on 3.2.6
Hi,

we're running a distributed-replicated setup for our images, and while we use a caching proxy for the hot set, quite a few requests still land on glusterfs (3.2.6 on squeeze). Since the glusterfs FUSE client experiences regular hangs that require reboots (I couldn't find a solution to that yet), we run on NFS.

NFS, however, or more specifically the glusterfs process serving it, eats memory like crazy, around 3-4GB a week, until we (have to) restart it.

Has anybody else experienced this problem, and if so, did you find a way to mitigate it?

regards,
Philip
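To put numbers on the growth between restarts, the resident memory of the glusterfs NFS process could be logged at intervals. The following is only a sketch, assuming a Linux host where /proc/<pid>/status exposes a VmRSS line; the function names and the log format are hypothetical and not part of GlusterFS.

```shell
#!/bin/sh
# Sketch: record the resident set size (RSS) of a process over time.
# Assumes Linux /proc; rss_kb and log_rss are hypothetical helper names.

# Print the RSS of the given PID in kB, as reported by the kernel.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Append one line "UTC-timestamp pid rss_kb" to the given log file.
# Returns non-zero if the RSS could not be read (e.g. process gone).
log_rss() {
    pid="$1"
    logfile="$2"
    rss="$(rss_kb "$pid")"
    [ -n "$rss" ] || return 1
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') $pid $rss" >> "$logfile"
}
```

Calling `log_rss <pid-of-nfs-process> <logfile>` from cron, say once an hour, would give a growth curve that can be posted alongside any other diagnostics.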
Pranith Kumar Karampuri
2012-Jun-12 06:52 UTC
[Gluster-users] Memory leak with glusterfs NFS on 3.2.6
Hi Philip,

When this happens, could you post a statedump of the process so we can see what is causing the memory usage? Steps to grab a statedump of the process:

1) kill -USR1 <pid-of-nfs-process>
2) The dump file is written to /tmp/glusterdump.<pid-of-nfs-process>

Pranith.
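The two steps above can be wrapped in a small script. This is a minimal sketch, not an official tool: the caller supplies the PID, the function names are hypothetical, and the 10-second wait is an arbitrary assumption about how long the dump takes to appear.

```shell
#!/bin/sh
# Sketch: request a statedump from a running glusterfs process.
# statedump_path and request_statedump are hypothetical helper names.

# Path where the dump is expected, following the steps above.
statedump_path() {
    echo "/tmp/glusterdump.$1"
}

# Send SIGUSR1 to the process, then wait for a non-empty dump file.
# Caveat: a dump left over from an earlier run at the same path is
# indistinguishable from a new one, so move old dumps aside first.
request_statedump() {
    pid="$1"
    dump="$(statedump_path "$pid")"
    kill -USR1 "$pid" || return 1
    i=0
    while [ "$i" -lt 10 ]; do
        if [ -s "$dump" ]; then
            echo "$dump"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "no statedump found at $dump" >&2
    return 1
}
```

Running `request_statedump <pid-of-nfs-process>` as a user allowed to signal the process prints the dump path on success, which can then be attached to a reply on the list.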
Dan Bretherton
2012-Jun-12 10:00 UTC
[Gluster-users] Memory leak with glusterfs NFS on 3.2.6
I wonder if this memory leak is the cause of the NFS performance degradation I reported in April. I just went looking for a link to the relevant thread, but links to the mailing lists from the gluster.org web site appear to be broken or are redirected to http://www.gluster.org/about. The best I can do is this (which times out at the time of writing but might be available later):

http://thr3ads.net/gluster-users/2012/04/1855706-Frequent-glusterd-restarts-needed-to-avoid-NFS-performance-degradation

As I said in that discussion, I have to restart glusterd every day on machines exporting NFS to stop NFS becoming unusable after a few days. To avoid restarting glusterd on the storage servers (which have other important GlusterFS-related things to do besides NFS), and to balance the NFS load, my compute servers export NFS to themselves as described in this Gluster community article:

http://community.gluster.org/a/nfs-performance-with-fuse-client-redundancy/

Until recently I thought the daily glusterd restarts were having no adverse side effects, but a couple of users recently reported applications crashing for no apparent reason in the middle of the night, and some of my maintenance tasks (e.g. lengthy chmod and chown operations) have been affected as well. I would therefore really like to find a better solution to this problem. If it is being caused by a memory leak, I would have thought it should be relatively easy to fix (not knowing anything about the code...). It certainly should be easy to reproduce, but as far as I know none of the developers has acknowledged that the performance degradation problem exists.

-Dan.