thr3ads.net - Gluster users - [Gluster-users] [Gluster-devel] Memory leak in GlusterFS FUSE client [Jan 2016]


Oleksandr Natalenko

2016-Jan-03 14:23 UTC

[Gluster-users] [Gluster-devel] Memory leak in GlusterFS FUSE client

Another Valgrind run.

I did the following:

==
valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind_fuse.log" \
    /usr/bin/glusterfs -N --volfile-server=some.server.com --volfile-id=somevolume /mnt/volume
==
Then I changed into /mnt/volume and ran find . -type f. After it had traversed
part of the hierarchy, I stopped find and ran umount /mnt/volume. Here is the
resulting valgrind_fuse.log file:

https://gist.github.com/7e2679e1e72e48f75a2b
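
To compare Valgrind runs like this one, the totals from a Memcheck log can be pulled out with a few lines of Python. This is only a sketch: it assumes the log ends with Memcheck's usual LEAK SUMMARY block, the function name leak_summary is made up for illustration, and the sample text and its numbers are invented (they are not taken from the gist).

```python
import re

# Matches the per-category lines of Memcheck's "LEAK SUMMARY" block, e.g.
# "==12345==    definitely lost: 4,096 bytes in 2 blocks"
LEAK_LINE = re.compile(
    r"^==\d+==\s+"
    r"(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def leak_summary(log_text):
    """Return {category: (bytes, blocks)} for each summary line found."""
    summary = {}
    for line in log_text.splitlines():
        match = LEAK_LINE.match(line)
        if match:
            kind, nbytes, nblocks = match.groups()
            # Valgrind prints thousands separators; strip them before converting.
            summary[kind] = (
                int(nbytes.replace(",", "")),
                int(nblocks.replace(",", "")),
            )
    return summary


# Invented sample, only to show the expected shape of the input.
SAMPLE_LOG = """
==12345== LEAK SUMMARY:
==12345==    definitely lost: 4,096 bytes in 2 blocks
==12345==    indirectly lost: 1,024 bytes in 8 blocks
==12345==      possibly lost: 0 bytes in 0 blocks
==12345==    still reachable: 10,240 bytes in 50 blocks
==12345==         suppressed: 0 bytes in 0 blocks
"""

if __name__ == "__main__":
    for kind, (nbytes, nblocks) in leak_summary(SAMPLE_LOG).items():
        print(f"{kind:>16}: {nbytes} bytes in {nblocks} blocks")
```

Running it over valgrind_fuse.log from two runs makes it easier to see whether a change moves memory from "definitely lost" to "still reachable" or removes it altogether.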

On Thursday, 31 December 2015, 14:09:03 EET, Soumya Koduri wrote:
> On 12/28/2015 02:32 PM, Soumya Koduri wrote:
> > ----- Original Message -----
> > 
> >> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >> To: "Oleksandr Natalenko" <oleksandr at natalenko.name>, "Soumya Koduri" <skoduri at redhat.com>
> >> Cc: gluster-users at gluster.org, gluster-devel at gluster.org
> >> Sent: Monday, December 28, 2015 9:32:07 AM
> >> Subject: Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
> >>
> >> On 12/26/2015 04:45 AM, Oleksandr Natalenko wrote:
> >>> Also, here is the valgrind output for our custom tool, which traverses a
> >>> GlusterFS volume (collecting simple stats) just like the find tool.
> >>> NFS-Ganesha is not used in this case.
> >>> 
> >>> https://gist.github.com/e4602a50d3c98f7a2766
> >> 
> >> Hi Oleksandr,
> >>
> >> I went through the code. Both NFS-Ganesha and the custom tool use gfapi, and
> >> the leak stems from there. I am not very familiar with this part of the code,
> >> but there seems to be one inode_unref() missing in the failure path of
> >> resolution. I am not sure whether that corresponds to the leaks.
> >>
> >> Soumya,
> >>
> >> Could this be the issue? review.gluster.org seems to be down, so I couldn't
> >> send the patch. Please ping me on IRC.
> >> diff --git a/api/src/glfs-resolve.c b/api/src/glfs-resolve.c
> >> index b5efcba..52b538b 100644
> >> --- a/api/src/glfs-resolve.c
> >> +++ b/api/src/glfs-resolve.c
> >> @@ -467,9 +467,11 @@ priv_glfs_resolve_at (struct glfs *fs, xlator_t *subvol, inode_t *at,
> >>                   }
> >>           }
> >>
> >> -       if (parent && next_component)
> >> +       if (parent && next_component) {
> >> +               inode_unref (parent);
> >> +               parent = NULL;
> >>                   /* resolution failed mid-way */
> >>                   goto out;
> >> +        }
> >>
> >>           /* At this point, all components up to the last parent directory
> >>              have been resolved successfully (@parent). Resolution of basename
> > 
> > Yes, this could be one of the reasons. There are a few leaks with respect to
> > inode references in gfAPI. See below.
> >
> > On the GlusterFS side, it looks like the majority of the leaks are related to
> > inodes and their contexts. Possible reasons I can think of are:
> >
> > 1) When there is a graph switch, the old inode table and its entries are not
> > purged (this is a known issue). There was an effort to fix this issue, but I
> > think it had other side effects and hence was not applied. Maybe we should
> > revive those changes.
> >
> > 2) With regard to the above, old entries can be purged if any request comes
> > with a reference to an old inode (as part of 'glfs_resolve_inode'), provided
> > their reference counts are properly decremented. But this is not happening
> > at the moment in gfapi.
> >
> > 3) Applications should hold and release their references as needed. There
> > are certain fixes needed in this area as well (including the fix provided
> > by Pranith above).
> >
> > From code inspection, I have made changes to fix a few leaks of cases (2)
> > and (3) with respect to gfAPI:
> >
> > 	http://review.gluster.org/#/c/13096 (yet to test the changes)
> >
> > I haven't yet narrowed down any suspects pertaining only to NFS-Ganesha.
> > Will re-check and update.
> I tried similar tests but with a smaller set of files. I could see the
> inode_ctx leak even without graph switches involved. I suspect that is
> because valgrind checks for memory leaks when the program exits. We call
> 'glfs_fini()' to clean up the memory used by gfapi during exit. Those
> inode_ctx leaks are the result of some inodes being left behind during
> inode_table cleanup. I have submitted the patch below to address this issue.
>
> http://review.gluster.org/13125
>
> However, this will help only when volume un-exports are involved or the
> program is exiting. It still doesn't address the actual RAM consumed by
> the application while it is active.
> 
> Thanks,
> Soumya
> 
> > Thanks,
> > Soumya
> > 
> >> Pranith
> >> 
> >>> One may see GlusterFS-related leaks here as well.
> >>> 
> >>> On Friday, 25 December 2015, 20:28:13 EET, Soumya Koduri wrote:
> >>>> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
> >>>>> Another addition: it seems to be a GlusterFS API library memory leak,
> >>>>> because NFS-Ganesha also consumes a huge amount of memory while doing an
> >>>>> ordinary "find . -type f" via NFSv4.2 on a remote client. Here is the
> >>>>> memory usage:
> >>>>>
> >>>>> ==
> >>>>> root      5416 34.2 78.5 2047176 1480552 ?     Ssl  12:02 117:54 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>>> ==
> >>>>>
> >>>>> 1.4G is too much for simple stat() :(.
> >>>>> 
> >>>>> Ideas?
> >>>> 
> >>>> nfs-ganesha also has a cache layer which can scale to millions of entries
> >>>> depending on the number of files/directories being looked up. However,
> >>>> there are parameters to tune it. So either try stat with a few entries or
> >>>> add the block below to the nfs-ganesha.conf file, set low limits and check
> >>>> the difference. That may help us narrow down how much memory is actually
> >>>> consumed by core nfs-ganesha and by gfAPI.
> >>>>
> >>>> CACHEINODE {
> >>>> 	Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
> >>>> 	Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # Max no. of entries in the cache.
> >>>> }
> >>>> 
> >>>> Thanks,
> >>>> Soumya
> >>>> 
> >>>>> 24.12.2015 16:32, Oleksandr Natalenko wrote:
> >>>>>> This is still an issue with 3.7.6. Any suggestions?
> >>>>>>
> >>>>>> 24.09.2015 10:14, Oleksandr Natalenko wrote:
> >>>>>>> In our GlusterFS deployment we've encountered something like a memory
> >>>>>>> leak in the GlusterFS FUSE client.
> >>>>>>>
> >>>>>>> We use a replicated (×2) GlusterFS volume to store mail (exim+dovecot,
> >>>>>>> maildir format). Here are the inode stats for both bricks and the
> >>>>>>> mountpoint:
> >>>>>>> 
> >>>>>>> ==
> >>>>>>> Brick 1 (Server 1):
> >>>>>>>
> >>>>>>> Filesystem                            Inodes    IUsed     IFree IUse% Mounted on
> >>>>>>> /dev/mapper/vg_vd1_misc-lv08_mail  578768144 10954918 567813226    2% /bricks/r6sdLV08_vd1_mail
> >>>>>>>
> >>>>>>> Brick 2 (Server 2):
> >>>>>>>
> >>>>>>> Filesystem                            Inodes    IUsed     IFree IUse% Mounted on
> >>>>>>> /dev/mapper/vg_vd0_misc-lv07_mail  578767984 10954913 567813071    2% /bricks/r6sdLV07_vd0_mail
> >>>>>>>
> >>>>>>> Mountpoint (Server 3):
> >>>>>>>
> >>>>>>> Filesystem                            Inodes    IUsed     IFree IUse% Mounted on
> >>>>>>> glusterfs.xxx:mail                 578767760 10954915 567812845    2% /var/spool/mail/virtual
> >>>>>>> ==
> >>>>>>>
> >>>>>>> The glusterfs.xxx domain has two A records, one each for Server 1 and
> >>>>>>> Server 2.
> >>>>>>> 
> >>>>>>> Here is volume info:
> >>>>>>> 
> >>>>>>> ==
> >>>>>>> Volume Name: mail
> >>>>>>> Type: Replicate
> >>>>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
> >>>>>>> Status: Started
> >>>>>>> Number of Bricks: 1 x 2 = 2
> >>>>>>> Transport-type: tcp
> >>>>>>> Bricks:
> >>>>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>>>> Options Reconfigured:
> >>>>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
> >>>>>>> features.cache-invalidation-timeout: 10
> >>>>>>> performance.stat-prefetch: off
> >>>>>>> performance.quick-read: on
> >>>>>>> performance.read-ahead: off
> >>>>>>> performance.flush-behind: on
> >>>>>>> performance.write-behind: on
> >>>>>>> performance.io-thread-count: 4
> >>>>>>> performance.cache-max-file-size: 1048576
> >>>>>>> performance.cache-size: 67108864
> >>>>>>> performance.readdir-ahead: off
> >>>>>>> ==
> >>>>>>>
> >>>>>>> Soon after mounting and starting exim/dovecot, the glusterfs client
> >>>>>>> process begins to consume a huge amount of RAM:
> >>>>>>>
> >>>>>>> ==
> >>>>>>> user at server3 ~$ ps aux | grep glusterfs | grep mail
> >>>>>>> root     28895 14.4 15.0 15510324 14908868 ?  Ssl  Sep03 4310:05 /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable --volfile-server=glusterfs.xxx --volfile-id=mail /var/spool/mail/virtual
> >>>>>>> ==
> >>>>>>>
> >>>>>>> That is, ~15 GiB of RAM.
> >>>>>>> 
> >>>>>>> We've also tried using the mountpoint within a separate KVM VM with 2 or
> >>>>>>> 3 GiB of RAM, and soon after starting the mail daemons the OOM killer hit
> >>>>>>> the glusterfs client process.
> >>>>>>>
> >>>>>>> Mounting the same share via NFS works just fine. Also, we have much less
> >>>>>>> iowait and loadavg on the client side with NFS.
> >>>>>>>
> >>>>>>> We've also tried changing the IO thread count and cache size in order
> >>>>>>> to limit memory usage, with no luck. As you can see, the total cache size
> >>>>>>> is 4×64=256 MiB (compare to 15 GiB).
> >>>>>>>
> >>>>>>> Enabling or disabling stat-prefetch, read-ahead and readdir-ahead didn't
> >>>>>>> help either.
> >>>>>>> 
> >>>>>>> Here are volume memory stats:
> >>>>>>> 
> >>>>>>> ==
> >>>>>>> Memory status for volume : mail
> >>>>>>> ----------------------------------------------
> >>>>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>>>> Mallinfo
> >>>>>>> --------
> >>>>>>> Arena    : 36859904
> >>>>>>> Ordblks  : 10357
> >>>>>>> Smblks   : 519
> >>>>>>> Hblks    : 21
> >>>>>>> Hblkhd   : 30515200
> >>>>>>> Usmblks  : 0
> >>>>>>> Fsmblks  : 53440
> >>>>>>> Uordblks : 18604144
> >>>>>>> Fordblks : 18255760
> >>>>>>> Keepcost : 114112
> >>>>>>>
> >>>>>>> Mempool Stats
> >>>>>>> -------------
> >>>>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>>>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>>>>>> mail-server:fd_t                              0      1024          108   30773120      137        0            0
> >>>>>>> mail-server:dentry_t                      16110       274           84  235676148    16384  1106499         1152
> >>>>>>> mail-server:inode_t                       16363        21          156  237216876    16384  1876651         1169
> >>>>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      79204        4        0            0
> >>>>>>> mail-locks:pl_local_t                         0        32          148    6812757        4        0            0
> >>>>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>>>> mail-marker:marker_local_t                    0       128          332      64980        3        0            0
> >>>>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>>>> mail-server:rpcsvc_request_t                  0       512         2828   45462533       34        0            0
> >>>>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>>>> glusterfs:log_buf_t                           5       251          140       3452        6        0            0
> >>>>>>> glusterfs:data_t                            242     16141           52  480115498      664        0            0
> >>>>>>> glusterfs:data_pair_t                       230     16153           68  179483528      275        0            0
> >>>>>>> glusterfs:dict_t                             23      4073          140  303751675      627        0            0
> >>>>>>> glusterfs:call_stub_t                         0      1024         3764   45290655       34        0            0
> >>>>>>> glusterfs:call_stack_t                        1      1023         1708   43598469       34        0            0
> >>>>>>> glusterfs:call_frame_t                        1      4095          172  336219655      184        0            0
> >>>>>>> ----------------------------------------------
> >>>>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>>>> Mallinfo
> >>>>>>> --------
> >>>>>>> Arena    : 38174720
> >>>>>>> Ordblks  : 9041
> >>>>>>> Smblks   : 507
> >>>>>>> Hblks    : 21
> >>>>>>> Hblkhd   : 30515200
> >>>>>>> Usmblks  : 0
> >>>>>>> Fsmblks  : 51712
> >>>>>>> Uordblks : 19415008
> >>>>>>> Fordblks : 18759712
> >>>>>>> Keepcost : 114848
> >>>>>>>
> >>>>>>> Mempool Stats
> >>>>>>> -------------
> >>>>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>>>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>>>>>> mail-server:fd_t                              0      1024          108    2373075      133        0            0
> >>>>>>> mail-server:dentry_t                      14114      2270           84    3513654    16384     2300          267
> >>>>>>> mail-server:inode_t                       16374        10          156    6766642    16384   194635         1279
> >>>>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      71354        4        0            0
> >>>>>>> mail-locks:pl_local_t                         0        32          148    8135032        4        0            0
> >>>>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>>>> mail-marker:marker_local_t                    0       128          332      65005        3        0            0
> >>>>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>>>> mail-server:rpcsvc_request_t                  0       512         2828   12882393       30        0            0
> >>>>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>>>> glusterfs:log_buf_t                           5       251          140       3443        6        0            0
> >>>>>>> glusterfs:data_t                            242     16141           52  138743429      290        0            0
> >>>>>>> glusterfs:data_pair_t                       230     16153           68  126649864      270        0            0
> >>>>>>> glusterfs:dict_t                             23      4073          140   20356289       63        0            0
> >>>>>>> glusterfs:call_stub_t                         0      1024         3764   13678560       31        0            0
> >>>>>>> glusterfs:call_stack_t                        1      1023         1708   11011561       30        0            0
> >>>>>>> glusterfs:call_frame_t                        1      4095          172  125764190      193        0            0
> >>>>>>> ----------------------------------------------
> >>>>>>> ==
> >>>>>>>
> >>>>>>> So, my questions are:
> >>>>>>>
> >>>>>>> 1) What should one do to limit GlusterFS FUSE client memory usage?
> >>>>>>> 2) What should one do to prevent high loadavg on the client caused by
> >>>>>>> high iowait from multiple concurrent volume users?
> >>>>>>>
> >>>>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
> >>>>>>> GlusterFS client version is 3.7.4.
> >>>>>>>
> >>>>>>> Any additional info needed?
> >>>>> 
> >>>>> _______________________________________________
> >>>>> Gluster-users mailing list
> >>>>> Gluster-users at gluster.org
> >>>>> http://www.gluster.org/mailman/listinfo/gluster-users
> >>> 
> >>> _______________________________________________
> >>> Gluster-devel mailing list
> >>> Gluster-devel at gluster.org
> >>> http://www.gluster.org/mailman/listinfo/gluster-devel
> > 

Vijay Bellur

2016-Jan-03 18:35 UTC


On 01/03/2016 09:23 AM, Oleksandr Natalenko wrote:
> Another Valgrind run.
>
> I did the following:
>
> ==
> valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind_fuse.log" \
>     /usr/bin/glusterfs -N --volfile-server=some.server.com --volfile-id=somevolume /mnt/volume
> ==
>
> then cd to /mnt/volume and find . -type f. After traversing some part of
> hierarchy I've stopped find and did umount /mnt/volume. Here is
> valgrind_fuse.log file:
>
> https://gist.github.com/7e2679e1e72e48f75a2b
>
Can you please try the same by dropping caches before umount?

echo 3 > /proc/sys/vm/drop_caches

Gluster relies on the VFS sending forgets and releases to clean up the
inodes and the per-inode contexts maintained by the various translators.

Thanks,
Vijay
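
The dependence on forgets described above can be illustrated with a toy model. This is a sketch only, not GlusterFS or libfuse code: the class ToyInodeTable and the function simulate are hypothetical names, and the model assumes the usual FUSE convention that each successful lookup adds one to a per-inode lookup count and each forget subtracts the count the kernel reports, with the inode freed when the count reaches zero.

```python
class ToyInodeTable:
    """Toy stand-in for a FUSE client's inode table (hypothetical, for illustration)."""

    def __init__(self):
        # inode number -> number of lookups the kernel still holds
        self.nlookup = {}

    def lookup(self, ino):
        # Every successful LOOKUP reply hands the kernel one more reference.
        self.nlookup[ino] = self.nlookup.get(ino, 0) + 1

    def forget(self, ino, count):
        # FORGET tells the client how many lookups the kernel is dropping.
        remaining = self.nlookup[ino] - count
        if remaining <= 0:
            # Only now can the inode and its per-translator context be freed.
            del self.nlookup[ino]
        else:
            self.nlookup[ino] = remaining

    def live_inodes(self):
        return len(self.nlookup)


def simulate(n_files, drop_caches):
    """Look up n_files inodes, optionally forget them all, return what is left."""
    table = ToyInodeTable()
    for ino in range(1, n_files + 1):
        table.lookup(ino)  # roughly what a find traversal triggers per entry
    if drop_caches:
        for ino in range(1, n_files + 1):
            table.forget(ino, 1)  # kernel evicts its cache and sends forgets
    return table.live_inodes()


if __name__ == "__main__":
    print("inodes still held without forgets:", simulate(1000, drop_caches=False))
    print("inodes still held after forgets:  ", simulate(1000, drop_caches=True))
```

In this model, inodes that never receive a forget stay allocated until exit, which is why a leak checker run at exit would report them even if nothing is leaked in the strict sense.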
