Oleksandr Natalenko
2016-Jan-03 14:23 UTC
[Gluster-users] [Gluster-devel] Memory leak in GlusterFS FUSE client
Another Valgrind run.

I did the following:

==
valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind_fuse.log" /usr/bin/glusterfs -N --volfile-server=some.server.com --volfile-id=somevolume /mnt/volume
==

Then I changed to /mnt/volume and ran find . -type f. After traversing part of
the hierarchy, I stopped find and ran umount /mnt/volume. Here is the
valgrind_fuse.log file:

https://gist.github.com/7e2679e1e72e48f75a2b

On Thursday, 31 December 2015, 14:09:03 EET Soumya Koduri wrote:
> On 12/28/2015 02:32 PM, Soumya Koduri wrote:
> > ----- Original Message -----
> >
> >> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >> To: "Oleksandr Natalenko" <oleksandr at natalenko.name>, "Soumya Koduri"
> >> <skoduri at redhat.com> Cc: gluster-users at gluster.org,
> >> gluster-devel at gluster.org
> >> Sent: Monday, December 28, 2015 9:32:07 AM
> >> Subject: Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS
> >> FUSE client
> >>
> >> On 12/26/2015 04:45 AM, Oleksandr Natalenko wrote:
> >>> Also, here is valgrind output with our custom tool, that does GlusterFS
> >>> volume traversing (with simple stats) just like find tool. In this case
> >>> NFS-Ganesha is not used.
> >>>
> >>> https://gist.github.com/e4602a50d3c98f7a2766
> >>
> >> hi Oleksandr,
> >>
> >> I went through the code. Both NFS Ganesha and the custom tool use
> >> gfapi and the leak is stemming from that. I am not very familiar with
> >> this part of code but there seems to be one inode_unref() that is
> >> missing in failure path of resolution. Not sure if that is corresponding
> >> to the leaks.
> >>
> >> Soumya,
> >>
> >> Could this be the issue? review.gluster.org seems to be down. So
> >> couldn't send the patch. Please ping me on IRC.
> >> diff --git a/api/src/glfs-resolve.c b/api/src/glfs-resolve.c
> >> index b5efcba..52b538b 100644
> >> --- a/api/src/glfs-resolve.c
> >> +++ b/api/src/glfs-resolve.c
> >> @@ -467,9 +467,11 @@ priv_glfs_resolve_at (struct glfs *fs, xlator_t *subvol, inode_t *at,
> >>                  }
> >>          }
> >>
> >> -       if (parent && next_component)
> >> +       if (parent && next_component) {
> >> +               inode_unref (parent);
> >> +               parent = NULL;
> >>                  /* resolution failed mid-way */
> >>                  goto out;
> >> +       }
> >>
> >>          /* At this point, all components up to the last parent directory
> >>             have been resolved successfully (@parent). Resolution of
> >>             basename
> >
> > yes. This could be one of the reasons. There are few leaks with respect to
> > inode references in gfAPI. See below.
> >
> >
> > On GlusterFS side, looks like majority of the leaks are related to inodes
> > and their contexts. Possible reasons which I can think of are:
> >
> > 1) When there is a graph switch, old inode table and their entries are not
> > purged (this is a known issue). There was an effort put to fix this
> > issue. But I think it had other side-effects and hence not been applied.
> > Maybe we should revive those changes again.
> >
> > 2) With regard to above, old entries can be purged in case if any request
> > comes with the reference to old inode (as part of 'glfs_resolve_inode'),
> > provided their reference counts are properly decremented. But this is not
> > happening at the moment in gfapi.
> >
> > 3) Applications should hold and release their reference as needed and
> > required. There are certain fixes needed in this area as well (including
> > the fix provided by Pranith above).
> >
> > From code-inspection, have made changes to fix few leaks of case (2) &
> > (3) with respect to gfAPI.
> >
> > http://review.gluster.org/#/c/13096 (yet to test the changes)
> >
> > I haven't yet narrowed down any suspects pertaining to only NFS-Ganesha.
> > Will re-check and update.
> I tried similar tests but with smaller set of files. I could see the
> inode_ctx leak even without graph switches involved. I suspect that
> could be because valgrind checks for memory leaks during the exit of the
> program. We call 'glfs_fini()' to cleanup the memory being used by
> gfapi during exit. Those inode_ctx leaks are result of some inodes being
> left during inode_table cleanup. I have submitted below patch to address
> this issue.
>
> http://review.gluster.org/13125
>
> However this shall help only if there are volume un-exports being
> involved or program being exited. It still doesn't address the actual
> RAM being consumed by the application when active.
>
> Thanks,
> Soumya
>
> > Thanks,
> > Soumya
> >
> >> Pranith
> >>
> >>> One may see GlusterFS-related leaks here as well.
> >>>
> >>> On Friday, 25 December 2015, 20:28:13 EET Soumya Koduri wrote:
> >>>> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
> >>>>> Another addition: it seems to be GlusterFS API library memory leak
> >>>>> because NFS-Ganesha also consumes huge amount of memory while doing
> >>>>> ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory
> >>>>> usage:
> >>>>>
> >>>>> ==
> >>>>> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>>> ==
> >>>>>
> >>>>> 1.4G is too much for simple stat() :(.
> >>>>>
> >>>>> Ideas?
> >>>>
> >>>> nfs-ganesha also has cache layer which can scale to millions of entries
> >>>> depending on the number of files/directories being looked upon. However
> >>>> there are parameters to tune it. So either try stat with few entries or
> >>>> add below block in nfs-ganesha.conf file, set low limits and check the
> >>>> difference. That may help us narrow down how much memory actually
> >>>> consumed by core nfs-ganesha and gfAPI.
> >>>>
> >>>> CACHEINODE {
> >>>>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
> >>>>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # Max no. of entries in the cache.
> >>>> }
> >>>>
> >>>> Thanks,
> >>>> Soumya
> >>>>
> >>>>> 24.12.2015 16:32, Oleksandr Natalenko wrote:
> >>>>>> Still a current issue for 3.7.6. Any suggestions?
> >>>>>>
> >>>>>> 24.09.2015 10:14, Oleksandr Natalenko wrote:
> >>>>>>> In our GlusterFS deployment we've encountered something like memory
> >>>>>>> leak in GlusterFS FUSE client.
> >>>>>>>
> >>>>>>> We use replicated (×2) GlusterFS volume to store mail (exim+dovecot,
> >>>>>>> maildir format). Here is inode stats for both bricks and mountpoint:
> >>>>>>>
> >>>>>>> ==
> >>>>>>> Brick 1 (Server 1):
> >>>>>>>
> >>>>>>> Filesystem                        Inodes    IUsed    IFree     IUse% Mounted on
> >>>>>>> /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 567813226    2% /bricks/r6sdLV08_vd1_mail
> >>>>>>>
> >>>>>>> Brick 2 (Server 2):
> >>>>>>>
> >>>>>>> Filesystem                        Inodes    IUsed    IFree     IUse% Mounted on
> >>>>>>> /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 567813071    2% /bricks/r6sdLV07_vd0_mail
> >>>>>>>
> >>>>>>> Mountpoint (Server 3):
> >>>>>>>
> >>>>>>> Filesystem                        Inodes    IUsed    IFree     IUse% Mounted on
> >>>>>>> glusterfs.xxx:mail                578767760 10954915 567812845    2% /var/spool/mail/virtual
> >>>>>>> ==
> >>>>>>>
> >>>>>>> glusterfs.xxx domain has two A records for both Server 1 and Server
> >>>>>>> 2.
> >>>>>>>
> >>>>>>> Here is volume info:
> >>>>>>>
> >>>>>>> ==
> >>>>>>> Volume Name: mail
> >>>>>>> Type: Replicate
> >>>>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
> >>>>>>> Status: Started
> >>>>>>> Number of Bricks: 1 x 2 = 2
> >>>>>>> Transport-type: tcp
> >>>>>>> Bricks:
> >>>>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>>>> Options Reconfigured:
> >>>>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
> >>>>>>> features.cache-invalidation-timeout: 10
> >>>>>>> performance.stat-prefetch: off
> >>>>>>> performance.quick-read: on
> >>>>>>> performance.read-ahead: off
> >>>>>>> performance.flush-behind: on
> >>>>>>> performance.write-behind: on
> >>>>>>> performance.io-thread-count: 4
> >>>>>>> performance.cache-max-file-size: 1048576
> >>>>>>> performance.cache-size: 67108864
> >>>>>>> performance.readdir-ahead: off
> >>>>>>> ==
> >>>>>>>
> >>>>>>> Soon enough after mounting and exim/dovecot start, glusterfs client
> >>>>>>> process begins to consume huge amount of RAM:
> >>>>>>>
> >>>>>>> ==
> >>>>>>> user at server3 ~$ ps aux | grep glusterfs | grep mail
> >>>>>>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05 /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable --volfile-server=glusterfs.xxx --volfile-id=mail /var/spool/mail/virtual
> >>>>>>> ==
> >>>>>>>
> >>>>>>> That is, ~15 GiB of RAM.
> >>>>>>>
> >>>>>>> Also we've tried to use mountpoint within separate KVM VM with 2 or
> >>>>>>> 3 GiB of RAM, and soon after starting mail daemons got OOM killer for
> >>>>>>> glusterfs client process.
> >>>>>>>
> >>>>>>> Mounting same share via NFS works just fine. Also, we have much less
> >>>>>>> iowait and loadavg on client side with NFS.
> >>>>>>>
> >>>>>>> Also, we've tried to change IO threads count and cache size in order
> >>>>>>> to limit memory usage with no luck.
> >>>>>>> As you can see, total cache size is 4×64=256 MiB (compare to 15 GiB).
> >>>>>>>
> >>>>>>> Enabling-disabling stat-prefetch, read-ahead and readdir-ahead
> >>>>>>> didn't help as well.
> >>>>>>>
> >>>>>>> Here are volume memory stats:
> >>>>>>>
> >>>>>>> ==
> >>>>>>> Memory status for volume : mail
> >>>>>>> ----------------------------------------------
> >>>>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>>>> Mallinfo
> >>>>>>> --------
> >>>>>>> Arena    : 36859904
> >>>>>>> Ordblks  : 10357
> >>>>>>> Smblks   : 519
> >>>>>>> Hblks    : 21
> >>>>>>> Hblkhd   : 30515200
> >>>>>>> Usmblks  : 0
> >>>>>>> Fsmblks  : 53440
> >>>>>>> Uordblks : 18604144
> >>>>>>> Fordblks : 18255760
> >>>>>>> Keepcost : 114112
> >>>>>>>
> >>>>>>> Mempool Stats
> >>>>>>> -------------
> >>>>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>>>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>>>>>> mail-server:fd_t                              0      1024          108   30773120      137        0            0
> >>>>>>> mail-server:dentry_t                      16110       274           84  235676148    16384  1106499         1152
> >>>>>>> mail-server:inode_t                       16363        21          156  237216876    16384  1876651         1169
> >>>>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      79204        4        0            0
> >>>>>>> mail-locks:pl_local_t                         0        32          148    6812757        4        0            0
> >>>>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>>>> mail-marker:marker_local_t                    0       128          332      64980        3        0            0
> >>>>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>>>> mail-server:rpcsvc_request_t                  0       512         2828   45462533       34        0            0
> >>>>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>>>> glusterfs:log_buf_t                           5       251          140       3452        6        0            0
> >>>>>>> glusterfs:data_t                            242     16141           52  480115498      664        0            0
> >>>>>>> glusterfs:data_pair_t                       230     16153           68  179483528      275        0            0
> >>>>>>> glusterfs:dict_t                             23      4073          140  303751675      627        0            0
> >>>>>>> glusterfs:call_stub_t                         0      1024         3764   45290655       34        0            0
> >>>>>>> glusterfs:call_stack_t                        1      1023         1708   43598469       34        0            0
> >>>>>>> glusterfs:call_frame_t                        1      4095          172  336219655      184        0            0
> >>>>>>> ----------------------------------------------
> >>>>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>>>> Mallinfo
> >>>>>>> --------
> >>>>>>> Arena    : 38174720
> >>>>>>> Ordblks  : 9041
> >>>>>>> Smblks   : 507
> >>>>>>> Hblks    : 21
> >>>>>>> Hblkhd   : 30515200
> >>>>>>> Usmblks  : 0
> >>>>>>> Fsmblks  : 51712
> >>>>>>> Uordblks : 19415008
> >>>>>>> Fordblks : 18759712
> >>>>>>> Keepcost : 114848
> >>>>>>>
> >>>>>>> Mempool Stats
> >>>>>>> -------------
> >>>>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>>>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>>>>>> mail-server:fd_t                              0      1024          108    2373075      133        0            0
> >>>>>>> mail-server:dentry_t                      14114      2270           84    3513654    16384     2300          267
> >>>>>>> mail-server:inode_t                       16374        10          156    6766642    16384   194635         1279
> >>>>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      71354        4        0            0
> >>>>>>> mail-locks:pl_local_t                         0        32          148    8135032        4        0            0
> >>>>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>>>> mail-marker:marker_local_t                    0       128          332      65005        3        0            0
> >>>>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>>>> mail-server:rpcsvc_request_t                  0       512         2828   12882393       30        0            0
> >>>>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>>>> glusterfs:log_buf_t                           5       251          140       3443        6        0            0
> >>>>>>> glusterfs:data_t                            242     16141           52  138743429      290        0            0
> >>>>>>> glusterfs:data_pair_t                       230     16153           68  126649864      270        0            0
> >>>>>>> glusterfs:dict_t                             23      4073          140   20356289       63        0            0
> >>>>>>> glusterfs:call_stub_t                         0      1024         3764   13678560       31        0            0
> >>>>>>> glusterfs:call_stack_t                        1      1023         1708   11011561       30        0            0
> >>>>>>> glusterfs:call_frame_t                        1      4095          172  125764190      193        0            0
> >>>>>>> ----------------------------------------------
> >>>>>>> ==
> >>>>>>>
> >>>>>>> So, my questions are:
> >>>>>>>
> >>>>>>> 1) what one should do to limit GlusterFS FUSE client memory usage?
> >>>>>>> 2) what one should do to prevent client high loadavg because of high
> >>>>>>> iowait because of multiple concurrent volume users?
> >>>>>>>
> >>>>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
> >>>>>>> GlusterFS client version is 3.7.4.
> >>>>>>>
> >>>>>>> Any additional info needed?
> >>>>>
> >>>>> _______________________________________________
> >>>>> Gluster-users mailing list
> >>>>> Gluster-users at gluster.org
> >>>>> http://www.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>> _______________________________________________
> >>> Gluster-devel mailing list
> >>> Gluster-devel at gluster.org
> >>> http://www.gluster.org/mailman/listinfo/gluster-devel
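
The memory figures quoted in the original report can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the thread. It assumes the usual `ps aux` column order (VSZ and RSS in KiB as the fifth and sixth fields) and follows the reporter's own estimate of the cache budget as io-thread-count times cache-size. The function names are hypothetical.

```python
KIB_PER_GIB = 1024 * 1024
BYTES_PER_MIB = 1024 * 1024


def rss_gib_from_ps_line(line: str) -> float:
    """Return the RSS column of a `ps aux` line, converted from KiB to GiB."""
    fields = line.split()
    rss_kib = int(fields[5])  # assumed layout: USER PID %CPU %MEM VSZ RSS ...
    return rss_kib / KIB_PER_GIB


def cache_budget_mib(io_threads: int, cache_size_bytes: int) -> float:
    """Reporter's estimate: io-thread-count x cache-size, expressed in MiB."""
    return io_threads * cache_size_bytes / BYTES_PER_MIB


# The ps line and the two volume options are copied from the report.
PS_LINE = (
    "root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05 "
    "/usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable "
    "--volfile-server=glusterfs.xxx --volfile-id=mail /var/spool/mail/virtual"
)

rss = rss_gib_from_ps_line(PS_LINE)
budget = cache_budget_mib(io_threads=4, cache_size_bytes=67108864)
print(f"RSS: {rss:.1f} GiB, configured cache budget: {budget:.0f} MiB")
```

With the reported numbers, resident memory comes to about 14.2 GiB against a configured budget of 256 MiB, which is why the cache settings alone do not account for the usage.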
Vijay Bellur
2016-Jan-03 18:35 UTC
[Gluster-users] [Gluster-devel] Memory leak in GlusterFS FUSE client
On 01/03/2016 09:23 AM, Oleksandr Natalenko wrote:
> Another Valgrind run.
>
> I did the following:
>
> ==
> valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind_fuse.log" /usr/bin/glusterfs -N --volfile-server=some.server.com --volfile-id=somevolume /mnt/volume
> ==
>
> Then I changed to /mnt/volume and ran find . -type f. After traversing part of
> the hierarchy, I stopped find and ran umount /mnt/volume. Here is the
> valgrind_fuse.log file:
>
> https://gist.github.com/7e2679e1e72e48f75a2b
>

Can you please try the same by dropping caches before umount?

echo 3 > /proc/sys/vm/drop_caches

Gluster relies on vfs sending forgets and releases to clean up the inodes and
the contexts in the inodes maintained by various translators.

Thanks,
Vijay
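
The suggestion above can be scripted so that the cache drop always happens right before the unmount. This is a minimal sketch and not something posted in the thread: the names `drop_caches` and `drop_caches_then_umount` are hypothetical, the `path` parameter exists only so the helper can be tried against a scratch file, and writing to the real file requires root. The meaning of the values 1 to 3 (page cache, dentries and inodes, both) is taken from the Linux kernel sysctl documentation.

```python
import os
import subprocess


def drop_caches(mode: int = 3, path: str = "/proc/sys/vm/drop_caches") -> None:
    """Flush dirty data, then ask the kernel to drop clean caches.

    mode 1 drops the page cache, 2 drops dentries and inodes, 3 drops both.
    Dropping dentries and inodes is what should make the kernel send
    forgets to the FUSE client.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()  # only clean objects are freeable, so write dirty data out first
    with open(path, "w") as f:
        f.write(f"{mode}\n")


def drop_caches_then_umount(mountpoint: str) -> None:
    """Drop caches, then unmount, in the order suggested above (needs root)."""
    drop_caches(3)
    subprocess.run(["umount", mountpoint], check=True)


# Example (as root, under valgrind as in the earlier run):
#     drop_caches_then_umount("/mnt/volume")
```

Nothing runs at import time, so the helpers can be loaded and inspected without root. This sketch deliberately accepts only the values 1 to 3.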