Oleksandr Natalenko
2016-Jan-05 22:23 UTC
[Gluster-users] Memory leak in GlusterFS FUSE client
OK, I've repeated the same traversing test with patched GlusterFS API, and
here is the new Valgrind log:

https://gist.github.com/17ecb16a11c9aed957f5

Still leaks.

On Tuesday, 5 January 2016 22:52:25 EET Soumya Koduri wrote:
> On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote:
> > Unfortunately, both patches didn't make any difference for me.
> >
> > I've patched 3.7.6 with both patches, recompiled and installed patched
> > GlusterFS package on client side and mounted volume with ~2M of files.
> > Then I performed the usual tree traverse with a simple "find".
> >
> > Memory RES value went from ~130M at the moment of mounting to ~1.5G
> > after traversing the volume for ~40 mins. Valgrind log still shows lots
> > of leaks. Here it is:
> >
> > https://gist.github.com/56906ca6e657c4ffa4a1
>
> Looks like you had done a fuse mount. The patches which I have pasted
> below apply to gfapi/nfs-ganesha applications.
>
> Also, to resolve the nfs-ganesha issue which I had mentioned below (in
> case the Entries_HWMARK option gets changed), I have posted the fix below -
> https://review.gerrithub.io/#/c/258687
>
> Thanks,
> Soumya
>
> > Ideas?
> >
> > 05.01.2016 12:31, Soumya Koduri wrote:
> >> I tried to debug the inode* related leaks and saw some improvements
> >> after applying the below patches when I ran the same test (but with a
> >> smaller load). Could you please apply those patches & confirm the
> >> same?
> >>
> >> a) http://review.gluster.org/13125
> >>
> >> This will fix the inodes & their ctx related leaks during unexport and
> >> the program exit. Please check the valgrind output after applying the
> >> patch. It should not list any inodes related memory as lost.
> >>
> >> b) http://review.gluster.org/13096
> >>
> >> The reason the change in Entries_HWMARK (in your earlier mail) didn't
> >> have much effect is that the inode_nlookup count doesn't become zero
> >> for those handles/inodes being closed by ganesha. Hence those inodes
> >> get added to the inode lru list instead of the purge list, and the lru
> >> list gets forcefully purged only when the number of gfapi inode table
> >> entries reaches its limit (which is 137012).
> >>
> >> This patch fixes those 'nlookup' counts. Please apply this patch and
> >> reduce 'Entries_HWMARK' to a much lower value and check if it decreases
> >> the memory being consumed by the ganesha process while being active.
> >>
> >> CACHEINODE {
> >>     Entries_HWMark = 500;
> >> }
> >>
> >> Note: I see an issue with nfs-ganesha during exit when the option
> >> 'Entries_HWMARK' gets changed. This is not related to any of the above
> >> patches (or rather Gluster) and I am currently debugging it.
> >>
> >> Thanks,
> >> Soumya
> >>
> >> On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:
> >>> 1. test with Cache_Size = 256 and Entries_HWMark = 4096
> >>>
> >>> Before find . -type f:
> >>>
> >>> root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>
> >>> After:
> >>>
> >>> root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>
> >>> ~250M leak.
> >>>
> >>> 2. test with default values (after ganesha restart)
> >>>
> >>> Before:
> >>>
> >>> root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>
> >>> After:
> >>>
> >>> root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>
> >>> ~159M leak.
> >>>
> >>> No reasonable correlation detected. The second test finished much
> >>> faster than the first (I guess server-side GlusterFS cache or server
> >>> kernel page cache is the cause).
> >>>
> >>> There are ~1.8M files on this test volume.
> >>>
> >>> On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:
> >>>> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
> >>>>> Another addition: it seems to be a GlusterFS API library memory leak
> >>>>> because NFS-Ganesha also consumes a huge amount of memory while doing
> >>>>> an ordinary "find . -type f" via NFSv4.2 on a remote client. Here is
> >>>>> the memory usage:
> >>>>>
> >>>>> ===
> >>>>> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>>>> ===
> >>>>>
> >>>>> 1.4G is too much for simple stat() :(.
> >>>>>
> >>>>> Ideas?
> >>>>
> >>>> nfs-ganesha also has a cache layer which can scale to millions of
> >>>> entries depending on the number of files/directories being looked up.
> >>>> However, there are parameters to tune it. So either try stat with few
> >>>> entries, or add the below block to the nfs-ganesha.conf file, set low
> >>>> limits and check the difference. That may help us narrow down how much
> >>>> memory is actually consumed by core nfs-ganesha and gfAPI.
> >>>>
> >>>> CACHEINODE {
> >>>>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
> >>>>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # Max no. of entries in the cache.
> >>>> }
> >>>>
> >>>> Thanks,
> >>>> Soumya
> >>>>
> >>>>> 24.12.2015 16:32, Oleksandr Natalenko wrote:
> >>>>>> This is still an issue with 3.7.6. Any suggestions?
> >>>>>>
> >>>>>> 24.09.2015 10:14, Oleksandr Natalenko wrote:
> >>>>>>> In our GlusterFS deployment we've encountered something like a
> >>>>>>> memory leak in the GlusterFS FUSE client.
> >>>>>>>
> >>>>>>> We use a replicated (×2) GlusterFS volume to store mail
> >>>>>>> (exim+dovecot, maildir format). Here are inode stats for both
> >>>>>>> bricks and the mountpoint:
> >>>>>>>
> >>>>>>> ===
> >>>>>>> Brick 1 (Server 1):
> >>>>>>>
> >>>>>>> Filesystem                         Inodes    IUsed    IFree     IUse% Mounted on
> >>>>>>> /dev/mapper/vg_vd1_misc-lv08_mail  578768144 10954918 567813226 2%    /bricks/r6sdLV08_vd1_mail
> >>>>>>>
> >>>>>>> Brick 2 (Server 2):
> >>>>>>>
> >>>>>>> Filesystem                         Inodes    IUsed    IFree     IUse% Mounted on
> >>>>>>> /dev/mapper/vg_vd0_misc-lv07_mail  578767984 10954913 567813071 2%    /bricks/r6sdLV07_vd0_mail
> >>>>>>>
> >>>>>>> Mountpoint (Server 3):
> >>>>>>>
> >>>>>>> Filesystem                         Inodes    IUsed    IFree     IUse% Mounted on
> >>>>>>> glusterfs.xxx:mail                 578767760 10954915 567812845 2%    /var/spool/mail/virtual
> >>>>>>> ===
> >>>>>>>
> >>>>>>> glusterfs.xxx domain has two A records for both Server 1 and
> >>>>>>> Server 2.
> >>>>>>>
> >>>>>>> Here is volume info:
> >>>>>>>
> >>>>>>> ===
> >>>>>>> Volume Name: mail
> >>>>>>> Type: Replicate
> >>>>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
> >>>>>>> Status: Started
> >>>>>>> Number of Bricks: 1 x 2 = 2
> >>>>>>> Transport-type: tcp
> >>>>>>> Bricks:
> >>>>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>>>> Options Reconfigured:
> >>>>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
> >>>>>>> features.cache-invalidation-timeout: 10
> >>>>>>> performance.stat-prefetch: off
> >>>>>>> performance.quick-read: on
> >>>>>>> performance.read-ahead: off
> >>>>>>> performance.flush-behind: on
> >>>>>>> performance.write-behind: on
> >>>>>>> performance.io-thread-count: 4
> >>>>>>> performance.cache-max-file-size: 1048576
> >>>>>>> performance.cache-size: 67108864
> >>>>>>> performance.readdir-ahead: off
> >>>>>>> ===
> >>>>>>>
> >>>>>>> Soon after mounting and starting exim/dovecot, the glusterfs client
> >>>>>>> process begins to consume a huge amount of RAM:
> >>>>>>>
> >>>>>>> ===
> >>>>>>> user@server3 ~$ ps aux | grep glusterfs | grep mail
> >>>>>>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05 /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable --volfile-server=glusterfs.xxx --volfile-id=mail /var/spool/mail/virtual
> >>>>>>> ===
> >>>>>>>
> >>>>>>> That is, ~15 GiB of RAM.
> >>>>>>>
> >>>>>>> We've also tried to use the mountpoint within a separate KVM VM
> >>>>>>> with 2 or 3 GiB of RAM, and soon after starting the mail daemons
> >>>>>>> the OOM killer hit the glusterfs client process.
> >>>>>>>
> >>>>>>> Mounting the same share via NFS works just fine. Also, we have much
> >>>>>>> less iowait and loadavg on the client side with NFS.
> >>>>>>>
> >>>>>>> Also, we've tried to change IO threads count and cache size in order
> >>>>>>> to limit memory usage, with no luck. As you can see, total cache
> >>>>>>> size is 4×64==256 MiB (compare to 15 GiB).
> >>>>>>>
> >>>>>>> Enabling or disabling stat-prefetch, read-ahead and readdir-ahead
> >>>>>>> didn't help either.
> >>>>>>>
> >>>>>>> Here are volume memory stats:
> >>>>>>>
> >>>>>>> ===
> >>>>>>> Memory status for volume : mail
> >>>>>>> ----------------------------------------------
> >>>>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>>>> Mallinfo
> >>>>>>> --------
> >>>>>>> Arena    : 36859904
> >>>>>>> Ordblks  : 10357
> >>>>>>> Smblks   : 519
> >>>>>>> Hblks    : 21
> >>>>>>> Hblkhd   : 30515200
> >>>>>>> Usmblks  : 0
> >>>>>>> Fsmblks  : 53440
> >>>>>>> Uordblks : 18604144
> >>>>>>> Fordblks : 18255760
> >>>>>>> Keepcost : 114112
> >>>>>>>
> >>>>>>> Mempool Stats
> >>>>>>> -------------
> >>>>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>>>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>>>>>> mail-server:fd_t                              0      1024          108   30773120      137        0            0
> >>>>>>> mail-server:dentry_t                      16110       274           84  235676148    16384  1106499         1152
> >>>>>>> mail-server:inode_t                       16363        21          156  237216876    16384  1876651         1169
> >>>>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      79204        4        0            0
> >>>>>>> mail-locks:pl_local_t                         0        32          148    6812757        4        0            0
> >>>>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>>>> mail-marker:marker_local_t                    0       128          332      64980        3        0            0
> >>>>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>>>> mail-server:rpcsvc_request_t                  0       512         2828   45462533       34        0            0
> >>>>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>>>> glusterfs:log_buf_t                           5       251          140       3452        6        0            0
> >>>>>>> glusterfs:data_t                            242     16141           52  480115498      664        0            0
> >>>>>>> glusterfs:data_pair_t                       230     16153           68  179483528      275        0            0
> >>>>>>> glusterfs:dict_t                             23      4073          140  303751675      627        0            0
> >>>>>>> glusterfs:call_stub_t                         0      1024         3764   45290655       34        0            0
> >>>>>>> glusterfs:call_stack_t                        1      1023         1708   43598469       34        0            0
> >>>>>>> glusterfs:call_frame_t                        1      4095          172  336219655      184        0            0
> >>>>>>> ----------------------------------------------
> >>>>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>>>> Mallinfo
> >>>>>>> --------
> >>>>>>> Arena    : 38174720
> >>>>>>> Ordblks  : 9041
> >>>>>>> Smblks   : 507
> >>>>>>> Hblks    : 21
> >>>>>>> Hblkhd   : 30515200
> >>>>>>> Usmblks  : 0
> >>>>>>> Fsmblks  : 51712
> >>>>>>> Uordblks : 19415008
> >>>>>>> Fordblks : 18759712
> >>>>>>> Keepcost : 114848
> >>>>>>>
> >>>>>>> Mempool Stats
> >>>>>>> -------------
> >>>>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>>>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>>>>>> mail-server:fd_t                              0      1024          108    2373075      133        0            0
> >>>>>>> mail-server:dentry_t                      14114      2270           84    3513654    16384     2300          267
> >>>>>>> mail-server:inode_t                       16374        10          156    6766642    16384   194635         1279
> >>>>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      71354        4        0            0
> >>>>>>> mail-locks:pl_local_t                         0        32          148    8135032        4        0            0
> >>>>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>>>> mail-marker:marker_local_t                    0       128          332      65005        3        0            0
> >>>>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>>>> mail-server:rpcsvc_request_t                  0       512         2828   12882393       30        0            0
> >>>>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>>>> glusterfs:log_buf_t                           5       251          140       3443        6        0            0
> >>>>>>> glusterfs:data_t                            242     16141           52  138743429      290        0            0
> >>>>>>> glusterfs:data_pair_t                       230     16153           68  126649864      270        0            0
> >>>>>>> glusterfs:dict_t                             23      4073          140   20356289       63        0            0
> >>>>>>> glusterfs:call_stub_t                         0      1024         3764   13678560       31        0            0
> >>>>>>> glusterfs:call_stack_t                        1      1023         1708   11011561       30        0            0
> >>>>>>> glusterfs:call_frame_t                        1      4095          172  125764190      193        0            0
> >>>>>>> ----------------------------------------------
> >>>>>>> ===
> >>>>>>>
> >>>>>>> So, my questions are:
> >>>>>>>
> >>>>>>> 1) what one should do to limit GlusterFS FUSE client memory usage?
> >>>>>>> 2) what one should do to prevent client high loadavg because of high
> >>>>>>> iowait because of multiple concurrent volume users?
> >>>>>>>
> >>>>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
> >>>>>>> GlusterFS client version is 3.7.4.
> >>>>>>>
> >>>>>>> Any additional info needed?
> >>>>>
> >>>>> _______________________________________________
> >>>>> Gluster-users mailing list
> >>>>> Gluster-users at gluster.org
> >>>>> http://www.gluster.org/mailman/listinfo/gluster-users
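
The before/after `ps` samples quoted in this thread can be compared mechanically. What follows is a minimal Python sketch and not part of the original messages. It assumes the usual `ps aux` column order (RSS in KiB as the sixth field); the helper names `rss_kib` and `rss_growth_mib` are made up for illustration.

```python
def rss_kib(ps_line: str) -> int:
    """Return the RSS column (KiB) from one line of `ps aux` output."""
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    fields = ps_line.split()
    return int(fields[5])


def rss_growth_mib(before: str, after: str) -> float:
    """Growth in resident memory between two `ps aux` samples, in MiB."""
    return (rss_kib(after) - rss_kib(before)) / 1024


# Samples copied from the first ganesha test in this thread
# (Cache_Size = 256, Entries_HWMark = 4096).
BEFORE = (
    "root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 "
    "/usr/bin/ganesha.nfsd -L /var/log/ganesha.log "
    "-f /etc/ganesha/ganesha.conf -N NIV_EVENT"
)
AFTER = (
    "root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 "
    "/usr/bin/ganesha.nfsd -L /var/log/ganesha.log "
    "-f /etc/ganesha/ganesha.conf -N NIV_EVENT"
)

if __name__ == "__main__":
    print(f"RSS growth: {rss_growth_mib(BEFORE, AFTER):.1f} MiB")
```

For the first test this works out to 249760 KiB, about 244 MiB. The same calculation on the second test's samples (197808 and 356340 KiB) gives 158532 KiB, about 155 MiB. Both agree with the "~250M" and "~159M" figures above if those are read as thousands of KiB.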
On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote:
> OK, I've repeated the same traversing test with patched GlusterFS API, and
> here is new Valgrind log:
>
> https://gist.github.com/17ecb16a11c9aed957f5

A fuse mount doesn't use the gfapi helper. Does your GlusterFS API application above call glfs_fini() during exit? glfs_fini() is responsible for freeing the memory consumed by gfAPI applications.

Could you repeat the test with nfs-ganesha (which for sure calls glfs_fini() and purges inodes if it exceeds its inode cache limit), if possible?

Thanks,
Soumya

> Still leaks.
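
When repeating the traversal test, resident memory can be sampled across the whole run instead of only before and after. This is a minimal Python sketch, not something posted in the thread. It assumes a Linux host where `/proc/<pid>/status` contains a `VmRSS` line; the function names, the 60-second interval and the 40-sample default (roughly the ~40 minute traversal mentioned earlier) are illustrative choices.

```python
import sys
import time
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Extract the VmRSS value (reported in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line has the form "VmRSS:    208408 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def sample_rss(pid: int, interval_s: float = 60.0, samples: int = 40):
    """Yield (elapsed_seconds, rss_kib) pairs for the given process."""
    start = time.monotonic()
    for _ in range(samples):
        text = Path(f"/proc/{pid}/status").read_text()
        yield time.monotonic() - start, parse_vmrss_kib(text)
        time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python rss_sampler.py <pid of glusterfs or ganesha.nfsd>
    for elapsed, rss in sample_rss(int(sys.argv[1])):
        print(f"{elapsed:8.0f}s  {rss / 1024:10.1f} MiB")
```

Running it against the glusterfs or ganesha.nfsd PID while `find` traverses the volume would show whether memory keeps climbing or levels off once a cache limit is reached. That is the distinction the nfs-ganesha retest is meant to draw.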