On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote:
> OK, I've repeated the same traversing test with patched GlusterFS API, and
> here is the new Valgrind log:
>
> https://gist.github.com/17ecb16a11c9aed957f5
>
A fuse mount doesn't use the gfapi helper. Does your GlusterFS API
application above call glfs_fini() during exit? glfs_fini() is responsible
for freeing the memory consumed by gfAPI applications.

Could you repeat the test with nfs-ganesha (which for sure calls
glfs_fini() and purges inodes if it exceeds its inode cache limit) if
possible?

Thanks,
Soumya

> Still leaks.
>
> On Tuesday, 5 January 2016, 22:52:25 EET Soumya Koduri wrote:
>> On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote:
>>> Unfortunately, both patches didn't make any difference for me.
>>>
>>> I've patched 3.7.6 with both patches, recompiled and installed the patched
>>> GlusterFS package on the client side and mounted a volume with ~2M of files.
>>> Then I performed the usual tree traversal with a simple "find".
>>>
>>> The memory RES value went from ~130M at the moment of mounting to ~1.5G
>>> after traversing the volume for ~40 mins. The Valgrind log still shows lots
>>> of leaks. Here it is:
>>>
>>> https://gist.github.com/56906ca6e657c4ffa4a1
>>
>> Looks like you had done a fuse mount. The patches which I have pasted
>> below apply to gfapi/nfs-ganesha applications.
>>
>> Also, to resolve the nfs-ganesha issue which I had mentioned below (in
>> case the Entries_HWMARK option gets changed), I have posted the fix below:
>> https://review.gerrithub.io/#/c/258687
>>
>> Thanks,
>> Soumya
>>
>>> Ideas?
>>>
>>> 05.01.2016 12:31, Soumya Koduri wrote:
>>>> I tried to debug the inode* related leaks and saw some improvements
>>>> after applying the below patches when I ran the same test (but with a
>>>> smaller load). Could you please apply those patches & confirm the
>>>> same?
>>>>
>>>> a) http://review.gluster.org/13125
>>>>
>>>> This will fix the inodes & their ctx related leaks during unexport and
>>>> the program exit.
>>>> Please check the valgrind output after applying the
>>>> patch. It should not list any inode-related memory as lost.
>>>>
>>>> b) http://review.gluster.org/13096
>>>>
>>>> The reason the change in Entries_HWMARK (in your earlier mail) didn't
>>>> have much effect is that the inode_nlookup count doesn't become zero
>>>> for those handles/inodes being closed by ganesha. Hence those inodes
>>>> get added to the inode lru list instead of the purge list, and are
>>>> forcefully purged only when the number of gfapi inode table
>>>> entries reaches its limit (which is 137012).
>>>>
>>>> This patch fixes those 'nlookup' counts. Please apply this patch,
>>>> reduce 'Entries_HWMARK' to a much lower value and check if it decreases
>>>> the memory consumed by the ganesha process while it is active.
>>>>
>>>> CACHEINODE {
>>>>     Entries_HWMark = 500;
>>>> }
>>>>
>>>> Note: I see an issue with nfs-ganesha during exit when the option
>>>> 'Entries_HWMARK' gets changed. This is not related to any of the above
>>>> patches (or rather Gluster) and I am currently debugging it.
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>> On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:
>>>>> 1. test with Cache_Size = 256 and Entries_HWMark = 4096
>>>>>
>>>>> Before find . -type f:
>>>>>
>>>>> root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd
>>>>> -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>
>>>>> After:
>>>>>
>>>>> root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd
>>>>> -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>
>>>>> ~250M leak.
>>>>>
>>>>> 2. test with default values (after ganesha restart)
>>>>>
>>>>> Before:
>>>>>
>>>>> root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd
>>>>> -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>
>>>>> After:
>>>>>
>>>>> root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd
>>>>> -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>
>>>>> ~159M leak.
>>>>>
>>>>> No reasonable correlation detected. The second test finished much faster
>>>>> than the first (I guess server-side GlusterFS cache or server kernel page
>>>>> cache is the cause).
>>>>>
>>>>> There are ~1.8M files on this test volume.
>>>>>
>>>>> On Friday, 25 December 2015, 20:28:13 EET Soumya Koduri wrote:
>>>>>> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
>>>>>>> Another addition: it seems to be a GlusterFS API library memory leak,
>>>>>>> because NFS-Ganesha also consumes a huge amount of memory while doing
>>>>>>> an ordinary "find . -type f" via NFSv4.2 on a remote client. Here is
>>>>>>> the memory usage:
>>>>>>>
>>>>>>> ==
>>>>>>> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54
>>>>>>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
>>>>>>> /etc/ganesha/ganesha.conf -N NIV_EVENT
>>>>>>> ==
>>>>>>>
>>>>>>> 1.4G is too much for simple stat() :(.
>>>>>>>
>>>>>>> Ideas?
>>>>>>
>>>>>> nfs-ganesha also has a cache layer which can scale to millions of
>>>>>> entries depending on the number of files/directories being looked up.
>>>>>> However, there are parameters to tune it. So either try stat with few
>>>>>> entries, or add the below block to the nfs-ganesha.conf file, set low
>>>>>> limits and check the difference. That may help us narrow down how much
>>>>>> memory is actually consumed by core nfs-ganesha and gfAPI.
>>>>>>
>>>>>> CACHEINODE {
>>>>>>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
>>>>>>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # max no. of entries in the cache
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> Soumya
>>>>>>
>>>>>>> 24.12.2015 16:32, Oleksandr Natalenko wrote:
>>>>>>>> Still an actual issue for 3.7.6. Any suggestions?
>>>>>>>> 24.09.2015 10:14, Oleksandr Natalenko wrote:
>>>>>>>>> In our GlusterFS deployment we've encountered something like a memory
>>>>>>>>> leak in the GlusterFS FUSE client.
>>>>>>>>>
>>>>>>>>> We use a replicated (×2) GlusterFS volume to store mail (exim+dovecot,
>>>>>>>>> maildir format). Here are the inode stats for both bricks and the
>>>>>>>>> mountpoint:
>>>>>>>>>
>>>>>>>>> ==
>>>>>>>>> Brick 1 (Server 1):
>>>>>>>>>
>>>>>>>>> Filesystem                         Inodes     IUsed     IFree      IUse% Mounted on
>>>>>>>>> /dev/mapper/vg_vd1_misc-lv08_mail  578768144  10954918  567813226  2%    /bricks/r6sdLV08_vd1_mail
>>>>>>>>>
>>>>>>>>> Brick 2 (Server 2):
>>>>>>>>>
>>>>>>>>> Filesystem                         Inodes     IUsed     IFree      IUse% Mounted on
>>>>>>>>> /dev/mapper/vg_vd0_misc-lv07_mail  578767984  10954913  567813071  2%    /bricks/r6sdLV07_vd0_mail
>>>>>>>>>
>>>>>>>>> Mountpoint (Server 3):
>>>>>>>>>
>>>>>>>>> Filesystem          Inodes     IUsed     IFree      IUse% Mounted on
>>>>>>>>> glusterfs.xxx:mail  578767760  10954915  567812845  2%    /var/spool/mail/virtual
>>>>>>>>> ==
>>>>>>>>>
>>>>>>>>> The glusterfs.xxx domain has two A records, for both Server 1 and
>>>>>>>>> Server 2.
>>>>>>>>> Here is the volume info:
>>>>>>>>>
>>>>>>>>> ==
>>>>>>>>> Volume Name: mail
>>>>>>>>> Type: Replicate
>>>>>>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>>>>> Transport-type: tcp
>>>>>>>>> Bricks:
>>>>>>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>>>>> Options Reconfigured:
>>>>>>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
>>>>>>>>> features.cache-invalidation-timeout: 10
>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>> performance.quick-read: on
>>>>>>>>> performance.read-ahead: off
>>>>>>>>> performance.flush-behind: on
>>>>>>>>> performance.write-behind: on
>>>>>>>>> performance.io-thread-count: 4
>>>>>>>>> performance.cache-max-file-size: 1048576
>>>>>>>>> performance.cache-size: 67108864
>>>>>>>>> performance.readdir-ahead: off
>>>>>>>>> ==
>>>>>>>>>
>>>>>>>>> Soon enough after mounting and exim/dovecot start, the glusterfs client
>>>>>>>>> process begins to consume a huge amount of RAM:
>>>>>>>>>
>>>>>>>>> ==
>>>>>>>>> user@server3 ~$ ps aux | grep glusterfs | grep mail
>>>>>>>>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05
>>>>>>>>> /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
>>>>>>>>> --volfile-server=glusterfs.xxx --volfile-id=mail
>>>>>>>>> /var/spool/mail/virtual
>>>>>>>>> ==
>>>>>>>>>
>>>>>>>>> That is, ~15 GiB of RAM.
>>>>>>>>>
>>>>>>>>> We've also tried to use the mountpoint within a separate KVM VM with 2
>>>>>>>>> or 3 GiB of RAM, and soon after starting the mail daemons got the OOM
>>>>>>>>> killer for the glusterfs client process.
>>>>>>>>>
>>>>>>>>> Mounting the same share via NFS works just fine. Also, we have much
>>>>>>>>> less iowait and loadavg on the client side with NFS.
>>>>>>>>>
>>>>>>>>> We've also tried to change the IO thread count and cache size in order
>>>>>>>>> to limit memory usage, with no luck.
>>>>>>>>> As you can see, the total cache size
>>>>>>>>> is 4×64 == 256 MiB (compare to 15 GiB).
>>>>>>>>>
>>>>>>>>> Enabling/disabling stat-prefetch, read-ahead and readdir-ahead didn't
>>>>>>>>> help either.
>>>>>>>>>
>>>>>>>>> Here are the volume memory stats:
>>>>>>>>>
>>>>>>>>> ==
>>>>>>>>> Memory status for volume : mail
>>>>>>>>> ----------------------------------------------
>>>>>>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>>>>>>>> Mallinfo
>>>>>>>>> --------
>>>>>>>>> Arena    : 36859904
>>>>>>>>> Ordblks  : 10357
>>>>>>>>> Smblks   : 519
>>>>>>>>> Hblks    : 21
>>>>>>>>> Hblkhd   : 30515200
>>>>>>>>> Usmblks  : 0
>>>>>>>>> Fsmblks  : 53440
>>>>>>>>> Uordblks : 18604144
>>>>>>>>> Fordblks : 18255760
>>>>>>>>> Keepcost : 114112
>>>>>>>>>
>>>>>>>>> Mempool Stats
>>>>>>>>> -------------
>>>>>>>>> Name                                    HotCount ColdCount PaddedSizeof AllocCount MaxAlloc  Misses Max-StdAlloc
>>>>>>>>> ----                                    -------- --------- ------------ ---------- -------- ------- ------------
>>>>>>>>> mail-server:fd_t                               0      1024          108   30773120      137       0            0
>>>>>>>>> mail-server:dentry_t                       16110       274           84  235676148    16384 1106499         1152
>>>>>>>>> mail-server:inode_t                        16363        21          156  237216876    16384 1876651         1169
>>>>>>>>> mail-trash:fd_t                                0      1024          108          0        0       0            0
>>>>>>>>> mail-trash:dentry_t                            0     32768           84          0        0       0            0
>>>>>>>>> mail-trash:inode_t                             4     32764          156          4        4       0            0
>>>>>>>>> mail-trash:trash_local_t                       0        64         8628          0        0       0            0
>>>>>>>>> mail-changetimerecorder:gf_ctr_local_t         0        64        16540          0        0       0            0
>>>>>>>>> mail-changelog:rpcsvc_request_t                0         8         2828          0        0       0            0
>>>>>>>>> mail-changelog:changelog_local_t               0        64          116          0        0       0            0
>>>>>>>>> mail-bitrot-stub:br_stub_local_t               0       512           84      79204        4       0            0
>>>>>>>>> mail-locks:pl_local_t                          0        32          148    6812757        4       0            0
>>>>>>>>> mail-upcall:upcall_local_t                     0       512          108          0        0       0            0
>>>>>>>>> mail-marker:marker_local_t                     0       128          332      64980        3       0            0
>>>>>>>>> mail-quota:quota_local_t                       0        64          476          0        0       0            0
>>>>>>>>> mail-server:rpcsvc_request_t                   0       512         2828   45462533       34       0            0
>>>>>>>>> glusterfs:struct saved_frame                   0         8          124          2        2       0            0
>>>>>>>>> glusterfs:struct rpc_req                       0         8          588          2        2       0            0
>>>>>>>>> glusterfs:rpcsvc_request_t                     1         7         2828          2        1       0            0
>>>>>>>>> glusterfs:log_buf_t                            5       251          140       3452        6       0            0
>>>>>>>>> glusterfs:data_t                             242     16141           52  480115498      664       0            0
>>>>>>>>> glusterfs:data_pair_t                        230     16153           68  179483528      275       0            0
>>>>>>>>> glusterfs:dict_t                              23      4073          140  303751675      627       0            0
>>>>>>>>> glusterfs:call_stub_t                          0      1024         3764   45290655       34       0            0
>>>>>>>>> glusterfs:call_stack_t                         1      1023         1708   43598469       34       0            0
>>>>>>>>> glusterfs:call_frame_t                         1      4095          172  336219655      184       0            0
>>>>>>>>> ----------------------------------------------
>>>>>>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>>>>>>>> Mallinfo
>>>>>>>>> --------
>>>>>>>>> Arena    : 38174720
>>>>>>>>> Ordblks  : 9041
>>>>>>>>> Smblks   : 507
>>>>>>>>> Hblks    : 21
>>>>>>>>> Hblkhd   : 30515200
>>>>>>>>> Usmblks  : 0
>>>>>>>>> Fsmblks  : 51712
>>>>>>>>> Uordblks : 19415008
>>>>>>>>> Fordblks : 18759712
>>>>>>>>> Keepcost : 114848
>>>>>>>>>
>>>>>>>>> Mempool Stats
>>>>>>>>> -------------
>>>>>>>>> Name                                    HotCount ColdCount PaddedSizeof AllocCount MaxAlloc  Misses Max-StdAlloc
>>>>>>>>> ----                                    -------- --------- ------------ ---------- -------- ------- ------------
>>>>>>>>> mail-server:fd_t                               0      1024          108    2373075      133       0            0
>>>>>>>>> mail-server:dentry_t                       14114      2270           84    3513654    16384    2300          267
>>>>>>>>> mail-server:inode_t                        16374        10          156    6766642    16384  194635         1279
>>>>>>>>> mail-trash:fd_t                                0      1024          108          0        0       0            0
>>>>>>>>> mail-trash:dentry_t                            0     32768           84          0        0       0            0
>>>>>>>>> mail-trash:inode_t                             4     32764          156          4        4       0            0
>>>>>>>>> mail-trash:trash_local_t                       0        64         8628          0        0       0            0
>>>>>>>>> mail-changetimerecorder:gf_ctr_local_t         0        64        16540          0        0       0            0
>>>>>>>>> mail-changelog:rpcsvc_request_t                0         8         2828          0        0       0            0
>>>>>>>>> mail-changelog:changelog_local_t               0        64          116          0        0       0            0
>>>>>>>>> mail-bitrot-stub:br_stub_local_t               0       512           84      71354        4       0            0
>>>>>>>>> mail-locks:pl_local_t                          0        32          148    8135032        4       0            0
>>>>>>>>> mail-upcall:upcall_local_t                     0       512          108          0        0       0            0
>>>>>>>>> mail-marker:marker_local_t                     0       128          332      65005        3       0            0
>>>>>>>>> mail-quota:quota_local_t                       0        64          476          0        0       0            0
>>>>>>>>> mail-server:rpcsvc_request_t                   0       512         2828   12882393       30       0            0
>>>>>>>>> glusterfs:struct saved_frame                   0         8          124          2        2       0            0
>>>>>>>>> glusterfs:struct rpc_req                       0         8          588          2        2       0            0
>>>>>>>>> glusterfs:rpcsvc_request_t                     1         7         2828          2        1       0            0
>>>>>>>>> glusterfs:log_buf_t                            5       251          140       3443        6       0            0
>>>>>>>>> glusterfs:data_t                             242     16141           52  138743429      290       0            0
>>>>>>>>> glusterfs:data_pair_t                        230     16153           68  126649864      270       0            0
>>>>>>>>> glusterfs:dict_t                              23      4073          140   20356289       63       0            0
>>>>>>>>> glusterfs:call_stub_t                          0      1024         3764   13678560       31       0            0
>>>>>>>>> glusterfs:call_stack_t                         1      1023         1708   11011561       30       0            0
>>>>>>>>> glusterfs:call_frame_t                         1      4095          172  125764190      193       0            0
>>>>>>>>> ----------------------------------------------
>>>>>>>>> ==
>>>>>>>>>
>>>>>>>>> So, my questions are:
>>>>>>>>>
>>>>>>>>> 1) what should one do to limit GlusterFS FUSE client memory usage?
>>>>>>>>> 2) what should one do to prevent high client loadavg caused by high
>>>>>>>>> iowait due to multiple concurrent volume users?
>>>>>>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
>>>>>>>>> GlusterFS client version is 3.7.4.
>>>>>>>>>
>>>>>>>>> Any additional info needed?
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users@gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
Oleksandr Natalenko
2016-Jan-06 08:28 UTC
[Gluster-users] Memory leak in GlusterFS FUSE client
OK, here is the valgrind log of patched Ganesha (I took the recent version of
your patchset, 8685abfc6d) with Entries_HWMARK set to 500:

https://gist.github.com/5397c152a259b9600af0

I see no huge runtime leaks now. However, I've repeated this test with
another volume in replica and got the following Ganesha error:

==
ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >= nlookup' failed.
==

06.01.2016 08:40, Soumya Koduri wrote:
> [...]