On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
> Another addition: it seems to be a GlusterFS API library memory leak,
> because NFS-Ganesha also consumes a huge amount of memory while doing
> an ordinary "find . -type f" via NFSv4.2 on a remote client. Here is
> the memory usage:
>
> ==
> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54
> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
> /etc/ganesha/ganesha.conf -N NIV_EVENT
> ==
>
> 1.4G is too much for simple stat() :(.
>
> Ideas?

nfs-ganesha also has a cache layer which can scale to millions of
entries depending on the number of files/directories being looked up.
However, there are parameters to tune it. So either try stat with fewer
entries, or add the block below to the nfs-ganesha.conf file, set low
limits and check the difference. That may help us narrow down how much
memory is actually consumed by core nfs-ganesha and by gfapi.

CACHEINODE {
    Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
    Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # max no. of entries in the cache
}

Thanks,
Soumya

> 24.12.2015 16:32, Oleksandr Natalenko wrote:
>> The issue is still present in 3.7.6. Any suggestions?
>>
>> 24.09.2015 10:14, Oleksandr Natalenko wrote:
>>> In our GlusterFS deployment we've encountered something like a
>>> memory leak in the GlusterFS FUSE client.
>>>
>>> We use a replicated (×2) GlusterFS volume to store mail
>>> (exim+dovecot, maildir format). Here are the inode stats for both
>>> bricks and the mountpoint:
>>>
>>> ==
>>> Brick 1 (Server 1):
>>>
>>> Filesystem                          Inodes     IUsed      IFree IUse% Mounted on
>>> /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918  567813226    2% /bricks/r6sdLV08_vd1_mail
>>>
>>> Brick 2 (Server 2):
>>>
>>> Filesystem                          Inodes     IUsed      IFree IUse% Mounted on
>>> /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913  567813071    2% /bricks/r6sdLV07_vd0_mail
>>>
>>> Mountpoint (Server 3):
>>>
>>> Filesystem             Inodes     IUsed      IFree IUse% Mounted on
>>> glusterfs.xxx:mail  578767760  10954915  567812845    2% /var/spool/mail/virtual
>>> ==
>>>
>>> The glusterfs.xxx domain has two A records, one for Server 1 and one
>>> for Server 2.
>>>
>>> Here is the volume info:
>>>
>>> ==
>>> Volume Name: mail
>>> Type: Replicate
>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>> Options Reconfigured:
>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
>>> features.cache-invalidation-timeout: 10
>>> performance.stat-prefetch: off
>>> performance.quick-read: on
>>> performance.read-ahead: off
>>> performance.flush-behind: on
>>> performance.write-behind: on
>>> performance.io-thread-count: 4
>>> performance.cache-max-file-size: 1048576
>>> performance.cache-size: 67108864
>>> performance.readdir-ahead: off
>>> ==
>>>
>>> Soon enough after mounting and starting exim/dovecot, the glusterfs
>>> client process begins to consume a huge amount of RAM:
>>>
>>> ==
>>> user@server3 ~$ ps aux | grep glusterfs | grep mail
>>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05
>>> /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
>>> --volfile-server=glusterfs.xxx --volfile-id=mail
>>> /var/spool/mail/virtual
>>> ==
>>>
>>> That is, ~15 GiB of RAM.
>>>
>>> We've also tried to use the mountpoint within a separate KVM VM with
>>> 2 or 3 GiB of RAM, and soon after starting the mail daemons the OOM
>>> killer went after the glusterfs client process.
>>>
>>> Mounting the same share via NFS works just fine. We also have much
>>> less iowait and loadavg on the client side with NFS.
>>>
>>> We've also tried to change the IO thread count and the cache size in
>>> order to limit memory usage, with no luck. As you can see, the total
>>> cache size is 4×64 = 256 MiB (compare that to 15 GiB).
>>>
>>> Enabling/disabling stat-prefetch, read-ahead and readdir-ahead didn't
>>> help either.
>>>
>>> Here are the volume memory stats:
>>>
>>> ==
>>> Memory status for volume : mail
>>> ----------------------------------------------
>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
>>> Mallinfo
>>> --------
>>> Arena    : 36859904
>>> Ordblks  : 10357
>>> Smblks   : 519
>>> Hblks    : 21
>>> Hblkhd   : 30515200
>>> Usmblks  : 0
>>> Fsmblks  : 53440
>>> Uordblks : 18604144
>>> Fordblks : 18255760
>>> Keepcost : 114112
>>>
>>> Mempool Stats
>>> -------------
>>> Name                                    HotCount ColdCount PaddedSizeof AllocCount MaxAlloc  Misses Max-StdAlloc
>>> ----                                    -------- --------- ------------ ---------- -------- ------- ------------
>>> mail-server:fd_t                               0      1024          108   30773120      137       0            0
>>> mail-server:dentry_t                       16110       274           84  235676148    16384 1106499         1152
>>> mail-server:inode_t                        16363        21          156  237216876    16384 1876651         1169
>>> mail-trash:fd_t                                0      1024          108          0        0       0            0
>>> mail-trash:dentry_t                            0     32768           84          0        0       0            0
>>> mail-trash:inode_t                             4     32764          156          4        4       0            0
>>> mail-trash:trash_local_t                       0        64         8628          0        0       0            0
>>> mail-changetimerecorder:gf_ctr_local_t         0        64        16540          0        0       0            0
>>> mail-changelog:rpcsvc_request_t                0         8         2828          0        0       0            0
>>> mail-changelog:changelog_local_t               0        64          116          0        0       0            0
>>> mail-bitrot-stub:br_stub_local_t               0       512           84      79204        4       0            0
>>> mail-locks:pl_local_t                          0        32          148    6812757        4       0            0
>>> mail-upcall:upcall_local_t                     0       512          108          0        0       0            0
>>> mail-marker:marker_local_t                     0       128          332      64980        3       0            0
>>> mail-quota:quota_local_t                       0        64          476          0        0       0            0
>>> mail-server:rpcsvc_request_t                   0       512         2828   45462533       34       0            0
>>> glusterfs:struct saved_frame                   0         8          124          2        2       0            0
>>> glusterfs:struct rpc_req                       0         8          588          2        2       0            0
>>> glusterfs:rpcsvc_request_t                     1         7         2828          2        1       0            0
>>> glusterfs:log_buf_t                            5       251          140       3452        6       0            0
>>> glusterfs:data_t                             242     16141           52  480115498      664       0            0
>>> glusterfs:data_pair_t                        230     16153           68  179483528      275       0            0
>>> glusterfs:dict_t                              23      4073          140  303751675      627       0            0
>>> glusterfs:call_stub_t                          0      1024         3764   45290655       34       0            0
>>> glusterfs:call_stack_t                         1      1023         1708   43598469       34       0            0
>>> glusterfs:call_frame_t                         1      4095          172  336219655      184       0            0
>>> ----------------------------------------------
>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
>>> Mallinfo
>>> --------
>>> Arena    : 38174720
>>> Ordblks  : 9041
>>> Smblks   : 507
>>> Hblks    : 21
>>> Hblkhd   : 30515200
>>> Usmblks  : 0
>>> Fsmblks  : 51712
>>> Uordblks : 19415008
>>> Fordblks : 18759712
>>> Keepcost : 114848
>>>
>>> Mempool Stats
>>> -------------
>>> Name                                    HotCount ColdCount PaddedSizeof AllocCount MaxAlloc  Misses Max-StdAlloc
>>> ----                                    -------- --------- ------------ ---------- -------- ------- ------------
>>> mail-server:fd_t                               0      1024          108    2373075      133       0            0
>>> mail-server:dentry_t                       14114      2270           84    3513654    16384    2300          267
>>> mail-server:inode_t                        16374        10          156    6766642    16384  194635         1279
>>> mail-trash:fd_t                                0      1024          108          0        0       0            0
>>> mail-trash:dentry_t                            0     32768           84          0        0       0            0
>>> mail-trash:inode_t                             4     32764          156          4        4       0            0
>>> mail-trash:trash_local_t                       0        64         8628          0        0       0            0
>>> mail-changetimerecorder:gf_ctr_local_t         0        64        16540          0        0       0            0
>>> mail-changelog:rpcsvc_request_t                0         8         2828          0        0       0            0
>>> mail-changelog:changelog_local_t               0        64          116          0        0       0            0
>>> mail-bitrot-stub:br_stub_local_t               0       512           84      71354        4       0            0
>>> mail-locks:pl_local_t                          0        32          148    8135032        4       0            0
>>> mail-upcall:upcall_local_t                     0       512          108          0        0       0            0
>>> mail-marker:marker_local_t                     0       128          332      65005        3       0            0
>>> mail-quota:quota_local_t                       0        64          476          0        0       0            0
>>> mail-server:rpcsvc_request_t                   0       512         2828   12882393       30       0            0
>>> glusterfs:struct saved_frame                   0         8          124          2        2       0            0
>>> glusterfs:struct rpc_req                       0         8          588          2        2       0            0
>>> glusterfs:rpcsvc_request_t                     1         7         2828          2        1       0            0
>>> glusterfs:log_buf_t                            5       251          140       3443        6       0            0
>>> glusterfs:data_t                             242     16141           52  138743429      290       0            0
>>> glusterfs:data_pair_t                        230     16153           68  126649864      270       0            0
>>> glusterfs:dict_t                              23      4073          140   20356289       63       0            0
>>> glusterfs:call_stub_t                          0      1024         3764   13678560       31       0            0
>>> glusterfs:call_stack_t                         1      1023         1708   11011561       30       0            0
>>> glusterfs:call_frame_t                         1      4095          172  125764190      193       0            0
>>> ----------------------------------------------
>>> ==
>>>
>>> So, my questions are:
>>>
>>> 1) what should one do to limit GlusterFS FUSE client memory usage?
>>> 2) what should one do to prevent high loadavg on the client caused
>>> by high iowait caused by multiple concurrent volume users?
>>>
>>> The server/client OS is CentOS 7.1, the GlusterFS server version is
>>> 3.7.3, and the GlusterFS client version is 3.7.4.
>>>
>>> Any additional info needed?
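For reference, the tuning block described above, filled in with deliberately
low limits, would look roughly like this in /etc/ganesha/ganesha.conf. This is
a sketch only: the values 256 and 4096 are the ones tried later in this thread,
not recommended defaults.

CACHEINODE {
    Cache_Size = 256;
    Entries_HWMark = 4096;
}

After editing the file, restart nfs-ganesha so the new limits take effect, as
is done in the tests further down.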
Oleksandr Natalenko
2015-Dec-25 15:26 UTC
[Gluster-users] Memory leak in GlusterFS FUSE client
What units is Cache_Size measured in? Bytes?

25.12.2015 16:58, Soumya Koduri wrote:
> nfs-ganesha also has a cache layer which can scale to millions of
> entries depending on the number of files/directories being looked up.
> However, there are parameters to tune it. So either try stat with fewer
> entries, or add the block below to the nfs-ganesha.conf file, set low
> limits and check the difference. That may help us narrow down how much
> memory is actually consumed by core nfs-ganesha and by gfapi.
>
> CACHEINODE {
>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # max
>     no. of entries in the cache
> }
>
> Thanks,
> Soumya
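Whatever the units turn out to be, the effect of the setting can be measured
empirically by recording the resident set size of the ganesha.nfsd process
before and after a traversal and comparing the two readings. A minimal sketch
(it assumes a single ganesha.nfsd instance and an NFS export mounted at
/mnt/test on the client; both are illustrative):

# RSS and VSZ (in KiB, as reported by ps) before the run
ps -o rss=,vsz= -p "$(pidof ganesha.nfsd)"

# on the NFS client: find /mnt/test -type f > /dev/null

# RSS and VSZ after the run
ps -o rss=,vsz= -p "$(pidof ganesha.nfsd)"

The ps aux figures quoted throughout this thread are the same VSZ/RSS pair,
just taken with the default output format.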
Oleksandr Natalenko
2015-Dec-25 18:04 UTC
[Gluster-users] Memory leak in GlusterFS FUSE client
1. test with Cache_Size = 256 and Entries_HWMark = 4096

Before find . -type f:

root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd
-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT

After:

root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd
-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT

~250M leak.

2. test with default values (after ganesha restart)

Before:

root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd
-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT

After:

root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd
-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT

~159M leak.

No reasonable correlation detected. The second test finished much faster
than the first (I guess server-side GlusterFS caching or the server's
kernel page cache is the cause).

There are ~1.8M files on this test volume.

On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:
> So either try stat with fewer entries, or add the block below to the
> nfs-ganesha.conf file, set low limits and check the difference. That may
> help us narrow down how much memory is actually consumed by core
> nfs-ganesha and by gfapi.
>
> CACHEINODE {
>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # max
>     no. of entries in the cache
> }
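A quick arithmetic cross-check of the two figures above, using the RSS column
(sixth field) of the ps output:

$ echo $(( 458168 - 208408 ))   # test 1: 249760 KiB, i.e. ~244 MiB (the "~250M")
$ echo $(( 356340 - 197808 ))   # test 2: 158532 KiB, i.e. ~155 MiB (the "~159M")

So the growth for the same traversal differs noticeably between the two runs,
which is consistent with the "no reasonable correlation" observation.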
Oleksandr Natalenko
2015-Dec-25 23:15 UTC
[Gluster-users] Memory leak in GlusterFS FUSE client
Also, here is valgrind output from our custom tool, which traverses a
GlusterFS volume (doing simple stats) just like the find tool does. In
this case NFS-Ganesha is not used.

https://gist.github.com/e4602a50d3c98f7a2766

One may see GlusterFS-related leaks there as well.

On Friday, 25 December 2015 20:28:13 EET Soumya Koduri wrote:
> So either try stat with fewer entries, or add the block below to the
> nfs-ganesha.conf file, set low limits and check the difference. That may
> help us narrow down how much memory is actually consumed by core
> nfs-ganesha and by gfapi.
>
> CACHEINODE {
>     Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
>     Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # max
>     no. of entries in the cache
> }
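For anyone reproducing this against their own gfapi-based program, a typical
valgrind invocation would be along these lines. This is a sketch: "gfapi-walk"
is a placeholder for the custom traversal tool mentioned above, and the server
and volume arguments are illustrative.

valgrind --leak-check=full --show-leak-kinds=all \
         --log-file=valgrind-gfapi.log \
         ./gfapi-walk server1.xxx mail

Note that glusterfs translators allocate from memory pools, so some of what
valgrind reports as "still reachable" at exit is pool memory rather than a
leak; the "definitely lost" records are usually the interesting ones.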
I tried to debug the inode*-related leaks and have seen some improvements
after applying the patches below when running the same test (but with a
smaller load). Could you please apply those patches and confirm the same?

a) http://review.gluster.org/13125

This will fix the inode and inode-ctx related leaks during unexport and at
program exit. Please check the valgrind output after applying the patch;
it should no longer list any inode-related memory as lost.

b) http://review.gluster.org/13096

The reason the change in Entries_HWMARK (in your earlier mail) didn't have
much effect is that the inode_nlookup count doesn't become zero for the
handles/inodes being closed by ganesha. Hence those inodes get added to the
inode LRU list instead of the purge list, and they are forcefully purged
only when the number of gfapi inode table entries reaches its limit (which
is 137012).

This patch fixes those 'nlookup' counts. Please apply it, reduce
'Entries_HWMARK' to a much lower value, and check whether it decreases the
memory consumed by the ganesha process while it is active.

CACHEINODE {
    Entries_HWMark = 500;
}

Note: I see an issue with nfs-ganesha during exit when the option
'Entries_HWMARK' gets changed. This is not related to any of the above
patches (or rather to Gluster) and I am currently debugging it.

Thanks,
Soumya

On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote:
> 1. test with Cache_Size = 256 and Entries_HWMark = 4096
>
> Before find . -type f:
>
> root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ganesha.nfsd
> -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>
> After:
>
> root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ganesha.nfsd
> -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>
> ~250M leak.
>
> 2. test with default values (after ganesha restart)
>
> Before:
>
> root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ganesha.nfsd
> -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>
> After:
>
> root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ganesha.nfsd
> -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
>
> ~159M leak.
>
> No reasonable correlation detected. The second test finished much faster
> than the first (I guess server-side GlusterFS caching or the server's
> kernel page cache is the cause).
>
> There are ~1.8M files on this test volume.
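A sketch of one way to pull those two changes into a glusterfs source tree for
testing. The fetch refs follow the usual Gerrit refs/changes/<last two
digits>/<change>/<patchset> layout; the patch-set number 1 is an assumption, so
check the "Download" box on each review page for the exact ref and URL.

git clone https://github.com/gluster/glusterfs.git && cd glusterfs
git fetch http://review.gluster.org/glusterfs refs/changes/25/13125/1 && git cherry-pick FETCH_HEAD
git fetch http://review.gluster.org/glusterfs refs/changes/96/13096/1 && git cherry-pick FETCH_HEAD

Then rebuild and reinstall the glusterfs/gfapi packages on the node running
nfs-ganesha before re-running the find test.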