On 22/07/2016 17:47, Mykola Ulianytskyi wrote:
> Hi
>
>> 3.7 clients are not compatible with 3.6 servers
> Can you provide more info?
>
> I use some 3.7 clients with 3.6 servers and don't see issues.
Well,
with a 3.7.13 client compiled on the same machine, when I try the same
mount I get:
# mount -t glusterfs sto1.my.domain:BACKUP-ADMIN-DATA /zog/
Mount failed. Please check the log file for more details.
Checking the log (/var/log/glusterfs/zog.log) I see:
[2016-07-22 19:05:40.249143] I [MSGID: 100030] [glusterfsd.c:2338:main]
0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs
version 3.7.13 (args: /usr/local/sbin/glusterfs
--volfile-server=sto1.my.domain --volfile-id=BACKUP-ADMIN-DATA /zog)
[2016-07-22 19:05:40.258437] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2016-07-22 19:05:40.259480] W [socket.c:701:__socket_rwv] 0-glusterfs:
readv on <the-IP>:24007 failed (Aucune donnée disponible)
[2016-07-22 19:05:40.259859] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x175)[0x7fad7d039335]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1b3)[0x7fad7ce04e73]
(-->
/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fad7ce04f6e]
(-->
/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7fad7ce065ee]
(--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7fad7ce06de8]
))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake)
op(GETSPEC(2)) called at 2016-07-22 19:05:40.258858 (xid=0x1)
[2016-07-22 19:05:40.259894] E [glusterfsd-mgmt.c:1690:mgmt_getspec_cbk]
0-mgmt: failed to fetch volume file (key:BACKUP-ADMIN-DATA)
[2016-07-22 19:05:40.259939] W [glusterfsd.c:1251:cleanup_and_exit]
(-->/usr/local/lib/libgfrpc.so.0(saved_frames_unwind+0x1de)
[0x7fad7ce04e9e] -->/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x454)
[0x40d564] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b)
[0x407eab] ) 0-: received signum (0), shutting down
[2016-07-22 19:05:40.259965] I [fuse-bridge.c:5720:fini] 0-fuse:
Unmounting '/zog'.
[2016-07-22 19:05:40.260913] W [glusterfsd.c:1251:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7fad7c0a30a4]
-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x408015]
-->/usr/local/sbin/glusterfs(cleanup_and_exit+0x4b) [0x407eab] ) 0-:
received signum (15), shutting down
I did not dig further into that at the time, as I simply assumed that the
3.7 series was not compatible with 3.6 servers, but it may be something
else. In any case, here it is the same client machine, the same server(s)
and the same volume.
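If it helps I can re-run the failing mount in the foreground with more
verbose logging; if I remember correctly the client binary accepts a
--debug flag which keeps it in the foreground and logs at DEBUG level to
the console (exact flags may differ between versions), so something like:
# /usr/local/sbin/glusterfs --debug \
    --volfile-server=sto1.my.domain --volfile-id=BACKUP-ADMIN-DATA /zog
should make the GETSPEC exchange on port 24007 more visible.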
The client is compiled with the following features (built with "configure
--disable-tiering" as I don't have the tiering dependencies installed; the
build steps are sketched just after the list):
FUSE client : yes
Infiniband verbs : no
epoll IO multiplex : yes
argp-standalone : no
fusermount : yes
readline : yes
georeplication : yes
Linux-AIO : no
Enable Debug : no
Block Device xlator : no
glupy : yes
Use syslog : yes
XML output : yes
QEMU Block formats : no
Encryption xlator : yes
Unit Tests : no
POSIX ACLs : yes
Data Classification : no
firewalld-config : no
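For reference the build itself is nothing special, roughly the usual
autotools sequence with the default prefix (hence the /usr/local paths
above):
# ./configure --disable-tiering
# make && make install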
Regards,
--
Y.
> Thank you
>
> --
> With best regards,
> Mykola
>
>
> On Fri, Jul 22, 2016 at 4:31 PM, Yannick Perret
> <yannick.perret at liris.cnrs.fr> wrote:
>> Note: I have a dev client machine, so I can perform tests or recompile
>> the glusterfs client if it helps gathering data about this.
>>
>> I did not test this problem against the 3.7.x versions, as my 2 servers
>> are in use and I can't upgrade them at this time, and 3.7 clients are not
>> compatible with 3.6 servers (as far as I can see from my tests).
>>
>> --
>> Y.
>>
>>
>> On 22/07/2016 14:06, Yannick Perret wrote:
>>
>> Hello,
>> some time ago I posted about a memory leak in the client process, but it
>> was on a very old 32bit machine (both kernel and OS) and I did not find
>> evidence of a similar problem on our recent machines.
>> But I performed more tests and I have the same problem.
>>
>> Clients are 64bit Debian 8.2 machines. The glusterfs client on these
>> machines is compiled from sources with the following features enabled:
>> FUSE client : yes
>> Infiniband verbs : no
>> epoll IO multiplex : yes
>> argp-standalone : no
>> fusermount : yes
>> readline : yes
>> georeplication : yes
>> Linux-AIO : no
>> Enable Debug : no
>> systemtap : no
>> Block Device xlator : no
>> glupy : no
>> Use syslog : yes
>> XML output : yes
>> QEMU Block formats : no
>> Encryption xlator : yes
>> Erasure Code xlator : yes
>>
>> I tested both the 3.6.7 and 3.6.9 versions on the client (3.6.7 is the
>> one installed on our machines, including the servers; 3.6.9 is for
>> testing with the latest 3.6 version).
>>
>> Here are the operations on the client (also performed, with similar
>> results, with the 3.6.7 version):
>> # /usr/local/sbin/glusterfs --version
>> glusterfs 3.6.9 built on Jul 22 2016 13:27:42
>> (…)
>> # mount -t glusterfs sto1.my.domain:BACKUP-ADMIN-DATA /zog/
>> # cd /usr/
>> # cp -Rp * /zog/TEMP/
>> Then, monitoring the memory used by the glusterfs process while 'cp' is
>> running (VSZ and RSS respectively, from 'ps'; a sketch of the sampling
>> loop follows the figures):
>> 284740 70232
>> 284740 70232
>> 284876 71704
>> 285000 72684
>> 285136 74008
>> 285416 75940
>> (…)
>> 368684 151980
>> 369324 153768
>> 369836 155576
>> 370092 156192
>> 370092 156192
>> Here both sizes are stable and correspond to the end of the 'cp' command.
>> If I restart another 'cp' (even on the same directories) the size starts
>> to increase again.
>> If I perform a 'ls -lR' in the directory the size also increases:
>> 370756 192488
>> 389964 212148
>> 390948 213232
>> (here I ^C the 'ls')
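>> For reference, the figures above were sampled with something like the
>> following loop (the exact interval does not matter; <PID> is the pid of
>> the glusterfs client process):
>> # while true; do ps -o vsz=,rss= -p <PID>; sleep 5; done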
>>
>> When doing nothing the size doesn't increase, but it never decreases
>> either (calling 'sync' doesn't change the situation).
>> Sending a HUP signal to the glusterfs process also increases memory
>> (390948 213324 -> 456484 213320).
>> Changing the volume configuration (changing the
>> diagnostics.client-sys-log-level value) doesn't change anything.
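>> (For completeness, that option was changed on one of the servers with the
>> standard volume-set command, along the lines of:
>> # gluster volume set BACKUP-ADMIN-DATA diagnostics.client-sys-log-level ERROR
>> and then set back to WARNING afterwards.)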
>>
>> Here the actual ps:
>> root 17041 4.9 5.2 456484 213320 ? Ssl 13:29 1:21
>> /usr/local/sbin/glusterfs --volfile-server=sto1.my.domain
>> --volfile-id=BACKUP-ADMIN-DATA /zog
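>> If it helps I can also generate a statedump of this client process; as
>> far as I know sending USR1 to a glusterfs process dumps its allocation
>> state (for a /usr/local build the dump should land under
>> /usr/local/var/run/gluster, but I have not checked the exact path):
>> # kill -USR1 17041
>> # ls /usr/local/var/run/gluster/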
>>
>> Of course unmounting/remounting falls back to the "start" size:
>> # umount /zog
>> # mount -t glusterfs sto1.my.domain:BACKUP-ADMIN-DATA /zog/
>> … root 28741 0.3 0.7 273320 30484 ? Ssl 13:57 0:00
>> /usr/local/sbin/glusterfs --volfile-server=sto1.my.domain
>> --volfile-id=BACKUP-ADMIN-DATA /zog
>>
>>
>> I didn't see this before because most of our volumes are mounted
>> "on demand" for some storage activities, or are permanently mounted but
>> with very little activity.
>> But clearly this memory usage drift is a long-term problem. On the old
>> 32bit machine I had this problem ("solved" by using NFS mounts while
>> waiting for that old machine to be replaced) and it led to glusterfs
>> being killed by the OS when it ran out of free memory. It was faster than
>> what I describe here, but it's just a question of time.
>>
>>
>> Thanks for any help about that.
>>
>> Regards,
>> --
>> Y.
>>
>>
>> The corresponding volume on the servers is (in case it can help):
>> Volume Name: BACKUP-ADMIN-DATA
>> Type: Replicate
>> Volume ID: 306d57f3-fb30-4bcc-8687-08bf0a3d7878
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: sto1.my.domain:/glusterfs/backup-admin/data
>> Brick2: sto2.my.domain:/glusterfs/backup-admin/data
>> Options Reconfigured:
>> diagnostics.client-sys-log-level: WARNING
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users