Hi,

Can you please specify which process has the leak? Have you taken the
statedump of that same process?

Thanks,
Sanju

On Sat, Feb 2, 2019 at 3:15 PM Pedro Costa <pedro at pmc.digital> wrote:

> Hi,
>
> I have a 3x replicated cluster running 4.1.7 on Ubuntu 16.04.5; all 3
> replicas are also clients, hosting a Node.js/Nginx web server.
>
> The current configuration is as follows:
>
> Volume Name: gvol1
> Type: Replicate
> Volume ID: XXXXXX
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vm000000:/srv/brick1/gvol1
> Brick2: vm000001:/srv/brick1/gvol1
> Brick3: vm000002:/srv/brick1/gvol1
> Options Reconfigured:
> cluster.self-heal-readdir-size: 2KB
> cluster.self-heal-window-size: 2
> cluster.background-self-heal-count: 20
> network.ping-timeout: 5
> disperse.eager-lock: off
> performance.parallel-readdir: on
> performance.readdir-ahead: on
> performance.rda-cache-limit: 128MB
> performance.cache-refresh-timeout: 10
> performance.nl-cache-timeout: 600
> performance.nl-cache: on
> cluster.nufa: on
> performance.enable-least-priority: off
> server.outstanding-rpc-limit: 128
> performance.strict-o-direct: on
> cluster.shd-max-threads: 12
> client.event-threads: 4
> cluster.lookup-optimize: on
> network.inode-lru-limit: 90000
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.cache-samba-metadata: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
> features.utime: on
> storage.ctime: on
> server.event-threads: 4
> performance.cache-size: 256MB
> performance.read-ahead: on
> cluster.readdir-optimize: on
> cluster.strict-readdir: on
> performance.io-thread-count: 8
> server.allow-insecure: on
> cluster.read-hash-mode: 0
> cluster.lookup-unhashed: auto
> cluster.choose-local: on
>
> I believe there's a memory leak somewhere: usage just keeps going up
> until it hangs one or more nodes, sometimes taking the whole cluster
> down.
>
> I have taken two statedumps on one of the nodes, one where the memory
> is too high and another just after a reboot, with the app running and
> the volume fully healed:
>
> https://pmcdigital.sharepoint.com/:u:/g/EYDsNqTf1UdEuE6B0ZNVPfIBf_I-AbaqHotB1lJOnxLlTg?e=boYP09
> (high memory)
>
> https://pmcdigital.sharepoint.com/:u:/g/EWZBsnET2xBHl6OxO52RCfIBvQ0uIDQ1GKJZ1GrnviyMhg?e=wI3yaY
> (after reboot)
>
> Any help would be greatly appreciated.
>
> Kindest Regards,
>
> Pedro Maia Costa
> Senior Developer, pmc.digital
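For anyone following along, statedumps are typically captured as below.
This is a minimal sketch: the PID placeholder is illustrative, and dumps
land in /var/run/gluster by default.

    # Dump the brick processes of a volume (run on any server node):
    gluster volume statedump gvol1

    # For a client-side glusterfs mount process, send SIGUSR1; the dump
    # is written to /var/run/gluster as glusterdump.<pid>.dump.<timestamp>:
    kill -USR1 <pid-of-glusterfs-mount>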
Hi Sanju,

The process was `glusterfs`; yes, I took the statedump for the same
process (different PID, since it was rebooted).

Cheers,
P.

From: Sanju Rakonde <srakonde at redhat.com>
Sent: 04 February 2019 06:10
To: Pedro Costa <pedro at pmc.digital>
Cc: gluster-users <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Help analise statedumps

> Hi,
>
> Can you please specify which process has the leak? Have you taken the
> statedump of that same process?
>
> Thanks,
> Sanju
>
> [...]
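A statedump's memory-accounting sections pair a header such as
"[mount/fuse.fuse - usage-type gf_common_mt_char memusage]" with
counters (size, num_allocs, max_size, and so on). A quick first pass
over a single dump, assuming the default filename, is to rank the
allocators by current size:

    # Print "size usage-type-header" pairs, largest allocations first:
    awk '/usage-type/ {t=$0} /^size=/ {sub(/^size=/,""); print $0, t}' \
        glusterdump.<pid>.dump.<timestamp> | sort -rn | head -20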
Hi Sanju,

If it helps, here's also a statedump, taken just now, since the reboot:

https://pmcdigital.sharepoint.com/:u:/g/EbsT2RZsuc5BsRrf7F-fw-4BocyeogW-WvEike_sg8CpZg?e=a7nTqS

Many thanks,
P.

From: Pedro Costa
Sent: 04 February 2019 10:12
To: 'Sanju Rakonde' <srakonde at redhat.com>
Cc: gluster-users <gluster-users at gluster.org>
Subject: RE: [Gluster-users] Help analise statedumps

> Hi Sanju,
>
> The process was `glusterfs`; yes, I took the statedump for the same
> process (different PID, since it was rebooted).
>
> [...]
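With two dumps taken at different times since the same reboot (the one
above and the earlier "after reboot" dump), allocation counts that only
ever grow are the usual leak signal. A rough way to diff them, with
illustrative filenames; note that the same usage-type can appear under
several translators, so treat this as a first approximation only:

    # Emit "usage-type num_allocs=N" per section, sorted, then diff the
    # two dumps to see which counters grew:
    list() { awk '/usage-type/ {t=$4} /^num_allocs=/ {print t, $0}' "$1" | sort; }
    diff <(list glusterdump.<pid>.dump.<t1>) <(list glusterdump.<pid>.dump.<t2>)

If one of the cache translators stands out there, toggling the matching
volume option from the configuration above (performance.nl-cache,
performance.parallel-readdir, performance.md-cache-timeout, ...) and
re-measuring is one way to confirm it.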