Hi,

Can you please specify which process has the leak? Have you taken the
statedump of that same process?

Thanks,
Sanju

On Sat, Feb 2, 2019 at 3:15 PM Pedro Costa <pedro at pmc.digital> wrote:

> Hi,
>
> I have a 3x replicated cluster running 4.1.7 on Ubuntu 16.04.5; all 3
> replicas are also clients, hosting a Node.js/Nginx web server.
>
> The current configuration is as follows:
>
> Volume Name: gvol1
> Type: Replicate
> Volume ID: XXXXXX
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vm000000:/srv/brick1/gvol1
> Brick2: vm000001:/srv/brick1/gvol1
> Brick3: vm000002:/srv/brick1/gvol1
> Options Reconfigured:
> cluster.self-heal-readdir-size: 2KB
> cluster.self-heal-window-size: 2
> cluster.background-self-heal-count: 20
> network.ping-timeout: 5
> disperse.eager-lock: off
> performance.parallel-readdir: on
> performance.readdir-ahead: on
> performance.rda-cache-limit: 128MB
> performance.cache-refresh-timeout: 10
> performance.nl-cache-timeout: 600
> performance.nl-cache: on
> cluster.nufa: on
> performance.enable-least-priority: off
> server.outstanding-rpc-limit: 128
> performance.strict-o-direct: on
> cluster.shd-max-threads: 12
> client.event-threads: 4
> cluster.lookup-optimize: on
> network.inode-lru-limit: 90000
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.cache-samba-metadata: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
> features.utime: on
> storage.ctime: on
> server.event-threads: 4
> performance.cache-size: 256MB
> performance.read-ahead: on
> cluster.readdir-optimize: on
> cluster.strict-readdir: on
> performance.io-thread-count: 8
> server.allow-insecure: on
> cluster.read-hash-mode: 0
> cluster.lookup-unhashed: auto
> cluster.choose-local: on
>
> I believe there's a memory leak somewhere: usage just keeps going up
> until it hangs one or more nodes, sometimes taking the whole cluster
> down.
>
> I have taken two statedumps on one of the nodes, one where the memory
> is too high and another just after a reboot, with the app running and
> the volume fully healed:
>
> https://pmcdigital.sharepoint.com/:u:/g/EYDsNqTf1UdEuE6B0ZNVPfIBf_I-AbaqHotB1lJOnxLlTg?e=boYP09
> (high memory)
>
> https://pmcdigital.sharepoint.com/:u:/g/EWZBsnET2xBHl6OxO52RCfIBvQ0uIDQ1GKJZ1GrnviyMhg?e=wI3yaY
> (after reboot)
>
> Any help would be greatly appreciated.
>
> Kindest Regards,
>
> Pedro Maia Costa
> Senior Developer, pmc.digital
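For anyone following along, statedumps are typically captured as below.
This is a minimal sketch: the PID placeholder is illustrative, and dumps
land in /var/run/gluster by default.

    # Dump the brick processes of a volume (run on any server node):
    gluster volume statedump gvol1

    # For a client-side glusterfs mount process, send SIGUSR1; the dump
    # is written to /var/run/gluster as glusterdump.<pid>.dump.<timestamp>:
    kill -USR1 <pid-of-glusterfs-mount>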
Hi Sanju,

The process was `glusterfs`; yes, I took the statedump for the same
process (different PID, since it was rebooted).

Cheers,
P.

From: Sanju Rakonde <srakonde at redhat.com>
Sent: 04 February 2019 06:10
To: Pedro Costa <pedro at pmc.digital>
Cc: gluster-users <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Help analise statedumps

> Hi,
>
> Can you please specify which process has the leak? Have you taken the
> statedump of that same process?
>
> Thanks,
> Sanju
>
> [...]
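A statedump's memory-accounting sections pair a header such as
"[mount/fuse.fuse - usage-type gf_common_mt_char memusage]" with
counters (size, num_allocs, max_size, and so on). A quick first pass
over a single dump, assuming the default filename, is to rank the
allocators by current size:

    # Print "size usage-type-header" pairs, largest allocations first:
    awk '/usage-type/ {t=$0} /^size=/ {sub(/^size=/,""); print $0, t}' \
        glusterdump.<pid>.dump.<timestamp> | sort -rn | head -20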
Hi Sanju,

If it helps, here's also a statedump, taken just now, since the reboot:

https://pmcdigital.sharepoint.com/:u:/g/EbsT2RZsuc5BsRrf7F-fw-4BocyeogW-WvEike_sg8CpZg?e=a7nTqS

Many thanks,
P.

From: Pedro Costa
Sent: 04 February 2019 10:12
To: 'Sanju Rakonde' <srakonde at redhat.com>
Cc: gluster-users <gluster-users at gluster.org>
Subject: RE: [Gluster-users] Help analise statedumps

> Hi Sanju,
>
> The process was `glusterfs`; yes, I took the statedump for the same
> process (different PID, since it was rebooted).
>
> [...]
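With two dumps taken at different times since the same reboot (the one
above and the earlier "after reboot" dump), allocation counts that only
ever grow are the usual leak signal. A rough way to diff them, with
illustrative filenames; note that the same usage-type can appear under
several translators, so treat this as a first approximation only:

    # Emit "usage-type num_allocs=N" per section, sorted, then diff the
    # two dumps to see which counters grew:
    list() { awk '/usage-type/ {t=$4} /^num_allocs=/ {print t, $0}' "$1" | sort; }
    diff <(list glusterdump.<pid>.dump.<t1>) <(list glusterdump.<pid>.dump.<t2>)

If one of the cache translators stands out there, toggling the matching
volume option from the configuration above (performance.nl-cache,
performance.parallel-readdir, performance.md-cache-timeout, ...) and
re-measuring is one way to confirm it.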