Oleksandr Natalenko
2016-Jun-06 08:24 UTC
[Gluster-users] Huge VSZ (VIRT) usage by glustershd on dummy node
Hello.

We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping
volumes metadata.

Now we observe huge VSZ (VIRT) usage by glustershd on the dummy node:

==
root 15109 0.0 13.7 76552820 535272 ? Ssl ???26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
==

That is ~73G. RSS seems to be OK (~522M). Here is the statedump of the
glustershd process: [1]

Also, here is the sum of the sizes reported in the statedump:

==
# cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
353276406
==

That is ~337 MiB.

Also, here are the VIRT values from the 2 replica nodes:

==
root 24659 0.0 0.3 5645836 451796 ? Ssl ???24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
root 18312 0.0 0.3 6137500 477472 ? Ssl ???19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
==

Those are 5 to 6G, which is much less than what the dummy node shows, but
still looks too big to us.

Should we care about the huge VIRT value on the dummy node? Also, how would
one debug that?

Regards,
Oleksandr.

[1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
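A statedump like the one linked in [1] can be obtained from a running gluster process by sending it SIGUSR1, and the per-allocation sizes summed the same way as above. A minimal sketch, assuming the default dump location /var/run/gluster and the glustershd PID from the ps output (adjust both for your setup):

==
pid=15109                                  # glustershd PID from the ps output above
kill -USR1 "$pid"                          # glusterfs writes a statedump on SIGUSR1
sleep 2                                    # give it a moment to finish writing
dump=$(ls -t /var/run/gluster/glusterdump."$pid".dump.* | head -n 1)
# Sum all "size=" entries (bytes) and print the total in MiB
awk -F '=' '/^size=/ {sum += $2} END {printf "%.0f MiB\n", sum / 1024 / 1024}' "$dump"
==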
Kaushal M
2016-Jun-06 09:21 UTC
[Gluster-users] [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Has multi-threaded SHD been merged into 3.7.* by any chance? If not, what
I'm saying below doesn't apply.

We saw problems when encrypted transports were used, because the RPC layer
was not reaping threads (doing pthread_join) when a connection ended. This
led to similar observations of huge VIRT and relatively small RSS.

I'm not sure how multi-threaded SHD works, but it could be leaking threads
in a similar way.

On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko
<oleksandr at natalenko.name> wrote:
> Hello.
>
> We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping
> volumes metadata.
>
> Now we observe huge VSZ (VIRT) usage by glustershd on dummy node:
>
> ==
> root 15109 0.0 13.7 76552820 535272 ? Ssl ???26 2:11
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option
> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
> ==
>
> that is ~73G. RSS seems to be OK (~522M). Here is the statedump of
> glustershd process: [1]
>
> Also, here is sum of sizes, presented in statedump:
>
> ==
> # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN
> {sum=0} /^size=/ {sum+=$2} END {print sum}'
> 353276406
> ==
>
> That is ~337 MiB.
>
> Also, here are VIRT values from 2 replica nodes:
>
> ==
> root 24659 0.0 0.3 5645836 451796 ? Ssl ???24 3:28
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option
> *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
> root 18312 0.0 0.3 6137500 477472 ? Ssl ???19 6:37
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option
> *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
> ==
>
> Those are 5 to 6G, which is much less than dummy node has, but still look
> too big for us.
>
> Should we care about huge VIRT value on dummy node? Also, how one would
> debug that?
>
> Regards,
> Oleksandr.
>
> [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
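If unreaped threads are the suspect, one quick check is to watch, over time, the number of live threads versus the number of thread-stack-sized anonymous mappings: exited-but-never-joined threads disappear from /proc/<pid>/task, but their 8 MiB stacks normally stay mapped until joined, so a growing gap between the two counts would support this theory. A rough sketch, assuming the dummy-node PID from above and an arbitrary 60-second sampling interval:

==
pid=15109                                           # glustershd PID on the dummy node
while true; do
    threads=$(ls /proc/"$pid"/task | wc -l)         # currently live threads
    stacks=$(pmap "$pid" | grep -c '8192K rw')      # 8 MiB rw anon maps (likely thread stacks)
    printf '%s threads=%s stack-sized-maps=%s\n' "$(date '+%F %T')" "$threads" "$stacks"
    sleep 60
done
==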
Oleksandr Natalenko
2016-Jun-06 10:35 UTC
[Gluster-users] [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Also, I see lots of entries like these in the pmap output:

==
00007ef9ff8f3000      4K -----   [ anon ]
00007ef9ff8f4000   8192K rw---   [ anon ]
00007efa000f4000      4K -----   [ anon ]
00007efa000f5000   8192K rw---   [ anon ]
==

If I sum them up, I get the following:

==
# pmap 15109 | grep '[ anon ]' | grep 8192K | wc -l
9261
$ echo "9261*(8192+4)" | bc
75903156
==

which is something like the 70G+ I see in VIRT.

On 06.06.2016 11:24, Oleksandr Natalenko wrote:
> Hello.
>
> We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for
> keeping volumes metadata.
>
> Now we observe huge VSZ (VIRT) usage by glustershd on dummy node:
>
> ==
> root 15109 0.0 13.7 76552820 535272 ? Ssl ???26 2:11
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
> --xlator-option
> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
> ==
>
> that is ~73G. RSS seems to be OK (~522M). Here is the statedump of
> glustershd process: [1]
>
> Also, here is sum of sizes, presented in statedump:
>
> ==
> # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '='
> 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
> 353276406
> ==
>
> That is ~337 MiB.
>
> Also, here are VIRT values from 2 replica nodes:
>
> ==
> root 24659 0.0 0.3 5645836 451796 ? Ssl ???24 3:28
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket
> --xlator-option
> *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
> root 18312 0.0 0.3 6137500 477472 ? Ssl ???19 6:37
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket
> --xlator-option
> *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
> ==
>
> Those are 5 to 6G, which is much less than dummy node has, but still
> look too big to us.
>
> Should we care about huge VIRT value on dummy node? Also, how one
> would debug that?
>
> Regards,
> Oleksandr.
>
> [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
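Those 4K/8192K pairs look exactly like pthread stacks with their guard pages: 9261 mappings of (8192 + 4) KiB each is roughly 72 GiB, which matches the VIRT figure. A small sketch to sanity-check that interpretation against the process's stack-size limit and its live thread count (PID again taken from the earlier ps output, as an assumption):

==
pid=15109
grep 'Max stack size' /proc/"$pid"/limits   # default soft limit is usually 8 MiB (8388608 bytes)
pmap "$pid" | grep -c '8192K rw'            # suspected thread stacks
ls /proc/"$pid"/task | wc -l                # live threads; a much smaller number here
                                            # would point at exited, never-joined threads
==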