Hello Gluster Users,

I am hoping someone can help me resolve an ongoing issue. I'm new to mailing lists, so forgive me if I have gotten anything wrong.

We have noticed our performance deteriorating over the last few weeks. It is easy to measure by timing an ls on one of our top-level folders: this usually takes 2-5 seconds but now takes up to 20 minutes, which renders our cluster basically unusable. The problem was intermittent in the past but is now almost constant, and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and we have found that if we kill the right brick process, performance instantly returns to normal. It is not always the same brick, but this suggests to me that something in the brick processes or background tasks is causing extreme latency. Because killing the right brick process fixes it, I suspect a specific file, folder, or operation is hanging and causing the increased latency, but I am not sure how to identify it. One last thing to add: our bricks are getting quite full (~95%). We are trying to migrate data off to new storage, but that is going slowly, and this issue is not helping.

I am currently trying to run a full heal, as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes 15-20 minutes to do an ls of one of our top-level folders, which contains only 60-80 other folders; this should take 2-5 seconds. I am checking this via a FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. Initially it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be.

I am not sure how to proceed from here. I am fairly new to Gluster, having inherited this setup from my predecessor, and I am trying to keep it going. I have included some info below to help with diagnosis; please let me know if any further info would be helpful. I would really appreciate any advice on how to work out the cause. Thank you in advance for reading this, and for any suggestions you might be able to offer.

- Patrick

This is an example of the main error I see in our brick logs; there have been others, and I can post them when I see them again:

[2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/<filename> library: system.posix_acl_default [Operation not supported]
[2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)

Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions; I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools, with a total capacity of around 560 TB. We have bonded 10 Gbps NICs on each node, and I have tested bandwidth with iperf and found it is what would be expected from this config. Individual brick performance seems OK; I've tested several bricks using dd and can write a 10 GB file at 1.7 GB/s.
# dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s

Node 1:
# glusterfs --version
glusterfs 3.12.15

Node 2:
# glusterfs --version
glusterfs 3.12.14

Arbiter:
# glusterfs --version
glusterfs 3.12.14

Here is our gluster volume status:

# gluster volume status
Status of volume: gvAA01
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 01-B:/brick1/gvAA01/brick              49152     0          Y       7219
Brick 02-B:/brick1/gvAA01/brick              49152     0          Y       21845
Brick 00-A:/arbiterAA01/gvAA01/brick1        49152     0          Y       6931
Brick 01-B:/brick2/gvAA01/brick              49153     0          Y       7239
Brick 02-B:/brick2/gvAA01/brick              49153     0          Y       9916
Brick 00-A:/arbiterAA01/gvAA01/brick2        49153     0          Y       6939
Brick 01-B:/brick3/gvAA01/brick              49154     0          Y       7235
Brick 02-B:/brick3/gvAA01/brick              49154     0          Y       21858
Brick 00-A:/arbiterAA01/gvAA01/brick3        49154     0          Y       6947
Brick 01-B:/brick4/gvAA01/brick              49155     0          Y       31840
Brick 02-B:/brick4/gvAA01/brick              49155     0          Y       9933
Brick 00-A:/arbiterAA01/gvAA01/brick4        49155     0          Y       6956
Brick 01-B:/brick5/gvAA01/brick              49156     0          Y       7233
Brick 02-B:/brick5/gvAA01/brick              49156     0          Y       9942
Brick 00-A:/arbiterAA01/gvAA01/brick5        49156     0          Y       6964
Brick 01-B:/brick6/gvAA01/brick              49157     0          Y       7234
Brick 02-B:/brick6/gvAA01/brick              49157     0          Y       9952
Brick 00-A:/arbiterAA01/gvAA01/brick6        49157     0          Y       6974
Brick 01-B:/brick7/gvAA01/brick              49158     0          Y       7248
Brick 02-B:/brick7/gvAA01/brick              49158     0          Y       9960
Brick 00-A:/arbiterAA01/gvAA01/brick7        49158     0          Y       6984
Brick 01-B:/brick8/gvAA01/brick              49159     0          Y       7253
Brick 02-B:/brick8/gvAA01/brick              49159     0          Y       9970
Brick 00-A:/arbiterAA01/gvAA01/brick8        49159     0          Y       6993
Brick 01-B:/brick9/gvAA01/brick              49160     0          Y       7245
Brick 02-B:/brick9/gvAA01/brick              49160     0          Y       9984
Brick 00-A:/arbiterAA01/gvAA01/brick9        49160     0          Y       7001
NFS Server on localhost                      2049      0          Y       17276
Self-heal Daemon on localhost                N/A       N/A        Y       25245
NFS Server on 02-B                           2049      0          Y       9089
Self-heal Daemon on 02-B                     N/A       N/A        Y       17838
NFS Server on 00-a                           2049      0          Y       15660
Self-heal Daemon on 00-a                     N/A       N/A        Y       16218

Task Status of Volume gvAA01
------------------------------------------------------------------------------
There are no active volume tasks

And gluster volume info:

# gluster volume info

Volume Name: gvAA01
Type: Distributed-Replicate
Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118
Status: Started
Snapshot Count: 0
Number of Bricks: 9 x (2 + 1) = 27
Transport-type: tcp
Bricks:
Brick1: 01-B:/brick1/gvAA01/brick
Brick2: 02-B:/brick1/gvAA01/brick
Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter)
Brick4: 01-B:/brick2/gvAA01/brick
Brick5: 02-B:/brick2/gvAA01/brick
Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter)
Brick7: 01-B:/brick3/gvAA01/brick
Brick8: 02-B:/brick3/gvAA01/brick
Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter)
Brick10: 01-B:/brick4/gvAA01/brick
Brick11: 02-B:/brick4/gvAA01/brick
Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter)
Brick13: 01-B:/brick5/gvAA01/brick
Brick14: 02-B:/brick5/gvAA01/brick
Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter)
Brick16: 01-B:/brick6/gvAA01/brick
Brick17: 02-B:/brick6/gvAA01/brick
Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter)
Brick19: 01-B:/brick7/gvAA01/brick
Brick20: 02-B:/brick7/gvAA01/brick
Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter)
Brick22: 01-B:/brick8/gvAA01/brick
Brick23: 02-B:/brick8/gvAA01/brick
Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter)
Brick25: 01-B:/brick9/gvAA01/brick
Brick26: 02-B:/brick9/gvAA01/brick
Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter)
Options Reconfigured:
cluster.shd-max-threads: 4
performance.least-prio-threads: 16
cluster.readdir-optimize: on
performance.quick-read: off
performance.stat-prefetch: off
cluster.data-self-heal: on
cluster.lookup-unhashed: auto
cluster.lookup-optimize: on
cluster.favorite-child-policy: mtime
server.allow-insecure: on
transport.address-family: inet
client.bind-insecure: on
cluster.entry-self-heal: off
cluster.metadata-self-heal: off
performance.md-cache-timeout: 600
cluster.self-heal-daemon: enable
performance.readdir-ahead: on
diagnostics.brick-log-level: INFO
nfs.disable: off
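If it would help, I can collect more detail with something like the following (a rough sketch using the standard gluster CLI and ZFS tools; profiling would need to be switched on first, and "brick1" here is just one example brick/pool name from our layout):

Pending heal counts per brick:
# gluster volume heal gvAA01 statistics heal-count

Per-brick operation latency, to try to narrow down which brick is hanging:
# gluster volume profile gvAA01 start
# gluster volume profile gvAA01 info
# gluster volume profile gvAA01 stop

Brick fullness and the ZFS xattr/ACL settings (possibly relevant to the posix_acl_default errors above):
# df -h /brick1
# zfs get xattr,acltype brick1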
Nithya Balachandran
2019-Apr-24 04:04 UTC
[Gluster-users] Extremely slow Gluster performance
Hi Patrick,

Did this start only after the upgrade? How do you determine which brick process to kill? Are there a lot of files to be healed on the volume?

Can you provide a tcpdump of the slow listing from a separate test client mount?

1. Mount the gluster volume on a different mount point than the one being used by your users.
2. Start capturing packets on the system where you have mounted the volume in (1):
   tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22
3. List the directory that is slow from the FUSE client.
4. Stop the capture (after a couple of minutes or after the listing returns, whichever is earlier).
5. Send us the pcap and the listing of the same directory from one of the bricks so we can compare the entries.

We may need more information after looking at the tcpdump. End to end, the test could look roughly like the sketch below.

Regards,
Nithya
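For reference, something along these lines on the test client (the mount point /mnt/gvAA01-test and the <slow-top-level-folder> name are only placeholders, and brick1 on one of the storage nodes is just an example brick):

Mount the volume on a scratch mount point:
# mkdir -p /mnt/gvAA01-test
# mount -t glusterfs 01-B:/gvAA01 /mnt/gvAA01-test

In one terminal, start the capture:
# tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22

In another terminal, time the slow listing, then stop the capture with Ctrl-C once it returns:
# time ls /mnt/gvAA01-test/<slow-top-level-folder>

On one of the brick nodes, list the same directory directly on the brick for comparison:
# ls /brick1/gvAA01/brick/<slow-top-level-folder>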