David Cunningham
2019-Dec-19 23:49 UTC
[Gluster-users] GFS performance under heavy traffic
Hi Strahil,

The chart attached to my original email is taken from the GFS server. I'm
not sure what you mean by accessing all bricks simultaneously. We've
mounted it from the client like this:

  gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10 0 0

Should we do something different to access all bricks simultaneously?

Thanks for your help!
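A minimal sketch of how one might confirm that the FUSE client really is
connected to every brick, assuming the volume name gvol0 from the fstab
line above. The mount-log file name is derived from the mount point, so
/mnt/glusterfs becomes mnt-glusterfs.log:

  # On any Gluster server: list the clients connected to each brick.
  # Each brick should show the same set of client addresses.
  gluster volume status gvol0 clients

  # On the client: look for one "Connected to" line per brick
  # (e.g. gvol0-client-0, gvol0-client-1, ...) in the FUSE mount log.
  grep -i 'connected to' /var/log/glusterfs/mnt-glusterfs.log | tail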
On Fri, 20 Dec 2019 at 11:47, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> I'm not sure if you measured the traffic from the client side (tcpdump on
> a client machine) or from the server side.
>
> In both cases, please verify that the client accesses all bricks
> simultaneously, as a client that cannot reach every brick will cause
> unnecessary heals.
>
> Have you thought about upgrading to v6? There are some enhancements in v6
> which could be beneficial.
>
> Yet, it is indeed strange that so much traffic is generated with FUSE.
>
> Another approach is to test with NFS-Ganesha, which supports pNFS and can
> natively speak with Gluster. That can bring you closer to the previous
> setup and also provide some extra performance.
>
> Best Regards,
> Strahil Nikolov
>
> On Thursday, 19 December 2019 at 02:28:55 GMT+2, David Cunningham
> <dcunningham at voisonics.com> wrote:
>
> Hi Raghavendra and Strahil,
>
> We are using GFS version 5.6-1.el7 from the CentOS repository.
> Unfortunately we can't modify the application, and it expects to read
> and write from a normal filesystem.
>
> There's around 25GB of data written during a business day, so over 10
> hours that's around 0.7 MBps, which has me mystified as to how it can
> generate 114 MBps of network traffic. Granted we have read traffic as
> well, but still. The chart shows much more inbound traffic to the GFS
> server than outbound, suggesting the problem is with data writes.
>
> Is it possible with GFS to not check with the other nodes when reading?
> Our data is mostly static, and we don't require a 100% guarantee that
> the data is up to date when reading.
>
> Thanks for any assistance.
>
> On Wed, 18 Dec 2019 at 16:39, Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
> What version of GlusterFS are you using? Though I'm not sure what the
> root cause of your problem is, I wanted to point out a bug in read-ahead
> which causes read amplification over the network [1][2], and which
> should be fixed in recent versions.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1214489
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1393419
>
> On Wed, Dec 18, 2019 at 2:50 AM David Cunningham
> <dcunningham at voisonics.com> wrote:
>
> Hello,
>
> We switched a production system to using GFS instead of NFS at the
> weekend; however, it didn't go well on Monday when full load hit. The
> application started crashing regularly and we had to revert to NFS. It
> seems that the problem was the high network traffic generated by GFS.
>
> We have two GFS nodes plus one arbiter node, each about 1.3ms latency
> from the others. Attached is a chart of network traffic on one of the
> GFS nodes. We see that it saturated the 1Gbps link before we reverted to
> NFS at 15:10.
>
> The question is: why does GFS use so much network traffic, and is there
> anything we can do about it? NFS traffic doesn't exceed 4 MBps, so 120
> MBps for GFS seems awfully high.
>
> It would also be good to have faster read performance from GFS, but
> that's another issue.
>
> Thanks in advance for any assistance.

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
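Following up on the read-ahead bugs Raghavendra linked above, a hedged
sketch of the knobs often suggested for this class of problem. The volume
name gvol0 is taken from the fstab line earlier; verify these on a test
volume before relying on them:

  # Check the read-ahead translator (the one the two bugs above live in):
  gluster volume get gvol0 performance.read-ahead

  # Workaround commonly suggested for the read-amplification bugs on
  # older versions:
  gluster volume set gvol0 performance.read-ahead off

  # For mostly-static data, longer FUSE caching can reduce lookup
  # traffic. These are mount options (fstab), not volume options:
  #   attribute-timeout=600,entry-timeout=600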
Actually, I didn't make myself clear. A FUSE mount on the client side
connects directly to all of the bricks that make up the volume. If for
some reason (bad routing, a firewall block) the client can reach only 2
out of 3 bricks, that constantly triggers healing (as one of the bricks is
never updated), which degrades performance and causes excessive network
usage. As your attachment is from one of the gluster nodes, this could be
the case.

Best Regards,
Strahil Nikolov

On Friday, 20 December 2019 at 01:49:56 GMT+2, David Cunningham
<dcunningham at voisonics.com> wrote:

> Hi Strahil,
>
> The chart attached to my original email is taken from the GFS server.
> I'm not sure what you mean by accessing all bricks simultaneously. We've
> mounted it from the client like this:
>
>   gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10 0 0
>
> Should we do something different to access all bricks simultaneously?
>
> Thanks for your help!
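A quick way to check whether this kind of partial-connectivity healing is
actually happening (a sketch, again assuming the volume name gvol0; run on
any of the gluster nodes):

  # Files currently pending heal on each brick. A list that never drains
  # under steady load suggests a client that cannot reach one brick.
  gluster volume heal gvol0 info

  # Cumulative heal counts per brick:
  gluster volume heal gvol0 statistics heal-count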