David Cunningham
2019-Dec-19 00:28 UTC
[Gluster-users] GFS performance under heavy traffic
Hi Raghavendra and Strahil,

We are using GFS version 5.6-1.el7 from the CentOS repository. Unfortunately we can't modify the application, and it expects to read and write from a normal filesystem.

There's around 25GB of data being written during a business day, so over 10 hours that's around 0.7 MBps, which has me mystified as to how it can generate 114MBps of network traffic. Granted, we have read traffic as well, but still. The chart shows much more inbound traffic to the GFS server than outbound, suggesting the problem is with data writes.

Is it possible with GFS to not check with the other nodes when reading? Our data is mostly static and we don't require a 100% guarantee that the data is up to date when reading.

Thanks for any assistance.

On Wed, 18 Dec 2019 at 16:39, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
> What version of Glusterfs are you using? Though I'm not sure what the root
> cause of your problem is, I just wanted to point out a bug with read-ahead
> which would cause read amplification over the network [1][2]; it should be
> fixed in recent versions.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1214489
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1393419
>
> On Wed, Dec 18, 2019 at 2:50 AM David Cunningham <dcunningham at voisonics.com> wrote:
>
>> Hello,
>>
>> We switched a production system to using GFS instead of NFS at the
>> weekend; however, it didn't go well on Monday when full load hit. The
>> application started crashing regularly and we had to revert to NFS. It
>> seems that the problem was high network traffic used by GFS.
>>
>> We've two GFS nodes plus one arbiter node, each about 1.3ms latency from
>> each other. Attached is a chart of network traffic on one of the GFS nodes.
>> We see that it saturated the 1Gbps link before we reverted to NFS at 15:10.
>>
>> The question is, why does GFS use so much network traffic, and is there
>> anything we can do about it? NFS traffic doesn't exceed 4MBps, so 120MBps
>> for GFS seems awfully high.
>>
>> It would also be good to have faster read performance from GFS, but
>> that's another issue.
>>
>> Thanks in advance for any assistance.
>>
>> --
>> David Cunningham, Voisonics Limited
>> http://voisonics.com/
>> USA: +1 213 221 1092
>> New Zealand: +64 (0)28 2558 3782
>> ________
>>
>> Community Meeting Calendar:
>>
>> APAC Schedule -
>> Every 2nd and 4th Tuesday at 11:30 AM IST
>> Bridge: https://bluejeans.com/441850968
>>
>> NA/EMEA Schedule -
>> Every 1st and 3rd Tuesday at 01:00 PM EDT
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20191219/0b1b7c3f/attachment.html>
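A quick back-of-envelope check that puts the figures quoted in this thread side by side, written as a small Python sketch. It assumes decimal units (1 GB = 1000 MB, and a 1Gbps link carries at most 125 MB/s before protocol overhead) and writes spread evenly over the 10-hour business day; the constant and function names are illustrative only.

```python
# Back-of-envelope comparison of the traffic figures quoted in this thread.
# Assumptions (not measurements): decimal units (1 GB = 1000 MB,
# 1 Gbps = 1000 Mbit/s) and writes spread evenly over the business day.

DAILY_WRITES_GB = 25       # data written per business day, as reported
BUSINESS_DAY_HOURS = 10    # length of the business day, as reported
OBSERVED_GFS_MBPS = 114    # GFS network traffic reported on one node (MB/s)
NFS_PEAK_MBPS = 4          # NFS traffic ceiling reported (MB/s)
LINK_GBPS = 1              # link speed of the GFS node (Gbit/s)


def average_write_rate_mbps(gb: float, hours: float) -> float:
    """Average write rate in MB/s for `gb` gigabytes spread over `hours` hours."""
    return gb * 1000 / (hours * 3600)


def link_capacity_mbps(gbps: float) -> float:
    """Raw link capacity in MB/s (8 bits per byte, ignoring protocol overhead)."""
    return gbps * 1000 / 8


avg = average_write_rate_mbps(DAILY_WRITES_GB, BUSINESS_DAY_HOURS)
cap = link_capacity_mbps(LINK_GBPS)

print(f"average write rate: {avg:.2f} MB/s")
print(f"link capacity:      {cap:.0f} MB/s")
print(f"GFS traffic / average writes: {OBSERVED_GFS_MBPS / avg:.0f}x")
print(f"GFS traffic / NFS peak:       {OBSERVED_GFS_MBPS / NFS_PEAK_MBPS:.1f}x")
print(f"link utilisation:             {OBSERVED_GFS_MBPS / cap:.0%}")
```

Under these assumptions the reported GFS traffic is roughly 160 times the average write rate and above 90% of the raw link capacity, which is consistent with the saturation seen on the chart. Comparing a traffic reading with a whole-day average is only indicative, since writes are unlikely to be perfectly even.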
Hi David,

Did you try setting "direct-io-mode=disable" on the client mounts? As the content is mostly static, it would help to make use of the kernel caching and read-ahead mechanisms. I think it is enabled by default.

Regards,
Jorick Astrego
Netbulae Virtualization Experts
I'm not sure whether you measured the traffic on the client side (tcpdump on a client machine) or on the server side. In either case, please verify that the client can access all bricks simultaneously, as failing to do so can cause unnecessary heals.

Have you thought about upgrading to v6? There are some enhancements in v6 which could be beneficial. Still, it is indeed strange that so much traffic is generated with FUSE.

Another approach is to test with NFS-Ganesha, which supports pNFS and can natively speak with Gluster. This can bring you closer to the previous setup and also provide some extra performance.

Best Regards,
Strahil Nikolov