I have a 2 node, single process AFR setup with 1.544Mbps of bandwidth between the 2 nodes. When I write a 1MB file to the gluster share it seems to AFR to the other node in real time, killing my disk IO speeds on the gluster mount point. Is there any way to fix this? Ideally I would like to see near-local disk IO speeds from/to the local gluster mount point and let the AFR play catch-up in the background as the bandwidth becomes available.

Gluster Spec File (same on both nodes): http://pastebin.com/m58dc49d4

IO speed tests:

# time dd if=/dev/urandom of=/mnt/gluster/disktest count=1024 bs=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 8.34701 s, 126 kB/s

real    0m8.547s
user    0m0.000s
sys     0m0.372s

# time dd if=/dev/urandom of=/tmp/disktest count=1024 bs=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.253865 s, 4.1 MB/s

real    0m0.259s
user    0m0.000s
sys     0m0.284s

Thanks
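[Editor's note: a back-of-the-envelope check, not from the thread, showing that the 126 kB/s figure is roughly what a write replicated synchronously over a 1.544Mbps T1 link would give:]

```python
# Rough sanity check: if each write must reach the remote node over the
# T1 link before it completes, throughput is capped near the line rate.
link_bps = 1.544e6             # T1 line rate, bits per second
link_bytes_per_s = link_bps / 8

file_bytes = 1_048_576         # the 1 MB dd test file
min_transfer_s = file_bytes / link_bytes_per_s

print(f"link cap: {link_bytes_per_s / 1000:.0f} kB/s")
print(f"minimum time to push 1 MB over the link: {min_transfer_s:.1f} s")
# prints: link cap: 193 kB/s
#         minimum time to push 1 MB over the link: 5.4 s
```

The observed 126 kB/s (8.5 s) is in the same ballpark as the ~193 kB/s cap, consistent with replication happening synchronously rather than in the background.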
Do you have write-behind loaded on the client side? For IO testing, use /dev/zero instead of /dev/urandom.

avati

On Fri, Jan 23, 2009 at 2:14 AM, Evan <Gluster at devnada.com> wrote:
> [snip]
Ideally, what I'm trying to accomplish is to have multiple Samba servers connected over a 1.544Mbps pipe stay in sync with each other. So it's important for me to have near-local disk access speeds when reading and writing to the local gluster node in the AFR group; as for getting the data propagated out to the other nodes, I know the 1.544Mbps pipe can't keep up, so I'll take the fastest sync I can get as long as it doesn't affect local performance (which is what I am seeing now).

On Fri, Jan 23, 2009 at 10:56 AM, Keith Freedman <freedman at freeformit.com> wrote:
> At 10:18 AM 1/23/2009, Evan wrote:
>> I added the following to the bottom of my spec file:
>>
>> volume writebehind
>>   type performance/write-behind
>>   option aggregate-size 10MB   # default is 0bytes
>>   option flush-behind off      # default is 'off'
>>   subvolumes afr
>> end-volume
>>
>> which gives me the following results when making a 10MB file:
>>
>> # time dd if=/dev/zero of=/tmp/disktest count=10240 bs=1024
>> 10240+0 records in
>> 10240+0 records out
>> 10485760 bytes (10 MB) copied, 0.173179 s, 60.5 MB/s
>>
>> real    0m0.183s
>> user    0m0.000s
>> sys     0m0.204s
>>
>> # time dd if=/dev/zero of=/mnt/gluster/disktest count=10240 bs=1024
>> 10240+0 records in
>> 10240+0 records out
>> 10485760 bytes (10 MB) copied, 5.50861 s, 1.9 MB/s
>>
>> real    0m5.720s
>> user    0m0.000s
>> sys     0m0.060s
>>
>> Although this is better than I had before, is there any way to have
>> gluster write the data to the localBrick and then sync/AFR in the
>> background, so I could expect to see something closer to the 60 MB/s
>> I see when writing to the local disk directly?
>
> What you really want is delayed replication. I've asked for this on
> this mailing list recently, and was told that it's something they've
> considered (more as a DR feature than an HA feature), but it's not
> currently on the list of priorities.
>
> The issue, as I see it: if it's an HA feature, then you really need to
> ensure that the data is replicated before you let your application
> think the data is written. If the replication were delayed and the
> server went down, the data would be lost forever. That is bad for HA.
> If it's a DR feature, you're probably OK, because disaster recovery
> scenarios can usually withstand some data loss, and you're more
> interested in a point-in-time snapshot of the data.
>
> FUSE is a problem, and TCP/IP is a problem, with much overhead and
> large block sizes.
>
> Ideally, gluster's write-behind would be smart enough to aggregate
> smaller blocks of data into a large write. I think this would solve a
> lot of the problem you're having in your tests.
>
>> Thanks
>>
>> Raghavendra G <raghavendra at zresearch.com> wrote:
>>> Above afr, with afr as a subvolume.
>>>
>>> On Fri, Jan 23, 2009 at 12:59 AM, Evan <Gluster at devnada.com> wrote:
>>>> Where should I put the write-behind translator? Just above afr with
>>>> afr as a subvolume? Or should I put it just above my localBrick
>>>> volume and below afr?
>>>>
>>>> Here is the output using /dev/zero:
>>>>
>>>> # time dd if=/dev/zero of=/mnt/gluster/disktest count=1024 bs=1024
>>>> 1024+0 records in
>>>> 1024+0 records out
>>>> 1048576 bytes (1.0 MB) copied, 1.90119 s, 552 kB/s
>>>>
>>>> # time dd if=/dev/zero of=/tmp/disktest count=1024 bs=1024
>>>> 1024+0 records in
>>>> 1024+0 records out
>>>> 1048576 bytes (1.0 MB) copied, 0.0195388 s, 53.7 MB/s
>>>>
>>>> Thanks
>>>>
>>>> [snip]
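[Editor's note: to make the placement advice in this thread concrete ("above afr, with afr as a subvolume"), a minimal client-side spec sketch follows. The volume names, addresses, and option values are illustrative assumptions, not the poster's pastebin spec, and option spellings varied between early GlusterFS releases, so check them against your installed version's documentation:]

```
# Client spec sketch (hypothetical names; adapt to your own bricks).
volume localBrick
  type protocol/client
  option transport-type tcp
  option remote-host 127.0.0.1
  option remote-subvolume brick
end-volume

volume remoteBrick
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.1.2     # the far end of the T1 link
  option remote-subvolume brick
end-volume

volume afr
  type cluster/afr
  subvolumes localBrick remoteBrick
end-volume

# write-behind sits on top, with afr as its subvolume
volume writebehind
  type performance/write-behind
  option aggregate-size 1MB
  subvolumes afr
end-volume
```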
Keith Freedman
2009-Jan-23 19:12 UTC
[Gluster-users] Gluster-users Digest, Vol 9, Issue 66
Don't forget to set:

  option read-subvolume BRICKNAME

in your AFR config; that'll improve your read performance significantly.

At 11:06 AM 1/23/2009, Evan wrote:
> [snip]
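[Editor's note: the read-subvolume option goes inside the AFR volume definition itself; a sketch follows, with volume names assumed for illustration:]

```
# AFR volume sketch with a preferred read subvolume (names assumed)
volume afr
  type cluster/afr
  subvolumes localBrick remoteBrick
  # serve reads from the local brick instead of going across the T1 link
  option read-subvolume localBrick
end-volume
```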
Ya, I've had that set since I started. Any other suggested tweaks?

Thanks

On Fri, Jan 23, 2009 at 11:12 AM, Keith Freedman <freedman at freeformit.com> wrote:
> Don't forget to set:
>
>   option read-subvolume BRICKNAME
>
> in your AFR config; that'll improve your read performance significantly.
>
> [snip]