Joe Landman
2011-May-02 22:08 UTC
[Gluster-users] Hopefully answering some mirroring questions asked here and offline
Hi folks,

We've fielded a number of mirroring questions offline, as well as watched and participated in discussions here. I thought it was important to make sure some of these are answered and searchable on the lists.

One major question that kept arising was as follows:

q: If I have a large image file (say a VM vmdk or other format) on a mirrored volume, will one small change of a few bytes result in a resync of the entire file?

a: No.

To test this, we created a 20GB file on a mirror volume (reproduction sketches follow at the end of this message):

root@metal:/local2/home/landman# ls -alF /mirror1gfs/big.file
-rw-r--r-- 1 root root 21474836490 2011-05-02 12:44 /mirror1gfs/big.file

Then, using the following quick-and-dirty Perl, we appended about 10-20 bytes to the file.

#!/usr/bin/env perl
# app.pl -- append a short marker line (about 10 bytes) to the named file
use strict;
use warnings;

my $file = shift or die "usage: $0 file\n";
open(my $fh, ">>", $file) or die "open $file: $!";
print $fh "end ".$$."\n";    # e.g. "end 19362"
close($fh);

root@metal:/local2/home/landman# ./app.pl /mirror1gfs/big.file

Then I had to write a quick-and-dirty tail replacement, as I discovered that tail doesn't seek ... (yes, it started reading every 'line' of that file).

#!/usr/bin/env perl
# tail.pl -- print the last 200 bytes of a file by seeking, not scanning
use strict;
use warnings;

my $file = shift or die "usage: $0 file\n";
open(my $fh, "<", $file) or die "open $file: $!";
seek($fh, -200, 2) or die "seek $file: $!";    # whence 2 = SEEK_END
read($fh, my $buf, 200);
printf "buffer: '%s'\n", $buf;
close($fh);

root@metal:/local2/home/landman# ./tail.pl /mirror1gfs/big.file
buffer: 'end 19362'

While running app.pl, I did not see any massive resyncs; I had dstat running in another window.

You might say that this is irrelevant, since we only appended, and appends could be special-cased. So I wrote a random updater that writes at random spots throughout the large file (much as a VM updates its vmdk).

#!/usr/bin/env perl
# randupd.pl -- overwrite a short marker at a random offset in the file
use strict;
use warnings;

my $file = shift or die "usage: $0 file\n";
my @stat = stat($file) or die "stat $file: $!";
my $loc  = int(rand($stat[7]));    # random offset within the file size
# "+<" opens the file read-write without truncating it; an append mode
# (">>") would force every write to end-of-file regardless of seek()
open(my $fh, "+<", $file) or die "open $file: $!";
seek($fh, $loc, 0) or die "seek $file: $!";    # whence 0 = SEEK_SET
print $fh "I was here!!!";
printf "loc: %i\n", $loc;
close($fh);

root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 17598205436
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 16468787891
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 9271612568
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 1356667302
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 12365324308
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 15654714313
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 10127739152
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 10259920623

and again, no massive resyncs.

So I think it's fairly safe to say that the concern over massive resyncs for small updates is not something we see in the field.

Regards,

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
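The post doesn't show how the test file was created or how dstat was invoked, so here is a minimal reproduction sketch; the dd geometry and dstat flags are assumptions, chosen so that 20480 MiB roughly matches the 21474836490-byte file in the ls listing above:

# create a ~20 GiB test file on the mirrored mount
dd if=/dev/zero of=/mirror1gfs/big.file bs=1M count=20480

# in a second window, watch disk and network throughput once per second;
# a full-file resync would appear as a sustained multi-minute burst,
# a diff-based heal as a brief blip
dstat -d -n 1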
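A quick way to confirm that randupd.pl modified the file in place rather than appending (not shown in the original post; the offset is the first one from the transcript above):

# the size should be unchanged from the earlier ls listing
ls -l /mirror1gfs/big.file

# read the 13-byte marker ("I was here!!!") back from the reported offset
dd if=/mirror1gfs/big.file bs=1 skip=17598205436 count=13 2>/dev/null; echo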
Anand Avati
2011-May-03 05:06 UTC
[Gluster-users] Hopefully answering some mirroring questions asked here and offline
Thanks for the post, Joe. We introduced the "diff"-based self-heal algorithm in the 3.1 release.

Avati

On Tue, May 3, 2011 at 3:38 AM, Joe Landman <landman@scalableinformatics.com> wrote:

> q: If I have a large image file (say a VM vmdk or other format) on a
> mirrored volume, will one small change of a few bytes result in a resync
> of the entire file?
>
> a: No.
> [...]
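For completeness: the choice of self-heal algorithm is exposed as a volume option. A minimal sketch, using the volume name implied by Joe's mount point (an assumption; check the option name and accepted values against your release):

# pin the diff-based algorithm explicitly
gluster volume set mirror1gfs cluster.data-self-heal-algorithm diff

# revert to the release default
gluster volume reset mirror1gfs cluster.data-self-heal-algorithm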