Joe Landman
2011-May-02 22:08 UTC
[Gluster-users] Hopefully answering some mirroring questions asked here and offline
Hi folks
We've fielded a number of mirroring questions offline, and have watched
or taken part in discussions here. I thought it was important to make
sure some of these are answered and searchable on the lists.
One major question that kept arising was as follows:
q: If I have a large image file (say, a VM vmdk or similar format) on a
mirrored volume, will one small change of a few bytes result in a resync
of the entire file?
a: No.
To test this, we created a 20GB file on a mirror volume.
root@metal:/local2/home/landman# ls -alF /mirror1gfs/big.file
-rw-r--r-- 1 root root 21474836490 2011-05-02 12:44 /mirror1gfs/big.file
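(The post doesn't show how the file was created; for anyone reproducing
this, a sparse test file of that size can be made with something like

truncate -s 20G /mirror1gfs/big.file

though any way of producing a 20GB file will do.)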
Then, using the following quick-and-dirty Perl script (app.pl), we
appended about 10-20 bytes to the file.
#!/usr/bin/env perl
# app.pl -- append a short marker line to the end of a file
use strict;
use warnings;

my $file = shift or die "usage: $0 file\n";
open(my $fh, '>>', $file) or die "open $file: $!";
print $fh "end $$\n";    # tag the append with our PID
close($fh);
root@metal:/local2/home/landman# ./app.pl /mirror1gfs/big.file
Then I had to write a quick-and-dirty tail replacement, as I discovered
that tail doesn't seek here ... (yeah, it started reading every 'line'
of that file ...)
#!/usr/bin/env perl
# tail.pl -- print the last 200 bytes of a file without reading it all
use strict;
use warnings;

my $file = shift or die "usage: $0 file\n";
open(my $fh, '<', $file) or die "open $file: $!";
seek($fh, -200, 2) or die "seek: $!";    # whence 2 = SEEK_END
read($fh, my $buf, 200);
printf "buffer: '%s'\n", $buf;
close($fh);
root@metal:/local2/home/landman# ./tail.pl /mirror1gfs/big.file
buffer: 'end 19362'
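(In hindsight, tail's byte mode would likely have avoided the
line-by-line read as well, e.g. tail -c 200 /mirror1gfs/big.file,
though I did not test that here.)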
While running app.pl, I did not see any massive resyncs; I had dstat
running in another window to watch the traffic.
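(The exact dstat invocation isn't in the original; something like

dstat -d -n 1

gives per-second disk and network throughput, which is enough to spot a
whole-file resync going by.)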
You might say that this is irrelevant, since we only appended, and
appends could be special-cased.
So I wrote a random updater that writes at random spots throughout the
large file (somewhat like the I/O pattern of a VM vmdk and similar files).
#!/usr/bin/env perl
# randupd.pl -- write a short marker at a random offset in a file
use strict;
use warnings;

my $file = shift or die "usage: $0 file\n";
my @stat = stat($file) or die "stat $file: $!";
my $loc  = int(rand($stat[7]));    # $stat[7] is the file size in bytes
# '+<' opens read/write without truncating; the '>>+' in the original
# is not a valid open mode, and append mode would pin every write to
# end-of-file regardless of seek()
open(my $fh, '+<', $file) or die "open $file: $!";
seek($fh, $loc, 0) or die "seek: $!";
print $fh "I was here!!!";
printf "loc: %i\n", $loc;
close($fh);
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 17598205436
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 16468787891
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 9271612568
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 1356667302
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 12365324308
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 15654714313
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 10127739152
root@metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 10259920623
And again, no massive resyncs.
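As a sanity check that the writes actually landed where reported (this
helper isn't part of the original post; chkupd.pl is hypothetical, in
the same style as the scripts above):

#!/usr/bin/env perl
# chkupd.pl -- hypothetical helper: read back the marker at a given offset
use strict;
use warnings;

my ($file, $loc) = @ARGV;
die "usage: $0 file offset\n" unless defined $loc;
open(my $fh, '<', $file) or die "open $file: $!";
seek($fh, $loc, 0) or die "seek: $!";
read($fh, my $buf, 13);    # 13 = length of "I was here!!!"
printf "at %i: '%s'\n", $loc, $buf;
close($fh);

e.g. ./chkupd.pl /mirror1gfs/big.file 17598205436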
So I think it's fairly safe to say that the concern over massive resyncs
for small updates is not something we see in the field.
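(For anyone who wants independent confirmation, replicate keeps its
pending-operation counters in trusted.afr changelog xattrs on the
backend bricks, so you can watch the bookkeeping directly; the brick
path below is illustrative:

getfattr -d -m . -e hex /data/brick1/big.file

A file with no heal pending shows all-zero changelog values.)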
Regards,
Joe
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615
Anand Avati
2011-May-03 05:06 UTC
[Gluster-users] Hopefully answering some mirroring questions asked here and offline
Thanks for the post, Joe. We introduced the "diff"-based self-heal algorithm in the 3.1 release.

Avati
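(For readers who want to check or control this on their own volumes, the
algorithm is exposed as a volume option; the option name below is from
memory, so verify it against your release:

gluster volume set <volname> cluster.data-self-heal-algorithm diff

where <volname> is your replicated volume.)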