thr3ads.net - rsync - combining --preallocate and --fuzzy [Apr 2008]

John Taylor

2008-Apr-03 19:57 UTC

combining --preallocate and --fuzzy

Greetings,

I would like to write a patch for rsync but need some help getting
started.  Here is my situation.  I am using cwrsync to copy files from
one Windows server to another Windows server.  One file that I need
to back up is 130 GB.  The daily changes occur throughout the file,
not just at the end of the file.  File names look like this:

Db_20080402_0003_DB.BAK
Db_20080403_0003_DB.BAK

Therefore, I can use the --fuzzy switch to find a basis file for rsync.
I want to take this one step further.

What I would like to accomplish is the merging of the --preallocate patch
and the --fuzzy option.  When these 2 switches are used together (at least
on Windows platforms), I want rsync to determine which fuzzy file to use
(from the fuzzy_distance() function), but then go ahead and preallocate
the new file, not with posix_fallocate(), but with the contents of the
file matched from fuzzy_distance().  This would keep the destination
file from being severely fragmented.
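
As a rough illustration of picking a fuzzy basis by file name, here is a
minimal sketch.  It uses a plain Levenshtein edit distance, which is an
assumption made for illustration: rsync's own fuzzy_distance() may score
names differently.  The names edit_distance and pick_fuzzy_basis are
hypothetical and do not exist in rsync.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings (illustrative only)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def pick_fuzzy_basis(target_name, candidate_names):
    """Return the candidate whose name is closest to target_name, or None."""
    if not candidate_names:
        return None
    return min(candidate_names, key=lambda n: edit_distance(target_name, n))


if __name__ == "__main__":
    existing = ["Db_20080402_0003_DB.BAK"]
    print(pick_fuzzy_basis("Db_20080403_0003_DB.BAK", existing))
```

With the two file names above, only one character differs, so yesterday's
backup would be an obvious basis for today's.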

What I see happening is rsync doing a local file copy first, but not
block by block.  When you copy a file on Windows, with the copy command,
the entire file (size) is preallocated so that no fragmentation occurs.
After this step, the copy command  performs the actual transfer of data
from the source file to the destination file.  You could think of this
almost as a local, non-fragmenting file copy operation, and then the
rsync algorithm is used to update the new, destination file.
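
The idea in the paragraph above could be sketched roughly like this.
This is only a local stand-in, not rsync code: it copies the basis file
to the new name and then rewrites just the fixed-size blocks that differ
from the source, whereas the real rsync algorithm matches blocks with
rolling checksums over the network.  The function name seed_and_update
and the block size are hypothetical.

```python
import shutil

BLOCK = 64 * 1024  # hypothetical block size, chosen only for illustration


def seed_and_update(basis_path, src_path, dst_path, block=BLOCK):
    """Copy the basis to dst, then overwrite in place the blocks that differ.

    Returns the number of blocks that had to be rewritten.
    """
    shutil.copyfile(basis_path, dst_path)  # the local, whole-file copy step
    rewritten = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        offset = 0
        while True:
            new = src.read(block)
            if not new:
                break
            dst.seek(offset)
            old = dst.read(len(new))
            if old != new:
                dst.seek(offset)
                dst.write(new)          # update only the changed block
                rewritten += 1
            offset += len(new)
        dst.truncate(offset)            # match the source length exactly
    return rewritten
```

Unchanged blocks are never rewritten, so the layout produced by the
initial copy is left alone wherever the data did not change.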

How hard would it be to write this patch?  I don't mind doing the coding,
but would love to hear some strategies on how to accomplish this goal.
Congratulations on getting version 3 released!

Thanks,
-John Taylor

Matt McCutchen

2008-Apr-04 00:55 UTC

On Thu, 2008-04-03 at 15:49 -0400, John Taylor wrote:
> What I would like to accomplish is the merging of the --preallocate patch
> and the --fuzzy option.  When these 2 switches are used together (at least
> on Windows platforms), I want rsync to determine which fuzzy file to use
> (from the fuzzy_distance() function), but then go ahead and preallocate
> the new file, not with posix_fallocate(), but with the contents of the
> file matched from fuzzy_distance().  This would keep the destination
> file from being severely fragmented.
> 
> What I see happening is rsync doing a local file copy first, but not
> block by block.  When you copy a file on Windows, with the copy command,
> the entire file (size) is preallocated so that no fragmentation occurs.
> After this step, the copy command  performs the actual transfer of data
> from the source file to the destination file.  You could think of this
> almost as a local, non-fragmenting file copy operation, and then the
> rsync algorithm is used to update the new, destination file.

Do I understand correctly: you want rsync to copy the fuzzy basis file
to the new destination file (either by reading and writing or with some
special Windows system call) and then proceed with the file transfer?  I
don't see how the extra step would reduce fragmentation compared to
rsync's current technique.  Rsync currently preallocates the new
destination file to the length of the source file using posix_fallocate,
which (IIRC) maps to Windows's SetEndOfFile, and then fills in the data.
This is pretty much the same as what the Windows copy command does, and
Rob Bosch has found that it mostly eliminates fragmentation.
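
For illustration, the preallocate-then-fill pattern described above looks
roughly like the following sketch.  It is not the rsync patch itself: it
uses Python's os.posix_fallocate where the platform provides it and falls
back to truncate() otherwise (an assumption made so the sketch also runs
where posix_fallocate is missing or unsupported).  The function name
preallocate_then_fill is hypothetical.

```python
import os


def preallocate_then_fill(src_path, dst_path, chunk_size=1 << 20):
    """Reserve the destination at the source's length, then write the data."""
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if size > 0:
            try:
                # Ask the filesystem to allocate the full length up front.
                os.posix_fallocate(dst.fileno(), 0, size)
            except (AttributeError, OSError):
                # Not available or not supported here: just set the length.
                dst.truncate(size)
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)  # fill in the already-reserved space
    return size
```

The point is only the ordering: the final length is known and reserved
before any data is written, so the file does not grow piece by piece.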

Matt