Greetings,

I would like to write a patch for rsync but need some help getting started. Here is my situation. I am using cwrsync to copy files from one Windows server to another Windows server. One file that I need to back up is 130 GB. The daily changes occur throughout the file, not just at the end. File names look like this:

Db_20080402_0003_DB.BAK
Db_20080403_0003_DB.BAK

Therefore, I can use the --fuzzy switch to find a basis file for rsync. I want to take this one step further.

What I would like to accomplish is the merging of the --preallocate patch and the --fuzzy option. When these 2 switches are used together (at least on Windows platforms), I want rsync to determine which fuzzy file to use (from the fuzzy_distance() function), but then go ahead and preallocate the new file, not with posix_fallocate(), but with the contents of the file matched from fuzzy_distance(). This would keep the destination file from being severely fragmented.

What I see happening is rsync doing a local file copy first, but not block by block. When you copy a file on Windows with the copy command, the entire file (size) is preallocated so that no fragmentation occurs. After this step, the copy command performs the actual transfer of data from the source file to the destination file. You could think of this almost as a local, non-fragmenting file copy operation, after which the rsync algorithm is used to update the new destination file.

How hard would it be to write this patch? I don't mind doing the coding, but would love to hear some strategies on how to accomplish this goal.

Congratulations on getting version 3 released!

Thanks,
-John Taylor
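To make the proposal concrete, here is a minimal sketch in Python rather than rsync's C. It is not rsync code and not the rsync algorithm: the helper names seed_from_basis() and update_in_place() and the BLOCK_SIZE value are hypothetical, the "source" is read from a local path instead of over the wire, and blocks are compared only at identical offsets, so shifted data is not detected.

```python
import shutil

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def seed_from_basis(basis_path, dest_path):
    """Step 1: create the destination as a plain local copy of the basis file."""
    shutil.copyfile(basis_path, dest_path)


def update_in_place(source_path, dest_path, block_size=BLOCK_SIZE):
    """Step 2: overwrite only the blocks of dest that differ from source.

    Returns the number of blocks that had to be rewritten.
    """
    rewritten = 0
    with open(source_path, "rb") as src, open(dest_path, "r+b") as dest:
        offset = 0
        while True:
            new_block = src.read(block_size)
            if not new_block:
                break
            dest.seek(offset)
            old_block = dest.read(len(new_block))
            if old_block != new_block:
                # Rewrite this block at the same offset in the destination.
                dest.seek(offset)
                dest.write(new_block)
                rewritten += 1
            offset += len(new_block)
        # Drop any tail left over from a longer basis file.
        dest.truncate(offset)
    return rewritten
```

Calling seed_from_basis() on yesterday's .BAK file and then update_in_place() against today's file leaves the destination identical to today's file while touching only the blocks that changed.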
On Thu, 2008-04-03 at 15:49 -0400, John Taylor wrote:
> What I would like to accomplish is the merging of the --preallocate patch
> and the --fuzzy option. When these 2 switches are used together (at least
> on Windows platforms), I want rsync to determine which fuzzy file to use
> (from the fuzzy_distance() function), but then go ahead and preallocate
> the new file, not with posix_fallocate(), but with the contents of the
> file matched from fuzzy_distance(). This would keep the destination
> file from being severely fragmented.
>
> What I see happening is rsync doing a local file copy first, but not
> block by block. When you copy a file on Windows, with the copy command,
> the entire file (size) is preallocated so that no fragmentation occurs.
> After this step, the copy command performs the actual transfer of data
> from the source file to the destination file. You could think of this
> almost as a local, non-fragmenting file copy operation, and then the
> rsync algorithm is used to update the new, destination file.

Do I understand correctly: you want rsync to copy the fuzzy basis file to the new destination file (either by reading and writing or with some special Windows system call) and then proceed with the file transfer?

I don't see how the extra step would reduce fragmentation compared to rsync's current technique. Rsync currently preallocates the new destination file to the length of the source file using posix_fallocate, which (IIRC) maps to Windows's SetEndOfFile, and then fills in the data. This is pretty much the same as what the Windows copy command does, and Rob Bosch has found that it mostly eliminates fragmentation.

Matt
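The preallocate-then-fill technique described in the reply can be sketched roughly as follows, again in Python rather than rsync's C. This is an illustration under assumptions, not rsync's implementation: preallocate() and copy_with_preallocation() are hypothetical helper names, os.posix_fallocate is used only where the platform provides it, and the os.ftruncate fallback merely sets the file length, which may not reserve disk space on every filesystem.

```python
import os


def preallocate(fd, length):
    """Ask the OS to size the file to `length` bytes before data is written."""
    if length <= 0:
        return
    if hasattr(os, "posix_fallocate"):
        try:
            os.posix_fallocate(fd, 0, length)
            return
        except OSError:
            pass  # not supported on this filesystem; fall back below
    # Fallback: set the length only. This may produce a sparse file
    # instead of a true reservation, depending on the filesystem.
    os.ftruncate(fd, length)


def copy_with_preallocation(source_path, dest_path, chunk_size=1 << 20):
    """Preallocate dest to the source length, then fill in the data.

    Returns the number of bytes the destination was sized to.
    """
    length = os.path.getsize(source_path)
    with open(source_path, "rb") as src, open(dest_path, "wb") as dest:
        preallocate(dest.fileno(), length)
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dest.write(chunk)
    return length
```

The point of sizing the file up front is that the filesystem learns the final length before the first data block arrives, which gives it the chance to pick a contiguous region instead of growing the file piecemeal.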