Hi,

I have a file that changes slightly in size every day and has a timestamp appended to its name. For example, on the 14th of May:

MybackedUpFileBlabla_200905140219.bak

This is transferred by rsync to another server. The next day that file is deleted and replaced by a new file on the sender. The new file would be named, for example (15th May):

MybackedUpFileBlabla_200905150221.bak

The new file is generally slightly larger, but the containing directory is exactly the same.

I was hoping to use --fuzzy and --delete-after, but it doesn't seem to be speeding up the transfer. I am assuming that this is because I have both a change in name AND a change in size/modtime?

I was looking into the find_fuzzy function, but I'm not sure if there's anything I can tweak in there to make this work.

Thanks for any help
Julian
On Thu, May 14, 2009 at 4:10 AM, Julian Pace Ross <linux@prisma.com.mt> wrote:
> Hi,
> I have a file that changes slightly in size every day and has the timestamp
> appended to it.. for example on the 14th may:
> MybackedUpFileBlabla_200905140219.bak
> This is transferred by rsync to another server.
> The next day that file is deleted and substituted by a new file on the
> sender.. the new file would be named for example (15th May):
> MybackedUpFileBlabla_200905150221.bak
> The new file will be generally slightly larger in size, but the containing
> directory is exactly the same.
> I was hoping to use --fuzzy and --delete-after, but it doesn't seem to be
> speeding up the transfer. I am assuming that this is because I have both a
> change in name AND a change in size/modtime?
> I was looking into the find_fuzzy function, but i'm not sure if there's
> anything I can tweak in there to make this work.

I am using rsync for the exact same purpose, with very similar file names, and it seems to work just fine on 3.0.5 running on both Linux and Windows (cwrsync).

Some possible causes I've encountered:

o The source files are compressed or encrypted, which will prevent rsync from matching any blocks. gzip includes a special "rsync-friendly" compression mode (the --rsyncable option, where your gzip build supports it), but all other popular forms of compression prevent rsync from finding matches.

o The source files are very large, and the default rsync block size for large files prevents matches from being found. You can try forcing a smaller block size (trading CPU time for bandwidth).

o The source files are some sort of indexed database files (SQL Server uses a .bak extension). If you rebuild or refresh database indexes between your backups, this actually changes every page of the database, preventing rsync from finding matches. Also, if your clustered indexes use non-sequential keys, even a small amount of changed data can result in updates to nearly every database page.

--
RPM
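The first cause listed above (compressed source files) can be illustrated with a small Python sketch. This is not rsync's actual algorithm: it compares blocks exactly instead of using rolling and strong checksums, the 512-byte block size is a toy value, and the "backup" rows, the inserted rows, and the use of zlib are all assumptions made up for the demonstration. It only shows the general effect: a few small insertions leave most raw blocks findable, while the compressed streams share far fewer blocks.

```python
import random
import zlib
from typing import List, Tuple

BLOCK = 512  # toy block size in bytes, chosen only to keep the demo fast


def matched_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Fraction of old's fixed-size blocks that appear at any offset in new."""
    old_blocks = [old[i:i + block] for i in range(0, len(old) - block + 1, block)]
    if not old_blocks:
        return 0.0
    wanted = set(old_blocks)
    found = set()
    # Slide a window over the new data one byte at a time, as a crude
    # stand-in for rsync's rolling-checksum search.
    for i in range(len(new) - block + 1):
        window = new[i:i + block]
        if window in wanted:
            found.add(window)
    return sum(1 for b in old_blocks if b in found) / len(old_blocks)


def make_rows(count: int, seed: int = 0) -> List[bytes]:
    """Hypothetical fixed-width 'database rows' with pseudo-random values."""
    rng = random.Random(seed)
    return [b"row %06d value %010d\n" % (n, rng.randrange(10**10))
            for n in range(count)]


def demo() -> Tuple[float, float]:
    """Return (raw match fraction, compressed match fraction)."""
    old_rows = make_rows(4000)
    new_rows = []
    for n, row in enumerate(old_rows):
        new_rows.append(row)
        if n % 500 == 0:
            # "Tomorrow's" backup: a handful of rows inserted here and there.
            new_rows.append(b"NEW ROW ADDED AFTER %06d\n" % n)
    old, new = b"".join(old_rows), b"".join(new_rows)
    raw = matched_fraction(old, new)
    compressed = matched_fraction(zlib.compress(old), zlib.compress(new))
    return raw, compressed


if __name__ == "__main__":
    raw, compressed = demo()
    print(f"uncompressed: {raw:.0%} of old blocks found in the new file")
    print(f"compressed:   {compressed:.0%} of old blocks found in the new file")
```

The raw fraction should come out close to 1 and the compressed fraction much lower. Data before the first change can still match in a compressed stream, which is why the sketch puts its first insertion near the start of the file.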
On Wed, May 20, 2009 at 2:26 AM, Julian Pace Ross <linux@prisma.com.mt> wrote:
> Thanks Ryan!
> In fact I found it's a combination of factors you mentioned... i.e. a
> compressed SQL .bak file, so contrary to what I thought, the fuzzy file was
> indeed being found but no matches were being found in the file... thanks
> again for the info.

If you have the disk space at both ends, I would suggest doing what I do for SQL backup synchronization.

1) Write *uncompressed* .bak files for your databases (with timestamps in the file name, such as those produced by the database maintenance plan engine). This enables the use of --fuzzy, as you have discovered.

2) Use rsync to transfer the uncompressed files, but with the -z option enabled. This compresses the data over the wire, but decompresses it at the receiving end.

3) Adjust the rsync block size to something smaller if necessary to find more matches. I basically went down to 32KB rsync blocks for one 15 GB database file (rsync would by default use something like 129KB on a file this big). This eats up a lot more CPU, but if rsync can still output data faster than your network connection can handle, it is the most time-efficient way to go. Use multiples of 8KB, as that is the internal page size inherent in MS SQL Server databases. Trial and error is your friend here. Run rsync with low priority (START /LOW rsync.exe) so the CPU usage doesn't impact SQL Server.

4) Minimize any jobs you have to automatically rebuild indexes. Use UPDATE STATISTICS instead on a daily basis, and rebuild only when index fragmentation gets heavy. There are lots of scripts out there on the net which will automate that for you.

5) Minimize the rebuilds of denormalized "reporting" tables or other non-essential data. Move these off into other databases that you don't replicate if possible.

6) Watch out for non-sequential clustered indexes. We use GUIDs for primary keys on many tables, and this causes updates and inserts to be spread randomly throughout the table as it is physically stored. Even changing just 5% of the data can result in a change to every database page in such a scenario. Hot tables which use emails or other VARCHAR fields as clustered index keys also result in similar behavior.

Most of these suggestions would apply for rsyncing any sort of database backup file... Exchange, PostgreSQL, Oracle, or even (horror!) MySQL.

--
RPM
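Steps 2 and 3 above can be sketched as a small Python helper that assembles the rsync command line. This is only an illustration: the source directory and destination host are hypothetical placeholders, the -a flag is an assumption not mentioned in the thread, and the helper prints the command rather than running it. The options -z, --fuzzy, --delete-after and --block-size are standard rsync options; the block size is rounded down to a multiple of the 8KB page size mentioned in step 3.

```python
import shlex
from typing import List

PAGE_SIZE = 8 * 1024  # SQL Server page size cited in the post (8KB)


def aligned_block_size(requested_bytes: int) -> int:
    """Round down to a multiple of PAGE_SIZE, never below one page."""
    pages = max(1, requested_bytes // PAGE_SIZE)
    return pages * PAGE_SIZE


def build_rsync_command(source_dir: str, destination: str,
                        block_size: int) -> List[str]:
    """Assemble (but do not run) an rsync command for uncompressed .bak files."""
    return [
        "rsync",
        "-a",              # archive mode (assumption; not stated in the thread)
        "-z",              # compress data over the wire only
        "--fuzzy",         # look for a similarly named basis file
        "--delete-after",  # delete old files only after the transfer
        f"--block-size={aligned_block_size(block_size)}",
        source_dir,
        destination,
    ]


if __name__ == "__main__":
    # Hypothetical paths and host name, for illustration only.
    cmd = build_rsync_command("/backups/sql/", "backuphost:/backups/sql/",
                              32 * 1024)
    print(shlex.join(cmd))
```

Whether 32KB is the right block size depends on the file and the CPU available, so as the post says, trial and error applies; on Windows the printed command could be launched through START /LOW as described in step 3.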