Delian Krustev
2019-Feb-13 15:47 UTC
rsync rewrites all blocks of large files although it uses delta transfer
Hi All,

For backup purposes I'm trying to transfer only the changed blocks of large files. To do that I run "rsync" with the following options:

    RSYNC_BKPDIR=`mktemp -d`
    rsync \
        --archive \
        --no-whole-file \
        --inplace \
        --backup \
        --backup-dir="$RSYNC_BKPDIR" \
        --verbose \
        --stats \
        /var/backups/mysql-dbs/. \
        /mnt/bkp/var/backups/mysql-dbs/.

The problem is that although "rsync" shows that delta transfer is used (when run with -vv) and only a small amount of data is transferred, the target files appear to be overwritten in full.

Here is the output of "rsync" and some more debugging info:

####################################################
sending incremental file list
./
horde.data.sql
horde.schema.sql
LARGEDB.data.sql
LARGEDB.schema.sql
mysql.data.sql
mysql.schema.sql
phpmyadmin.data.sql
phpmyadmin.schema.sql

Number of files: 9 (reg: 8, dir: 1)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 8
Total file size: 1,944,522,704 bytes
Total transferred file size: 1,944,522,704 bytes
Literal data: 21,421,681 bytes
Matched data: 1,923,101,023 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 21,612,218
Total bytes received: 323,302

sent 21,612,218 bytes received 323,302 bytes 259,591.95 bytes/sec
total size is 1,944,522,704 speedup is 88.65

# du -m 1.9G /tmp/tmp.8gBzjNQOQZ
1.9G /tmp/tmp.8gBzjNQOQZ

# tree -a /tmp/tmp.8gBzjNQOQZ
/tmp/tmp.8gBzjNQOQZ
├── horde.data.sql
├── horde.schema.sql
├── LARGEDB.data.sql
├── LARGEDB.schema.sql
├── mysql.data.sql
├── mysql.schema.sql
├── phpmyadmin.data.sql
└── phpmyadmin.schema.sql

0 directories, 8 files

Free space at the beginning and end of the backup:
Filesystem       1M-blocks   Used  Available  Use%  Mounted on
/dev/mapper/bkp     102392  76872      20400   80%  /mnt/bkp
/dev/mapper/bkp     102392  78768      18504   81%  /mnt/bkp

####################################################

As can be seen, "rsync" has sent about 20M and received 300K of data. However, the filesystem has allocated almost 2G, which is the total size of the files being backed up.

The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log-structured filesystem. I'm using its snapshotting feature to keep backups for past dates.

Is there anything that can be done to make "rsync" overwrite only the changed blocks?

P.S. I guess it will be the same for copy-on-write filesystems, e.g. BTRFS or ZFS.

Cheers
--
Delian
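The mismatch described above can be put into numbers. The following is a small arithmetic sketch in Python using only the figures from the rsync and df output in the post. It assumes that rsync's reported speedup is the total file size divided by the sum of bytes sent and received, which is consistent with the 88.65 shown; the variable names are illustrative.

```python
# Figures copied from the "rsync --stats" and "df" output in the post.
total_size = 1_944_522_704     # "Total file size", bytes
literal_data = 21_421_681      # "Literal data", bytes
bytes_sent = 21_612_218        # "Total bytes sent"
bytes_received = 323_302       # "Total bytes received"
used_before_mib = 76_872       # df "Used" before the run (1M-blocks)
used_after_mib = 78_768        # df "Used" after the run (1M-blocks)

MIB = 1024 * 1024

# Assumption: speedup = total size / (bytes sent + bytes received).
speedup = total_size / (bytes_sent + bytes_received)

# Space newly allocated on /mnt/bkp during the run.
allocated_mib = used_after_mib - used_before_mib

total_mib = total_size / MIB
literal_mib = literal_data / MIB

print(f"speedup:              {speedup:.2f}")
print(f"newly allocated:      {allocated_mib} MiB")
print(f"total file size:      {total_mib:.1f} MiB")
print(f"literal (new) data:   {literal_mib:.1f} MiB")
print(f"allocated / literal:  {allocated_mib / literal_mib:.0f}x")
```

The newly allocated space (1896 MiB) is close to the total file size rather than to the literal data, which is the symptom the post describes.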
Kevin Korb
2019-Feb-13 16:29 UTC
With --backup in order to end up with 2 files it has to write out a whole new file. Sure, it only sent the differences (normally that means over the network but there is no network here) but the writing end was told to duplicate the file being updated before updating it.
Delian Krustev
2019-Feb-13 22:26 UTC
On Wednesday, February 13, 2019 11:29:44 AM EET Kevin Korb via rsync <rsync at lists.samba.org> wrote:
> With --backup in order to end up with 2 files it has to write out a
> whole new file.
> Sure, it only sent the differences (normally that means
> over the network but there is no network here) but the writing end was
> told to duplicate the file being updated before updating it.

The copy is needed for comparing the blocks, because "--inplace" overwrites the destination file. I've tried without "--backup", but then the delta transfer sends too much data, close to the size of the backed-up files.

The copy lives on a temporary filesystem which is discarded after the backup (with "rm -rf"). That filesystem is neither log-structured nor copy-on-write, so keeping a copy there is not a big problem. What I actually want, though, is not a backup of every modified file but a TMPDIR for scratch data.

The ideal workflow would be to compare SRC and DST, write the changed blocks to TMPDIR, then read them back from TMPDIR and apply them to DST.

Cheers
--
Delian
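The "compare SRC and DST, apply only the changed blocks" idea can be sketched in Python as below. This is not how rsync works internally: it compares blocks only at identical offsets, skips the TMPDIR staging step by assuming both files are readable locally, and uses an assumed block size of 4096 bytes. The function name `patch_in_place` is hypothetical.

```python
import os

BLOCK_SIZE = 4096  # assumed block size; not taken from the thread


def patch_in_place(src_path, dst_path, block_size=BLOCK_SIZE):
    """Overwrite only those blocks of dst that differ from src.

    Blocks are compared at identical offsets. The destination file
    must already exist. Returns the number of blocks written.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        offset = 0
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break
            dst.seek(offset)
            dst_block = dst.read(len(src_block))
            if src_block != dst_block:
                # Only differing blocks are ever written to dst.
                dst.seek(offset)
                dst.write(src_block)
                written += 1
            offset += len(src_block)
        # Drop any trailing data if dst was longer than src.
        dst.truncate(offset)
    return written
```

A call such as `patch_in_place("/path/to/src.sql", "/path/to/dst.sql")` (paths are placeholders) leaves unchanged blocks untouched, which is what matters on a log-structured or copy-on-write target. It only helps when the changes do not shift the data that follows them.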
Remi Gauvin
2019-Feb-13 23:20 UTC
On 2019-02-13 10:47 a.m., Delian Krustev via rsync wrote:
> Free space at the beginning and end of the backup:
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/mapper/bkp 102392 76872 20400 80% /mnt/bkp
> /dev/mapper/bkp 102392 78768 18504 81% /mnt/bkp
>
> ####################################################
>
> As can be seen "rsync" has sent about 20M and received 300K of data. However
> the filesystem has allocated almost 2G, which is the total size of the files
> being backed up.
>
> The filesystem mounted on "/mnt/bkp" is of type "nilfs2", which is a log
> structured filesystem. I'm using its snapshotting feature to keep backups for
> past dates.

Have you run nilfs-clean before checking this free space comparison? Maybe there is just a large write amplification created by rsync's many small writes when using --inplace.
Delian Krustev
2019-Feb-13 23:56 UTC
On Wednesday, February 13, 2019 6:20:13 PM EET Remi Gauvin via rsync <rsync at lists.samba.org> wrote:
> Have you run the nilfs-clean before checking this free space comparison?
> Maybe there is just large amplification created by rsync's many small
> writes when using --inplace.

nilfs-clean is suspended for the duration of the backup. It would have idled anyway, provided the fullness threshold of the FS (90% by default) had not been reached.

The problem is probably that these mysqldump files have changed data near the beginning of the files, so all later blocks have to be overwritten.

To avoid this, "rsync" would have to allocate and deallocate space in the middle of the file:

http://man7.org/linux/man-pages/man2/fallocate.2.html

Unfortunately the respective syscalls are not portable, quite new, and filesystem specific. It would have been nice to have them for all OSes and filesystems, and better yet not aligned to the FS block size. E.g.:

- give me 5 new blocks in the middle of file F starting at POS
- do not use the entire last block of these 5 but rather only X bytes of it

or

- replace block 5 with "this" partial block data
- truncate blocks 6 to 20

I can see a use for them in many application workflows, from text editors through databases to backup software.

Cheers
--
Delian
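The hypothesis above, that a change near the start of a file forces every later block to be rewritten, can be illustrated with a short Python sketch. It is only a model: it counts blocks that differ at identical offsets, uses an assumed block size of 4096 bytes and random test data, and says nothing about how nilfs2 actually allocates space. The helper `changed_blocks` is a hypothetical name.

```python
import os

BLOCK_SIZE = 4096  # assumed block size for the illustration


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Count blocks that differ when compared at identical offsets."""
    length = max(len(old), len(new))
    count = 0
    for off in range(0, length, block_size):
        if old[off:off + block_size] != new[off:off + block_size]:
            count += 1
    return count


# 100 blocks of random data stand in for a dump file.
old = os.urandom(100 * BLOCK_SIZE)

# Case 1: 16 bytes overwritten near the start; the length is unchanged.
overwritten = old[:10] + b"X" * 16 + old[26:]

# Case 2: 16 bytes inserted near the start; everything after them shifts.
inserted = old[:10] + b"X" * 16 + old[10:]

print("blocks changed after overwrite:", changed_blocks(old, overwritten))
print("blocks changed after insertion:", changed_blocks(old, inserted))
```

In the overwrite case only the first block differs, while in the insertion case practically every block differs even though almost all of the data is still present, just at shifted offsets. That would be consistent with rsync reporting mostly matched data while the destination is still rewritten nearly in full.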