Robert Bell
2013-Jan-15 03:45 UTC
rsync - using a --files-from list to cut out scanning. How to handle deletions?
Folks,

We use rsync extensively for protecting data by making backups. Thank you to the authors and maintainers.

Like many others, we use the --link-dest option to cut down on the space occupied by the backups.

Unlike many others, we recycle old backup directories. Since most file systems change only slowly (ours average about 0.5% of files and about 1.5% of data being churned each day), a recycled directory is a good start for the next backup. Our most common case is that a directory from 5 days ago becomes the target for the current backup, with yesterday's backup being provided by a --link-dest= setting.

Since the source file system changes only slowly, I have been thinking about ways to speed up the backups in the future. One way is to have the backups deal only with files that have changed on the source since the last backup. This would save having to scan the whole source and destination areas each time a backup is done. The Linux inotify capability looks like it might be useful for collecting a list of changed files.

Has anyone done this?

However, there is one case that I have not been able to get to work in a test of rsync. This is the case where a file exists in the destination, does not exist in the source, but is named in the --files-from= list. This would be the case if a file had been deleted from the source. We would want rsync in this case to delete the file on the destination.

However, with a test command like:

    rsync -a -i --delete --files-from=list --link-dest=../linked source/ dest

I was unable to get rsync to delete on the destination a file which did not exist in the source but was named in the list. rsync baulked at a file being listed that was not in the source. For example:

    rsync: link_stat "/data/flush/inter/bel107.80527/source/0yyy" failed: No such file or directory (2)

[The test file 0yyy existed in the destination, the link-dest area and in the list, but not in the source.]

Thanks to those who have read down to here. :-)

Regards
Rob. Bell    e-mail: Robert.Bell at csiro.au
--
Dr Robert C. Bell, BSc (Hons) PhD
Technical Services Manager
Advanced Scientific Computing
CSIRO IM&T

Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
Robert.Bell at csiro.au | http://www.csiro.au/ | http://www.hpsc.csiro.au/
Addresses:
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia
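One way to sidestep the failing case in the question, sketched here under stated assumptions rather than as anything rsync itself does: split the changed-path list before the transfer, so that only paths still present in the source are handed to --files-from=, and paths that have vanished from the source are removed from the destination separately. The sketch assumes the list holds one path per line, relative to the source root, however it was collected (inotify or otherwise). The function names split_changed_list and remove_deleted and the output list files are hypothetical.

```shell
# Hypothetical helper: split a changed-path list into paths that still
# exist in the source and paths that have been deleted from it.
# Usage: split_changed_list LIST SRC_DIR PRESENT_OUT DELETED_OUT
split_changed_list() {
    list=$1 src=$2 present=$3 deleted=$4
    : > "$present"
    : > "$deleted"
    while IFS= read -r path; do
        [ -n "$path" ] || continue              # skip blank lines
        # -e is false for dangling symlinks, so test -L as well
        if [ -e "$src/$path" ] || [ -L "$src/$path" ]; then
            printf '%s\n' "$path" >> "$present"
        else
            printf '%s\n' "$path" >> "$deleted"
        fi
    done < "$list"
}

# Hypothetical helper: remove from DEST_DIR every path named in DELETED_LIST.
# Usage: remove_deleted DELETED_LIST DEST_DIR
remove_deleted() {
    deleted=$1 dest=$2
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        # Refuse absolute paths and anything that climbs out of DEST_DIR.
        case $path in
            /*|..|../*|*/..|*/../*) continue ;;
        esac
        # Removes only the destination's link; hard-linked copies in
        # older backup directories are left untouched.
        rm -f -- "$dest/$path"
    done < "$deleted"
}
```

A run would then call split_changed_list on the collected list, pass the resulting "present" list to --files-from= in the rsync command above, and finish with remove_deleted on the "deleted" list. Directories that have disappeared from the source are not handled by this sketch.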
Kevin Korb
2013-Jan-15 14:25 UTC
rsync - using a --files-from list to cut out scanning. How to handle deletions?
If you are going to do it this way please be aware of:

    https://bugzilla.samba.org/show_bug.cgi?id=8712
    https://bugzilla.samba.org/show_bug.cgi?id=5644

If a file already exists in the target directory when using --link-dest, rsync modifies the existing link rather than replacing it, which means you don't have history for files that have been replaced rather than added or deleted.

If you are dealing with backing up many millions of files then I suggest looking into a more advanced filesystem that can handle this functionality internally rather than using --link-dest. Currently that is limited to ZFS or BTRFS (if you are brave). Both of these filesystems have subvolumes and subvolume snapshot capabilities. This means you can do something similar to an lvm2 snapshot at the directory level instead of the whole filesystem. You can rsync with the same target directory each run and do a snapshot of that target between runs.

The recycling concept is not needed because deleting an old snapshot is much faster than doing an rm -rf on a huge tree of hard links. This is especially true on ZFS, which usually does the job in <1 second regardless of size. Unfortunately, BTRFS usually completes the command quickly but the space is then slowly reclaimed by a kernel thread in the background.

Here is something I wrote up about it a while back:

    http://sanitarium.net/golug/rsync+btrfs_backups_2011.html

It is a little out of date now and since I wrote it for a LUG it only covers BTRFS.

A FreeBSD 9 system with at least 8GB of RAM running ZFS will outperform pretty much any Linux system running BTRFS (currently), which will outperform any Linux system running ext4 and --link-dest.

--
Kevin Korb                Phone: (407) 252-6853
Systems Administrator     Internet:
FutureQuest, Inc.         Kevin at FutureQuest.net (work)
Orlando, Florida          kmk at sanitarium.net (personal)
Web page: http://www.sanitarium.net/
PGP public key available on web site.
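The "same target, snapshot between runs" approach described above might look roughly like the following on BTRFS. This is a minimal sketch, not the script from the linked write-up. It assumes the target directory is itself a btrfs subvolume and that the snapshot directory lives on the same filesystem. The function names, the DRY_RUN switch and the argument layout are hypothetical.

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Sync SRC into the same TARGET every run, then freeze the result.
# Assumes TARGET is a btrfs subvolume and SNAPDIR is on the same filesystem.
# Usage: backup_and_snapshot SRC TARGET SNAPDIR STAMP
backup_and_snapshot() {
    src=$1 target=$2 snapdir=$3 stamp=$4
    # --delete keeps the target an exact mirror of the source.
    run rsync -a --delete "$src/" "$target/" || return 1
    # -r makes the snapshot read-only, so later runs cannot alter history.
    run btrfs subvolume snapshot -r "$target" "$snapdir/$stamp"
}
```

Retiring an old backup then becomes a single btrfs subvolume delete on the corresponding snapshot instead of an rm -rf over a tree of hard links.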
Wayne Davison
2013-Jan-18 20:16 UTC
rsync - using a --files-from list to cut out scanning. How to handle deletions?
On Mon, Jan 14, 2013 at 7:45 PM, Robert Bell <Robert.Bell at csiro.au> wrote:

> Our most common case is that a directory from 5 days ago becomes the
> target for the current backup, with the yesterday's backup being provided
> by a --link-dest= setting.

This will be better supported in 3.1.0, which was just updated to handle existing files in a better manner -- it will now hard-link into the alt-dest dirs even for existing files. What has not changed is that changed attributes for existing files will still be made in-place, so things like permission changes or xattr changes can affect older files if a hard-linked older version is already in the destination.

> This is the case where a file exists in the destination, does not exist in
> the source, but is named in the --files-from= list.

Use the --delete-missing-args option of 3.1.0. Though it is not yet released, it hopefully will be soon, and is working fine for general use (I use it at my work).

..wayne..
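For illustration, the test command from the original post could be adapted to the suggested option roughly as follows. This is a sketch assuming rsync 3.1.0 or later. The function name backup_from_list and the RSYNC override variable are hypothetical, and the --delete flag from the original command is left out only to keep the sketch minimal.

```shell
# Sketch: transfer only the listed paths, and let --delete-missing-args turn
# listed paths that are missing from the source into deletions on the
# destination (instead of a link_stat error).
# RSYNC is a hypothetical override hook; it defaults to the real rsync.
# Usage: backup_from_list LIST LINK_DEST SRC_DIR DEST_DIR
backup_from_list() {
    list=$1 linkdest=$2 src=$3 dest=$4
    "${RSYNC:-rsync}" -a -i --delete-missing-args \
        --files-from="$list" --link-dest="$linkdest" "$src/" "$dest"
}
```

With the names used in the question, the call would be: backup_from_list list ../linked source dest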