> If your goal is to reduce storage, and scanning inodes doesn't matter,
> use --link-dest for targets. However, that'll keep a backup for every
> time that you run it, by link-desting yesterday's copy.

The goal was not to reduce storage, it was to reduce work. A full
rsync takes more than the whole night, and the destination server is
almost unusable for anything else when it is doing its rsyncs. I
am sorry if this was unclear. I just want to give rsync a hint that
comparing files and directories that are older than one week on
the source side is a waste of time and effort, as the rsync is done
every day, so they can safely be assumed to be in sync already.

Dirk van Deun
-- 
Ceterum censeo Redmond delendum

> The goal was not to reduce storage, it was to reduce work. A full
> rsync takes more than the whole night, and the destination server is
> almost unusable for anything else when it is doing its rsyncs. I
> am sorry if this was unclear. I just want to give rsync a hint that
> comparing files and directories that are older than one week on
> the source side is a waste of time and effort, as the rsync is done
> every day, so they can safely be assumed to be in sync already.

I thought something rang a bell ... From the man page:

> -I, --ignore-times
>        Normally rsync will skip any files that are already the
>        same size and have the same modification time-stamp.
>        This option turns off this "quick check" behavior,
>        causing all files to be updated.

As I read this, the default is to look at the file size/timestamp, and
if they match then do nothing, as the files are assumed to be identical.
So unless you have specified this option, files which have already been
copied should be ignored. The check should be quite low in CPU, at least
compared to the "cost" of generating a file checksum etc.

AFAIK there is no option to completely ignore files by timestamp, at
least not within rsync itself.

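To make the "quick check" concrete, here is a minimal Python sketch of the comparison the man page describes. It is an illustration only, not rsync's actual code: the function name `quick_check_matches` is hypothetical, and comparing modification times to whole seconds is an assumption of this sketch.

```python
import os


def quick_check_matches(src_path, dst_path):
    """Return True if dst looks up to date judging by size and mtime alone.

    Illustrative stand-in for the "quick check"; no file contents are read.
    """
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        # Nothing on the destination yet, so the file has to be copied.
        return False
    src = os.stat(src_path)
    same_size = src.st_size == dst.st_size
    # Assumption: compare timestamps at whole-second resolution.
    same_mtime = int(src.st_mtime) == int(dst.st_mtime)
    return same_size and same_mtime
```

Even this cheap test needs one `stat()` call per file on each side, so its cost grows with the number of files rather than with the amount of data changed.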
What is taking time: scanning inodes on the destination, or recopying
the entire backup because of either source read speed, target write
speed or a slow interconnect between them?

Do you keep a full new backup every day, or are you just overwriting
the target directory?

/kc

On Wed, Jul 01, 2015 at 10:06:57AM +0200, Dirk van Deun said:
  >> If your goal is to reduce storage, and scanning inodes doesn't matter,
  >> use --link-dest for targets. However, that'll keep a backup for every
  >> time that you run it, by link-desting yesterday's copy.
  >
  >The goal was not to reduce storage, it was to reduce work. A full
  >rsync takes more than the whole night, and the destination server is
  >almost unusable for anything else when it is doing its rsyncs. I
  >am sorry if this was unclear. I just want to give rsync a hint that
  >comparing files and directories that are older than one week on
  >the source side is a waste of time and effort, as the rsync is done
  >every day, so they can safely be assumed to be in sync already.
  >
  >Dirk van Deun
  >--
  >Ceterum censeo Redmond delendum

-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.

You could use find to build a filter to use with rsync, then update the
filter every few days if it takes too long to create.

I have used a script to build a filter on the source server to exclude
anything over 5 days old, invoked when the sync starts, but it only
parses around 2000 files per run.

Mark.

On 2/07/2015 2:34 a.m., Ken Chase wrote:
> What is taking time, scanning inodes on the destination, or recopying the entire
> backup because of either source read speed, target write speed or a slow interconnect
> between them?
>
> Do you keep a full new backup every day, or are you just overwriting the target
> directory?
>
> /kc
>
>
> On Wed, Jul 01, 2015 at 10:06:57AM +0200, Dirk van Deun said:
>   >> If your goal is to reduce storage, and scanning inodes doesn't matter,
>   >> use --link-dest for targets. However, that'll keep a backup for every
>   >> time that you run it, by link-desting yesterday's copy.
>   >
>   >The goal was not to reduce storage, it was to reduce work. A full
>   >rsync takes more than the whole night, and the destination server is
>   >almost unusable for anything else when it is doing its rsyncs. I
>   >am sorry if this was unclear. I just want to give rsync a hint that
>   >comparing files and directories that are older than one week on
>   >the source side is a waste of time and effort, as the rsync is done
>   >every day, so they can safely be assumed to be in sync already.
>   >
>   >Dirk van Deun
>   >--
>   >Ceterum censeo Redmond delendum
>

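A minimal sketch of the kind of filter-building script described above, written in Python instead of find. Mark's actual script is not shown in the thread, so everything here is an assumption: the function name `build_exclude_list` is hypothetical, and the 5-day default simply mirrors the figure he mentions.

```python
import os
import time


def build_exclude_list(root, max_age_days=5, now=None):
    """Return anchored exclude patterns for files older than max_age_days.

    Paths are relative to `root` and start with "/", so they are meant to
    be anchored at the top of a transfer whose source is `root/`.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    patterns = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(full).st_mtime
            except OSError:
                # File vanished while scanning a live system; skip it.
                continue
            if mtime < cutoff:
                rel = os.path.relpath(full, root)
                patterns.append("/" + rel.replace(os.sep, "/"))
    patterns.sort()
    return patterns
```

One possible use is to write the returned lines to a file and pass it to rsync with `--exclude-from=FILE`, for example `rsync -a --exclude-from=old-files.txt /home/ backuphost:/home/` (the file and host names are placeholders). This sketch does not escape rsync wildcard characters (`*`, `?`, `[`) in file names, so such names would need extra handling.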
> What is taking time, scanning inodes on the destination, or recopying the entire
> backup because of either source read speed, target write speed or a slow interconnect
> between them?

It takes hours to traverse all these directories with loads of small
files on the backup server. That is the limiting factor. Not even
copying: just checking the timestamp and size of the old copies.

The source server is the actual live system, which has fast disks,
so I can afford to move the burden to the source side, using the find
utility to select homes that have been touched recently and using
rsync only on these.

But it would be nice if a clever invocation of rsync could remove the
extra burden entirely.

Dirk van Deun
-- 
Ceterum censeo Redmond delendum

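The workaround described above (select recently touched homes on the source, then rsync only those) could look roughly like the following Python sketch. It stands in for the find invocation, which is not given in the thread; the function names and the 7-day default are assumptions.

```python
import os
import time


def _tree_has_recent(path, cutoff):
    """True if any directory or file under `path` has mtime >= cutoff."""
    for dirpath, _dirnames, filenames in os.walk(path):
        try:
            # A directory's mtime changes when entries are added, removed
            # or renamed, so this also notices deletions.
            if os.lstat(dirpath).st_mtime >= cutoff:
                return True
        except OSError:
            continue
        for name in filenames:
            try:
                if os.lstat(os.path.join(dirpath, name)).st_mtime >= cutoff:
                    return True
            except OSError:
                continue  # file vanished during the scan
    return False


def recently_touched_homes(homes_root, max_age_days=7, now=None):
    """Return names of home directories modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    recent = []
    with os.scandir(homes_root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False) and _tree_has_recent(
                entry.path, cutoff
            ):
                recent.append(entry.name)
    return recent
```

Each returned name could then be synced on its own, for example `rsync -a /home/NAME/ backuphost:/home/NAME/` (host and paths are placeholders), so the backup server only traverses homes that changed. The scan stops at the first recent entry in each home, which keeps the cost on the source side low for active users.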
On Wed, Jul 01, 2015 at 02:05:50PM +0100, Simon Hobson said:
  >As I read this, the default is to look at the file size/timestamp and
  >if they match then do nothing as they are assumed to be identical. So
  >unless you have specified this, then files which have already been
  >copied should be ignored - the check should be quite low in CPU, at
  >least compared to the "cost" of generating a file checksum etc.

This betrays the fact that many rsync users do not abuse rsync for
backups the way us idiots do! :)

You have NO IDEA how long it takes to scan 100M files on a 7200 rpm
disk. It becomes the dominant issue; CPU isn't the issue at all.
(Additionally, I would think that metadata scanning could max out only
2 cores anyway: one for rsync's userland process and another for the
kernel running the fs and scanning inodes.)

This is why throwing away all that metadata seems silly. Keeping
detailed logs and parsing them before the copy would be good, but that
requires an external selection script that runs before rsync starts
and hands rsync a list of files to copy directly. That is unfortunate,
because rsync's scan method is quite advanced, yet it doesn't avoid
this pitfall.

Additionally, I don't know if linux (or freebsd or any unix) can be
told to cache metadata more aggressively than data. There is not much
point in caching the latter on a backup server; the former would be
great. I also don't know how much RAM metadata takes per inode on
typical OSes.

/kc
-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.
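A back-of-envelope estimate shows why the scan dominates. The Python sketch below is a rough model, not a measurement: the 8.5 ms average seek time and the `seeks_per_file` factor are assumptions, and only the rotational latency (half a revolution on average) follows directly from the 7200 rpm figure.

```python
def scan_time_hours(n_files, rpm=7200, avg_seek_ms=8.5, seeks_per_file=1.0):
    """Estimate hours needed to stat n_files on a single spinning disk.

    seeks_per_file is the assumed average number of random disk accesses
    per file; 1.0 is a pessimistic case with no caching or inode locality.
    """
    rotational_ms = 0.5 * 60_000 / rpm      # average wait: half a revolution
    per_io_ms = avg_seek_ms + rotational_ms
    total_ms = n_files * seeks_per_file * per_io_ms
    return total_ms / 1000 / 3600


worst_case = scan_time_hours(100_000_000)                      # one seek per file
with_locality = scan_time_hours(100_000_000, seeks_per_file=0.01)
print(f"{worst_case:.0f} h worst case, {with_locality:.1f} h with 1 seek per 100 files")
```

Under these assumptions the worst case is roughly 350 hours, and even with one disk seek per hundred files the scan still takes about 3.5 hours, with the CPU mostly idle throughout.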