Hi,

I used to rsync a /home with thousands of home directories every night, although only a hundred or so would be used on a typical day, and many of them have not been used for ages. This became too large a burden on the poor old destination server, so I switched to a script that uses "find -ctime -7" on the source to select recently used homes first, and then rsyncs only those. (A week being a more than good enough safety margin in case something goes wrong occasionally.)

Is there a smarter way to do this, using rsync only? I would like to use rsync with a cut-off time, saying "if a file is older than this, don't even bother checking it on the destination server (and the same for directories -- but without ending a recursive traversal)". Now I am traversing some directories twice on the source server to lighten the burden on the destination server (first find, then rsync).

Best,

Dirk van Deun
-- 
Ceterum censeo Redmond delendum
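The find-then-rsync workflow described in this message can be sketched in Python as below. It is a minimal sketch, not Dirk's actual script. It rests on several assumptions:

- Every top-level directory under the home root is one user's home.
- A home counts as "recently used" when the home itself or any entry below it has a ctime newer than the cut-off. This is roughly what `find -ctime -7` matches.
- The walk of a home stops at the first such entry.

The function names and the destination `backup:/srv/home` are hypothetical. The code only builds the rsync argument lists.

```python
import os
import time

DAY = 24 * 60 * 60  # seconds


def has_recent_change(path, cutoff):
    """True as soon as `path` or any entry below it has a ctime newer than `cutoff`.

    Like `find PATH -ctime -N`, this looks at ctime, not mtime. The walk stops
    at the first hit, so busy homes are cheap to check; quiet ones are walked fully.
    """
    try:
        if os.lstat(path).st_ctime > cutoff:
            return True
    except OSError:
        return False
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            try:
                if os.lstat(os.path.join(dirpath, name)).st_ctime > cutoff:
                    return True
            except OSError:
                continue  # entry vanished or is unreadable; ignore it
    return False


def recently_changed_homes(home_root, days=7, now=None):
    """Names of the homes under `home_root` touched within the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * DAY
    with os.scandir(home_root) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_dir(follow_symlinks=False)
            and has_recent_change(entry.path, cutoff)
        )


def rsync_commands(home_root, homes, dest):
    """One rsync per selected home.

    Syncing each whole home (not just the changed files) keeps --delete useful:
    files deleted or renamed inside that home are removed on the destination.
    """
    return [
        ["rsync", "-a", "--delete", f"{home_root}/{name}/", f"{dest}/{name}/"]
        for name in homes
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. This sketch spares the destination but not the source: both the selection step and rsync still traverse the busy homes, which is exactly the double traversal the question is about.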
At 10:32 30.06.2015, Dirk van Deun wrote:
>Hi,
>
>I used to rsync a /home with thousands of home directories every night, although only a hundred or so would be used on a typical day, and many of them have not been used for ages. This became too large a burden on the poor old destination server, so I switched to a script that uses "find -ctime -7" on the source to select recently used homes first, and then rsyncs only those. (A week being a more than good enough safety margin in case something goes wrong occasionally.)

Doing it this way you can't delete files that have disappeared or been renamed.

>Is there a smarter way to do this, using rsync only? I would like to use rsync with a cut-off time, saying "if a file is older than this, don't even bother checking it on the destination server (and the same for directories -- but without ending a recursive traversal)". Now I am traversing some directories twice on the source server to lighten the burden on the destination server (first find, then rsync).

I would split up the tree into several sub trees and sync them normally, like /home/a* etc. You can then distribute the calls over several days. If that is still too much, then maybe do the find call, but then sync the whole user's home instead of just the found files.

bye
Fabi
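Fabi's suggestion can be sketched as below. He proposes syncing the tree in slices such as /home/a* and spreading the slices over several days. The sketch is hypothetical in several respects:

- The bucketing rule is an assumption: the first character of the home name, modulo the number of days.
- The function names and destination are made up.
- Each night runs one rsync with all of that night's homes as sources. No trailing slash is used, so `--delete` works inside every transferred home but does not touch other homes on the destination.

```python
def bucket_for(name, n_days):
    """Assign a home to one of `n_days` nightly slices by its first character."""
    return ord(name[0].lower()) % n_days


def homes_for_day(names, day, n_days=7):
    """All homes whose slice is `day` (0 .. n_days - 1)."""
    return sorted(n for n in names if bucket_for(n, n_days) == day)


def nightly_command(home_root, names, day, dest, n_days=7):
    """One rsync for tonight's slice, or None if the slice is empty.

    Sources are given without a trailing slash, so /home/alice lands in
    DEST/alice and --delete applies inside each transferred home only.
    """
    homes = homes_for_day(names, day, n_days)
    if not homes:
        return None
    return ["rsync", "-a", "--delete", *[f"{home_root}/{h}" for h in homes], f"{dest}/"]
```

Passing `datetime.date.today().weekday()` as `day` would sync every home once a week. Unlike the find-based approach, untouched homes are still synced, just less often.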
If your goal is to reduce storage, and scanning inodes doesn't matter, use --link-dest for targets. However, that'll keep a backup for every time that you run it, by link-desting yesterday's copy. You end up with a backup tree dir per day, with files hardlinked against all other backup dirs.

My solution (and that of many others here) is to run

  mv $ancientbackup $today; rsync --del --link-dest=$yest source:$dirs $today

which creates gaps in the ancient sequence of daily backups, so I end up keeping (very roughly) 1, 2, 3, 4, 7, 10, 15, 21, 30, 45, 60, 90, 120, 180 days old backups. (Of course this isn't exactly how it works: there's some binary counting going on in there, so the elimination isn't exactly like that, since every day each of those gets a day older. There are some Tower of Hanoi-like solutions to this for automated backups.) This means something twice as old has half as many backups for the same time range, so I keep the same frequency*age value for each backup time range into the past.

The result is a set of dirs named by date (in my case, e.g., 20150630), each of which looks exactly like the actual source tree I backed up but only takes up the space of files changed since the previous day.

(Caveat: it's hardlinked against all the other backups, so unchanged files take no additional disk space. HOWEVER, some server software such as postfix doesn't like hardlinked files in its spool due to security concerns. So if you boot/use the backup itself without making a plain copy (which is recommended), 1) postfix et al. will yell, and 2) you will be modifying the whole set of dirs that point to the inode you just booted/used.)

My solution avoids scanning the source twice, which in my case (backing up 5 boxes of 10M files each, daily) would be a huge cost. That matters because the scan takes longer than the backup/transfer itself: on a GigE network, a mere 20,000 changed files out of 10M seems to be the average for each of the 5 boxes. Also, it's production gear, so spending as little time as possible thrashing the box (and its poor metadata cache) is important for performance.
Getting the backups done during the night lull is therefore required. I don't have the time (nor the patience for the disk RMA cycles) to delete 10M files on the receiving side just to spend 5 hours recreating them; 20,000 seems better to me.

You could also use --backup and --backup-dir, but I don't do it that way.

/kc

On Tue, Jun 30, 2015 at 10:32:31AM +0200, Dirk van Deun said:
>Hi,
>
>I used to rsync a /home with thousands of home directories every
>night, [...]
>--
>Please use reply-all for most replies to avoid omitting the mailing list.
>To unsubscribe or change options: lists.samba.org/mailman/listinfo/rsync
>Before posting, read: catb.org/~esr/faqs/smart-questions.html
-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.
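Ken's rotation can be sketched as a small planner that only builds the commands. The rotation recycles an old snapshot as today's directory, then rsyncs into it with --link-dest pointing at yesterday's. This is a simplified stand-in, not Ken's retention scheme, and it makes several assumptions:

- It always recycles the oldest snapshot instead of doing the binary thinning he describes.
- Snapshots are directories named YYYYMMDD under an absolute backup root.
- It adds `-a`, which his one-liner does not show.

The --backup/--backup-dir alternative he mentions is sketched as a second, equally hypothetical helper. If `--backup-dir` is given a relative path, rsync resolves it relative to the destination directory, so the example uses absolute paths.

```python
def plan_rotation(snapshots, today, source, backup_root):
    """Commands for today's snapshot, following the shape of
    `mv $ancientbackup $today; rsync --del --link-dest=$yest source:$dirs $today`.
    """
    stamp = today.strftime("%Y%m%d")
    if stamp in snapshots:
        raise ValueError(f"snapshot {stamp} already exists")
    ordered = sorted(snapshots)  # YYYYMMDD names sort chronologically
    today_dir = f"{backup_root}/{stamp}"
    steps = []
    if len(ordered) >= 2:
        # Recycle the oldest snapshot so rsync updates it in place instead of
        # creating every file from scratch; the newest stays as link target.
        steps.append(["mv", f"{backup_root}/{ordered[0]}", today_dir])
    rsync = ["rsync", "-a", "--del"]
    if ordered:
        # Files unchanged since the newest ("yesterday's") snapshot get hard-linked.
        rsync.append(f"--link-dest={backup_root}/{ordered[-1]}")
    steps.append(rsync + [source, f"{today_dir}/"])
    return steps


def backup_dir_command(source, mirror, changes_root, today):
    """Single mirror; files rsync overwrites or deletes are moved to a dated dir."""
    stamp = today.strftime("%Y%m%d")
    return [
        "rsync", "-a", "--delete",
        "--backup", f"--backup-dir={changes_root}/{stamp}",
        source, mirror,
    ]
```

Both variants still let rsync compare every file on both ends. They reduce storage and churn on the receiver, not the scan itself.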
> >I used to rsync a /home with thousands of home directories every night, although only a hundred or so would be used on a typical day, and many of them have not been used for ages. This became too large a burden on the poor old destination server, so I switched to a script that uses "find -ctime -7" on the source to select recently used homes first, and then rsyncs only those. (A week being a more than good enough safety margin in case something goes wrong occasionally.)
>
> Doing it this way you can't delete files that have disappeared or been renamed.
>
> >Is there a smarter way to do this, using rsync only? I would like to use rsync with a cut-off time, saying "if a file is older than this, don't even bother checking it on the destination server (and the same for directories -- but without ending a recursive traversal)". Now I am traversing some directories twice on the source server to lighten the burden on the destination server (first find, then rsync).
>
> I would split up the tree into several sub trees and sync them normally, like /home/a* etc. You can then distribute the calls over several days. If that is still too much, then maybe do the find call, but then sync the whole user's home instead of just the found files.

As I did say in my original mail, but apparently did not emphasize sufficiently, rsyncing complete homes if anything changed in them is actually what I do, so files that have been deleted or renamed are handled correctly.

Anyway, the first paragraph was just to provide some context. My real question is: can you specify a cut-off time using rsync only, meaning that files are ignored, and directories are considered up to date on the destination server, if they have not been touched for x days on the source?

Dirk van Deun
-- 
Ceterum censeo Redmond delendum
> If your goal is to reduce storage, and scanning inodes doesn't matter,
> use --link-dest for targets. However, that'll keep a backup for every
> time that you run it, by link-desting yesterday's copy.

The goal was not to reduce storage; it was to reduce work. A full rsync takes more than the whole night, and the destination server is almost unusable for anything else while it is doing its rsyncs. I am sorry if this was unclear.

I just want to give rsync a hint that comparing files and directories that are older than one week on the source side is a waste of time and effort: the rsync is done every day, so they can safely be assumed to be in sync already.

Dirk van Deun
-- 
Ceterum censeo Redmond delendum
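Here is a toy model of the per-file decision, to make the difference Dirk is after concrete. It is not rsync's source code.

- rsync's default "quick check" skips a file when size and modification time match. That requires stat'ing the destination's copy.
- The cut-off rule he asks for would decide from the source alone.

`cutoff_skips` is hypothetical; nothing in the thread identifies an rsync option that implements it. The whole-second mtime comparison is a simplification.

```python
import os
import time

DAY = 24 * 60 * 60  # seconds


def quick_check_skips(src_path, dst_path):
    """Toy version of rsync's default quick check: skip if size and mtime match.

    It has to stat the destination copy, i.e. it costs work on the destination
    server. Modification times are compared in whole seconds here.
    """
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return False  # missing on the destination: must be transferred
    src = os.stat(src_path)
    return src.st_size == dst.st_size and int(src.st_mtime) == int(dst.st_mtime)


def cutoff_skips(src_path, days, now=None):
    """Hypothetical cut-off rule: skip anything not touched for `days` days.

    Only the source is consulted; the destination is never asked.
    """
    now = time.time() if now is None else now
    return os.lstat(src_path).st_ctime < now - days * DAY
```

For directories, as Dirk notes, such a rule could not end the recursive traversal. A directory's own timestamps only change when its direct entries change, not when a file deeper down is modified.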