On 11/09/2015 09:22 PM, Arun Khan wrote:
> You can use "newer" options of the find command and pass the file list
> to rsync or scp to "backup" only those files that have changed since
> the last run. You can keep a file like .lastbackup and timestamp it
> (touch) at the start of the backup process. Next backup you compare
> the current timestamp with the timestamp on this file.

Absolutely none of that is necessary with rsync, and the process you
described is likely to miss files that are modified while "find" runs.
If you're going to use rsync to make backups, just use a frontend like
rsnapshot or backuppc.
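[For reference, a rough sketch of the find-plus-timestamp scheme being
discussed; the directory, marker file, and destination host below are
hypothetical, and it assumes a .lastbackup marker already exists from a
previous run:

    # mark the start of this run, then ship only files newer than
    # the marker left by the previous run
    cd /srv/data || exit 1
    touch .lastbackup.new
    find . -type f -newer .lastbackup -print0 \
        | rsync -0 --files-from=- -a . backuphost:/backups/data/
    mv .lastbackup.new .lastbackup    # reference point for the next run

Whether files written while find is running get picked up on a later
run depends on exactly when the marker's timestamp is set, which is
what the rest of the thread argues about.]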
On Nov 10, 2015, at 8:46 AM, Gordon Messmer <gordon.messmer at gmail.com> wrote:
>
> On 11/09/2015 09:22 PM, Arun Khan wrote:
>> You can use "newer" options of the find command and pass the file list
>
> the process you described is likely to miss files that are modified
> while "find" runs.

Well, to be fair, rsync can also miss files if files are changing while
the backup occurs. Once rsync has passed through a given section of the
tree, it will not see any subsequent changes.

If you need guaranteed-complete filesystem-level snapshots, you need to
be using something at the kernel level that can atomically collect the
set of modified blocks/files, rather than something that crawls the
tree in user space.

On the BSD Now podcast, they recently told a war story about moving one
of the main FreeBSD servers to a new data center. rsync was taking 21
hours in back-to-back runs purely due to the number of files on that
server, which gave plenty of time for files to change since the last
run.

Solution? ZFS send:

http://128bitstudios.com/2010/07/23/fun-with-zfs-send-and-receive/
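[The linked article boils down to something like the following; the
pool/dataset names and the destination host here are made up for
illustration:

    # take an atomic snapshot, then stream it to the new machine
    zfs snapshot tank/data@migrate-1
    zfs send tank/data@migrate-1 | ssh newhost zfs receive backup/data

Because the snapshot is taken atomically by the filesystem, nothing can
change underneath the transfer the way it can under a userspace tree
crawl.]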
On 11/10/2015 12:16 PM, Warren Young wrote:
>
> Well, to be fair, rsync can also miss files if files are changing
> while the backup occurs. Once rsync has passed through a given
> section of the tree, it will not see any subsequent changes.

I think you miss my meaning. Consider this sequence of events:

* "find" begins and processes dirA and then dirB
* another application writes files in dirA
* "find" completes
* a new timestamp file is written

Now, the new file in dirA wasn't seen by find during this run, and it
won't be seen on the next run either. That's what I mean by missed.
Not temporarily missed, but permanently. That file won't ever be backed
up in this very naïve process.

There's no benefit to the process, either. rsync can efficiently
examine and synchronize filesystems without using find. And while it
may miss files that are written while it's running, it *will* get them
on the next run, unlike the "find" approach.

> If you need guaranteed-complete filesystem-level snapshots, you need
> to be using something at the kernel level that can atomically collect
> the set of modified blocks/files, rather than something that crawls
> the tree in user space.

Generally, I agree with you. In fact:

https://bitbucket.org/gordonmessmer/dragonsdawn-snapshot
https://github.com/rsnapshot/rsnapshot/pull/44

Doing block-level differentials is nice, if you're using ZFS. But not
everyone wants to run ZFS on Linux. I do think that backing up
snapshots is important, though.
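[For anyone unfamiliar with rsnapshot: one common way to get the same
kind of hard-linked, per-run snapshots with plain rsync looks roughly
like this. The paths and date stamping are illustrative, not
rsnapshot's actual layout:

    # each run gets its own directory; files unchanged since the last
    # run are hard-linked to it, so only changed files consume space
    today=$(date +%F)
    rsync -a --delete \
        --link-dest=/backups/data/latest \
        /srv/data/ /backups/data/"$today"/
    ln -sfn /backups/data/"$today" /backups/data/latest

rsync decides what changed by comparing the live tree against the
previous copy, so there is no separate timestamp file to get out of
sync.]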
I did exactly this with ZFS on Linux and cut over 24 hours of backup
lag to just minutes. If you're managing data at scale, ZFS just rocks...

On Tuesday, November 10, 2015 01:16:28 PM Warren Young wrote:
> Well, to be fair, rsync can also miss files if files are changing
> while the backup occurs. Once rsync has passed through a given
> section of the tree, it will not see any subsequent changes.
>
> Solution? ZFS send:
>
> http://128bitstudios.com/2010/07/23/fun-with-zfs-send-and-receive/
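[The follow-up runs that bring the lag down from hours to minutes are
the incremental sends; presumably something along these lines, again
with made-up pool and host names:

    # send only the blocks that changed between the two snapshots
    zfs snapshot tank/data@migrate-2
    zfs send -i tank/data@migrate-1 tank/data@migrate-2 \
        | ssh newhost zfs receive backup/data

Each run only has to move the delta since the previous snapshot, no
matter how many files live in the dataset.]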
On 11/11/15 02:46, Gordon Messmer wrote:
>
> ... the process you described is likely to miss files that are
> modified while "find" runs.
>

That's just being picky for the sake of it. A backup is a
*point-in-time* snapshot of the files being backed up. It will not
capture files modified after that point. So, saying that find won't
find files modified while the backup is running is frankly the same as
saying it won't find files modified at any time after that
*point-in-time* when the backup started!

If there's a point to be made by the quoted statement above, I missed
it, and I surely deserve to be educated!

ak.