Andrew Gideon
2015-Jul-13 13:53 UTC
rsync --link-dest and --files-from lead by a "change list" from some file system audit tool (Was: Re: cut-off time for rsync ?)
On Mon, 13 Jul 2015 02:19:23 +0000, Andrew Gideon wrote:

> Look at tools like inotifywait, auditd, or kfsmd to see what's easily
> available to you and what best fits your needs.
>
> [Though I'd also be surprised if nobody has fed audit information into
> rsync before; your need doesn't seem all that unusual given ever-growing
> disk storage.]

I wanted to take this a bit further. I've thought, on and off, about this for a while and I always get stuck.

I use rsync with --link-dest as a backup tool. For various reasons, this is not something I want to give up. But, esp. for some very large file systems, doing something that avoids the scan would be desirable.

I should also add that I mistrust time-stamp, and even time-stamp+file-size, mechanisms for detecting changes. Checksums, on the other hand, are prohibitively expensive for backup of large file systems.

These both bring me to the idea of using some file system auditing mechanism to drive - perhaps with an --include-from or --files-from - what rsync moves.

Where I get stuck is that I cannot envision how I can provide rsync with a limited list of files to move that doesn't deny the benefit of --link-dest: a complete snapshot of the old file system via [hard] links into a prior snapshot for those files that are unchanged.

Has anyone done something of this sort? I'd thought of preceding the rsync with a "cp -Rl" on the destination from the old snapshot to the new snapshot, but I still think that this will break in the face of hard links (to a file not in the --files-from list) or a change to file attributes (i.e., a chmod would affect the copy of the file in the old snapshot).

Thanks...

Andrew
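One way the "link first, then rsync only the change list" idea might be sketched is below. This is an untested-in-production illustration, not something anyone in the thread reported using. The function names and the snapshot layout are hypothetical. The assumption is that removing each listed path from the freshly linked snapshot before rsync runs lets rsync, given --link-dest, decide for itself whether to re-link the file (nothing changed, including attributes) or create a new copy, so a chmod would not touch the inode shared with the old snapshot. Deleted files, and source files hard-linked to paths outside the list, are not handled here.

```python
import os
from pathlib import Path
from typing import Iterable, List


def prepare_snapshot(prev: Path, new: Path, changed: Iterable[str]) -> None:
    """Hard-link prev's regular files into new, then drop the changed paths."""
    # Roughly what "cp -Rl" does, but for regular files only;
    # symlinks, devices and directory metadata are ignored in this sketch.
    for dirpath, _dirnames, filenames in os.walk(prev):
        rel_dir = Path(dirpath).relative_to(prev)
        (new / rel_dir).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(dirpath) / name
            if src.is_symlink() or not src.is_file():
                continue
            os.link(src, new / rel_dir / name)
    # Remove every listed path so rsync never modifies a shared inode in place.
    for rel in changed:
        target = new / rel
        if target.is_file():
            target.unlink()


def rsync_command(source: Path, prev: Path, new: Path, list_file: Path) -> List[str]:
    """Build (not run) an rsync invocation limited to the change list."""
    return [
        "rsync", "-a",
        f"--files-from={list_file}",      # paths relative to the source dir
        f"--link-dest={prev.resolve()}",  # absolute, to avoid relative-path surprises
        f"{source}/", f"{new}/",
    ]
```

The command is only built, not executed, so the list can be inspected before anything is transferred. Whether the link pass is cheaper than the scan it replaces would need measuring on the file systems in question.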
Simon Hobson
2015-Jul-13 14:40 UTC
Andrew Gideon <c182driver1 at gideon.org> wrote:

> These both bring me to the idea of using some file system auditing
> mechanism to drive - perhaps with an --include-from or --files-from -
> what rsync moves.
>
> Where I get stuck is that I cannot envision how I can provide rsync with
> a limited list of files to move that doesn't deny the benefit of
> --link-dest: a complete snapshot of the old file system via [hard] links
> into a prior snapshot for those files that are unchanged.

The thing here is that you are into "backup" tools rather than the general purpose tool that rsync is intended to be.

storebackup does some elements of what you talk about in that it keeps a catalogue of existing files in the backup with a hash/checksum for each. I'm not sure how it goes about picking changed files - I suspect it uses "time+size" as a primary filter, but on the other hand I know for a fact you can "touch" a file and that change won't appear in the destination*. But for remote backups, the primary server can generate a changes list which is then copied to the remote server, which then adds the new/changed files and hard-links the unchanged ones according to the list it's been given.

If you turn off the file splitting and compression options, the backup is a series of hard-linked directories which you can look into and pull files from directly.

* But if you do alter the timestamp on a file without changing the contents, that will not appear in the file structure in the backup - later "copies" of the file retain the earlier timestamp. It does keep this information, and if you use the corresponding restore tool then you get back the correct timestamp.

In a completely different setup, I also use Retrospect. Recent versions have an option ("Instant Scan") to allow the client to keep an audit of changes, avoiding the "scan the client/do a massive compare" that's needed with this option turned off.
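The catalogue idea described above (a stored hash per file, with time+size as a first filter) could look something like the following. This is a guess at the general shape only and is not storebackup's actual implementation; the catalogue layout and the function names are made up for illustration. Note that it still walks the tree and still trusts size+mtime for files that appear untouched, so it does not address the mistrust of that check raised earlier in the thread.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical catalogue entry: (size in bytes, mtime in ns, SHA-256 hex digest)
Entry = Tuple[int, int, str]


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def update_catalogue(root: Path, old: Dict[str, Entry]) -> Tuple[Dict[str, Entry], List[str]]:
    """Return the refreshed catalogue and the sorted list of changed paths."""
    new: Dict[str, Entry] = {}
    changed: List[str] = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            st = path.stat()
            prev = old.get(rel)
            if prev and prev[0] == st.st_size and prev[1] == st.st_mtime_ns:
                new[rel] = prev  # primary filter: size+mtime match, skip hashing
                continue
            digest = sha256_of(path)
            new[rel] = (st.st_size, st.st_mtime_ns, digest)
            # A "touch" lands here: mtime differs, but the digest does not,
            # so the file is not reported as changed.
            if prev is None or prev[2] != digest:
                changed.append(rel)
    return new, sorted(changed)
```

The returned list of relative paths is in the form that a --files-from list expects, one path per line once written out.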
Ken Chase
2015-Jul-13 14:43 UTC
inotifywatch or equivalent; there's FSM (filesystem monitor) stuff as well.

constantData had a product we used years ago: a kernel module that dumped a list of any changed files out of some /proc or /dev/* device, and they had a whole toolset that ate the list (into some db) and played it out as it constantly tried to keep up with replication to a target (kinda like DRBD but async). They got eaten by some large backup company and the product was later priced at 5x what we had paid for it (in the mid $x000s/y).

This 2003-4 technology is certainly available in some format now. If you only copy the changes, you're likely saving a lot of time.

/kc
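Whatever produces the change list, it would need to be reduced to unique paths relative to the backup root before rsync can consume it via --files-from. A minimal sketch of that step follows, assuming the monitor emits one absolute path per line; that input format and the function name are assumptions, not the output of any specific tool mentioned here.

```python
import os
from typing import Iterable, List


def to_files_from(events: Iterable[str], root: str) -> List[str]:
    """Collapse a stream of changed absolute paths into sorted relative paths."""
    root = os.path.normpath(root)
    seen = set()
    for line in events:
        path = line.strip()
        if not path:
            continue  # ignore blank lines
        path = os.path.normpath(path)
        # Keep only paths strictly below root; events elsewhere are ignored.
        if not path.startswith(root + os.sep):
            continue
        seen.add(os.path.relpath(path, root))
    return sorted(seen)
```

A file that changes a hundred times between backups then appears once in the list. Paths that were deleted since the event fired would still be listed and would need separate handling.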
Andrew Gideon
2015-Jul-13 21:08 UTC
On Mon, 13 Jul 2015 15:40:51 +0100, Simon Hobson wrote:

> The thing here is that you are into "backup" tools rather than the
> general purpose tool that rsync is intended to be.

Yes, that is true. Rsync serves so well as a core component of backup that I can be blind to "something other than rsync".

I'll look at the tools you suggest. However, you've made me a little apprehensive about storebackup. I like the lack of a need for a "restore tool". This permits all the standard UNIX tools to be applied to whatever I might want to do over the backup, which is often *very* convenient.

On the other hand, I do confess that I am sometimes miffed at the waste involved in a small change to a very large file. Rsync is smart about moving minimal data, but it still stores an entire new copy of the file.

What's needed is a file system that can do what hard links do, but at the file page level. I imagine that this would work using the same Copy On Write logic used in managing memory pages after a fork().

- Andrew
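To make the page-level sharing idea concrete, here is a toy model. It is a content-addressed block store, which is a simpler stand-in for true copy-on-write rather than an implementation of it, and it is not how any particular file system works. The class name, its methods and the 4096-byte block size are all assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed page size for this illustration


class BlockStore:
    """Stores each distinct block once; a file version is a list of block keys."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Store data and return its recipe (the ordered block keys)."""
        recipe = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(key, block)  # already-known blocks are shared
            recipe.append(key)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Reassemble a file version from its recipe."""
        return b"".join(self.blocks[key] for key in recipe)
```

Storing two versions of a large file that differ in one byte costs one extra block rather than a second full copy, which is the saving being asked for; the price in this model is that a version is no longer a plain file readable with standard tools.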