Andrew Gideon
2015-Jul-13 21:08 UTC
rsync --link-dest and --files-from lead by a "change list" from some file system audit tool (Was: Re: cut-off time for rsync ?)
On Mon, 13 Jul 2015 15:40:51 +0100, Simon Hobson wrote:

> The thing here is that you are into "backup" tools rather than the
> general purpose tool that rsync is intended to be.

Yes, that is true. Rsync serves so well as a core component of backup that I can be blind to "something other than rsync". I'll look at the tools you suggest.

However, you've made me a little apprehensive about storebackup. I like the lack of a need for a "restore tool". This permits all the standard UNIX tools to be applied to whatever I might want to do over the backup, which is often *very* convenient.

On the other hand, I do confess that I am sometimes miffed at the waste involved in a small change to a very large file. Rsync is smart about moving minimal data, but it still stores an entire new copy of the file.

What's needed is a file system that can do what hard links do, but at the file page level. I imagine that this would work using the same Copy On Write logic used in managing memory pages after a fork().

- Andrew
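To put numbers on the waste Andrew describes, here is a toy comparison of whole-file storage (what a hard-link backup does once a file has changed) against positional block sharing of the kind copy-on-write provides. It is a sketch under stated assumptions: fixed 4 KiB blocks, no metadata overhead, and function names made up for this illustration.

```python
import random

BLOCK_SIZE = 4096  # assumption: 4 KiB pages/blocks


def whole_file_cost(old: bytes, new: bytes) -> int:
    """Hard-link style: an unchanged file is free, a changed one is stored in full."""
    return 0 if old == new else len(new)


def cow_block_cost(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """COW style: a block is shared if the block at the same offset is identical."""
    cost = 0
    for off in range(0, len(new), block_size):
        new_block = new[off:off + block_size]
        if new_block != old[off:off + block_size]:
            cost += len(new_block)  # only differing blocks need new storage
    return cost


if __name__ == "__main__":
    old = random.Random(0).randbytes(1 << 20)  # 1 MiB of arbitrary data
    new = bytearray(old)
    new[10] ^= 0xFF                            # a one-byte change near the start
    new = bytes(new)
    print(whole_file_cost(old, new))  # 1048576
    print(cow_block_cost(old, new))   # 4096
```

A one-byte change costs a full megabyte under whole-file storage but a single block under block sharing. An insertion that shifts all later data would defeat positional sharing; this model does not try to handle that case.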
Simon Hobson
2015-Jul-13 21:19 UTC
rsync --link-dest and --files-from lead by a "change list" from some file system audit tool (Was: Re: cut-off time for rsync ?)
Andrew Gideon <c182driver1 at gideon.org> wrote:

> However, you've made me a little
> apprehensive about storebackup. I like the lack of a need for a "restore
> tool". This permits all the standard UNIX tools to be applied to
> whatever I might want to do over the backup, which is often *very*
> convenient.

Well, if you don't use the file splitting and compression options, you can still do that with storebackup. Just be aware that some files may have different timestamps (but not contents) from the original. Specifically, consider this sequence:
- Create a file, perform a backup.
- Touch the file to change its modification timestamp, perform another backup.

rsync will (I think) see the new file with a different timestamp and create a new file rather than linking to the old one. storebackup will link the files (so taking almost zero extra space), but the second backup will show the file with the timestamp of the first version. If you just "cp -p" the file, it'll have the earlier timestamp; if you restore it with the storebackup tools, it'll come out with the later timestamp.

> On the other hand, I do confess that I am sometimes miffed at the waste
> involved in a small change to a very large file. Rsync is smart about
> moving minimal data, but it still stores an entire new copy of the file.

I'm not sure as I've not used it, but storebackup has the option of splitting large files (threshold user-definable). You'd need to look and see whether it compares file parts (hard-linking unchanged parts) or the whole file (creating all new parts).

> What's needed is a file system that can do what hard links do, but at the
> file page level. I imagine that this would work using the same Copy On
> Write logic used in managing memory pages after a fork().

Well, some (all?) enterprise-grade storage boxes support de-dup - usually at the block level. So it does exist, at a price!
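Simon's touch scenario can be reproduced with a small model. The sketch below contrasts two hypothetical linking rules: a size-plus-mtime check (a simplified stand-in for rsync's default quick check; real --link-dest also compares other preserved attributes) and a content-hash check (a simplified stand-in for storebackup's approach). The function names and the model itself are assumptions for illustration, not either tool's code.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path: str) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_file(src: str, prev: str, dest: str, by_content: bool) -> str:
    """Hard-link dest to prev if the file counts as unchanged, else copy src.

    by_content=False: size + whole-second mtime (simplified quick check).
    by_content=True: identical contents (simplified content-based linking).
    """
    if os.path.exists(prev):
        if by_content:
            same = file_digest(src) == file_digest(prev)
        else:
            s, p = os.stat(src), os.stat(prev)
            same = s.st_size == p.st_size and int(s.st_mtime) == int(p.st_mtime)
        if same:
            os.link(prev, dest)  # shares prev's inode, and therefore its mtime
            return "linked"
    shutil.copy2(src, dest)      # copy2 preserves the source file's mtime
    return "copied"


def demo() -> list:
    results = []
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "data.txt")
        with open(src, "w") as f:
            f.write("hello\n")
        os.utime(src, (1_000_000_000, 1_000_000_000))
        first = os.path.join(d, "backup1")
        shutil.copy2(src, first)                       # first backup
        os.utime(src, (1_100_000_000, 1_100_000_000))  # "touch": new mtime, same bytes
        for by_content in (False, True):
            dest = os.path.join(d, f"backup2_{by_content}")
            action = backup_file(src, first, dest, by_content)
            results.append((by_content, action, int(os.stat(dest).st_mtime)))
    return results


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The metadata rule stores a fresh copy carrying the new timestamp, while the content rule links to the first backup, costing no extra data but showing the older timestamp, which is the caveat Simon describes.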
Selva Nair
2015-Jul-13 21:38 UTC
Fwd: rsync --link-dest and --files-from lead by a "change list" from some file system audit tool (Was: Re: cut-off time for rsync ?)
On Mon, Jul 13, 2015 at 5:19 PM, Simon Hobson <linux at thehobsons.co.uk> wrote:

> > What's needed is a file system that can do what hard links do, but at the
> > file page level. I imagine that this would work using the same Copy On
> > Write logic used in managing memory pages after a fork().
>
> Well some (all ?) enterprise grade storage boxes support de-dup - usually
> at the block level. So it does exist, at a price !

zfs is free and has de-dup. It takes more RAM to support it well, but not prohibitively so unless your data is more than a few TB. As with any dedup solution, performance does take a hit, and it's often not worth it unless you have a lot of duplication in the data.

Selva
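Selva's "a few TB" threshold can be made concrete with back-of-the-envelope arithmetic. This is a rough sizing sketch, assuming the commonly quoted rule of thumb of about 320 bytes of RAM per dedup-table entry, one entry per block (the worst case, with no duplicates), and ZFS's default 128 KiB recordsize. Actual usage varies with ZFS version, block sizes, and pool layout.

```python
KiB, GiB, TiB = 1 << 10, 1 << 30, 1 << 40


def ddt_ram_bytes(data_bytes: int, avg_block_bytes: int = 128 * KiB,
                  bytes_per_entry: int = 320) -> int:
    """Rough worst case: one dedup-table entry per block, i.e. no duplicates.

    bytes_per_entry=320 is a commonly quoted rule of thumb, not a measured value.
    """
    blocks = -(-data_bytes // avg_block_bytes)  # ceiling division
    return blocks * bytes_per_entry


if __name__ == "__main__":
    for tib in (1, 4, 10):
        for block in (128 * KiB, 8 * KiB):
            gib = ddt_ram_bytes(tib * TiB, block) / GiB
            print(f"{tib:>2} TiB at {block // KiB:>3} KiB blocks -> {gib:6.1f} GiB")
```

Under these assumptions, 1 TiB stored in 128 KiB blocks needs about 2.5 GiB for the table, but the same data in 8 KiB blocks needs about 40 GiB, which is why small-block or multi-terabyte pools are where dedup RAM becomes a real constraint.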
Paul Slootman
2015-Jul-14 06:59 UTC
rsync --link-dest and --files-from lead by a "change list" from some file system audit tool (Was: Re: cut-off time for rsync ?)
On Mon 13 Jul 2015, Andrew Gideon wrote:
>
> On the other hand, I do confess that I am sometimes miffed at the waste
> involved in a small change to a very large file. Rsync is smart about
> moving minimal data, but it still stores an entire new copy of the file.
>
> What's needed is a file system that can do what hard links do, but at the
> file page level. I imagine that this would work using the same Copy On
> Write logic used in managing memory pages after a fork().

btrfs has support for this: you make a backup, then create a btrfs snapshot of the filesystem (or directory), then the next time you make a new backup with rsync, use --inplace so that just the changed parts of the file are written to the same blocks, and btrfs will take care of the copy-on-write part.

Paul
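Paul's cycle (refresh the backup in place, then freeze it as a snapshot) can be sketched as a function that builds the two commands without running them. The paths and the date-stamp naming are hypothetical, and the destination is assumed to be a btrfs subvolume, since btrfs snapshots are taken of subvolumes. The --no-whole-file flag is an addition to Paul's description: for local-to-local copies rsync defaults to --whole-file, which skips the delta algorithm, so forcing delta mode should let --inplace leave unchanged blocks alone.

```python
import shlex


def backup_cycle(source: str, subvol: str, snapdir: str, stamp: str) -> list[list[str]]:
    """Return the commands for one backup cycle; nothing is executed here.

    subvol is assumed to be a btrfs subvolume holding the latest backup;
    each cycle refreshes it in place, then freezes it as a read-only snapshot.
    """
    refresh = [
        "rsync", "-a", "--delete",
        "--inplace",        # update changed files in place instead of replacing them
        "--no-whole-file",  # force the delta algorithm even for local copies
        source.rstrip("/") + "/", subvol.rstrip("/") + "/",
    ]
    snapshot = ["btrfs", "subvolume", "snapshot", "-r", subvol, f"{snapdir}/{stamp}"]
    return [refresh, snapshot]


if __name__ == "__main__":
    # Hypothetical paths; each command could be run with subprocess.run(cmd, check=True).
    for cmd in backup_cycle("/data", "/backup/current", "/backup/snapshots", "2015-07-14"):
        print(shlex.join(cmd))
```

Keeping the command construction separate from execution makes it easy to log or dry-run a cycle before trusting it with real backups.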
Ken Chase
2015-Jul-14 13:30 UTC
rsync --link-dest and --files-from lead by a "change list" from some file system audit tool (Was: Re: cut-off time for rsync ?)
And what's the performance like? I've heard that the performance of lots of COW systems drops through the floor when there are many snapshots.

/kc
Andrew Gideon
2015-Jul-16 17:42 UTC
Fwd: rsync --link-dest and --files-from lead by a "change list" from some file system audit tool (Was: Re: cut-off time for rsync ?)
On Mon, 13 Jul 2015 17:38:35 -0400, Selva Nair wrote:

> As with any dedup solution, performance does take a hit and it's often
> not worth it unless you have a lot of duplication in the data.

In our case that is true of only some volumes, but it appears that zfs permits dedup to be enabled/disabled on a per-volume basis. That would work for us.

Is there a way to save cycles by offering zfs a hint as to where a previous copy of a file's blocks may be found?

- Andrew
Andrew Gideon
2015-Jul-16 17:56 UTC
rsync --link-dest and --files-from lead by a "change list" from some file system audit tool (Was: Re: cut-off time for rsync ?)
On Tue, 14 Jul 2015 08:59:25 +0200, Paul Slootman wrote:

> btrfs has support for this: you make a backup, then create a btrfs
> snapshot of the filesystem (or directory), then the next time you make a
> new backup with rsync, use --inplace so that just changed parts of the
> file are written to the same blocks and btrfs will take care of the
> copy-on-write part.

That's interesting. I'd considered doing something similar with LVM snapshots. I chose not to do so because of a particular failure mode: if the space allocated to a snapshot filled up (as a result of changes to the "live" data), the snapshot would fail. For my purposes, I'd want the new write to fail instead. Destroying snapshots holding backup data didn't seem a reasonable choice.

How does btrfs deal with such issues?

- Andrew
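The LVM failure mode Andrew describes can be illustrated with a toy model of a classic (non-thin) snapshot backed by a fixed-size copy-on-write area. This is a simplification written for illustration, not LVM's implementation; the class and method names are hypothetical. The point it shows is the ordering of priorities: the live write always succeeds, and it is the snapshot that gets sacrificed.

```python
class ClassicSnapshot:
    """Toy model of a fixed-size, non-thin snapshot (hypothetical, not LVM code).

    Before an origin block is overwritten for the first time, its old contents
    are copied into the snapshot's exception store. When that store is full,
    the snapshot is invalidated and the origin write still succeeds.
    """

    def __init__(self, origin: list, cow_capacity: int):
        self.origin = origin          # live volume, modeled as a list of blocks
        self.capacity = cow_capacity  # number of blocks the COW area can hold
        self.saved = {}               # block index -> preserved old contents
        self.valid = True

    def write_origin(self, index: int, data) -> None:
        if self.valid and index not in self.saved:
            if len(self.saved) >= self.capacity:
                self.valid = False    # COW area overflowed: snapshot is lost
                self.saved.clear()
            else:
                self.saved[index] = self.origin[index]
        self.origin[index] = data     # the live write always goes through

    def read_snapshot(self, index: int):
        if not self.valid:
            raise OSError("snapshot invalidated: copy-on-write area overflowed")
        return self.saved.get(index, self.origin[index])


if __name__ == "__main__":
    volume = ["a", "b", "c", "d"]
    snap = ClassicSnapshot(volume, cow_capacity=2)
    snap.write_origin(0, "A")
    snap.write_origin(1, "B")
    print(snap.read_snapshot(0), snap.valid)  # a True
    snap.write_origin(2, "C")                 # a third distinct block overflows the area
    print(snap.valid, volume)                 # False ['A', 'B', 'C', 'd']
```

The behavior Andrew wants would instead refuse the third write and keep the snapshot intact.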