Ken Chase
2015-Apr-06 05:51 UTC
rsync --link-dest won't link even if existing file is out of date
Feature request: allow --link-dest dir to be linked to even if file exists in target.

This statement from the man page is adhered to too strongly IMHO:

"This option works best when copying into an empty destination hierarchy, as rsync treats existing files as definitive (so it never looks in the link-dest dirs when a destination file already exists)".

I was surprised by this behaviour, as generally the scheme is to be efficient/save space with rsync.

When the file in the target is out of date but an up-to-date copy exists in the --link-dest dir, it would be great if the target file could be removed and linked. If an option was supplied to request this behaviour, I'd actually throw some money at making it happen. (And a further option to retain a copy if inode permissions/ownership would otherwise be changed.)

Reasoning:

I back up many servers with --link-dest that have filesystems of 10+M files on them. I do not delete old backups - which takes 60min per tree or more - just so rsync can recreate them all in an empty target dir when <1% of files change per day (takes 3-5 hrs per backup!).

Instead, I cycle them in with mv $olddate $today then rsync --del --link-dest over them - takes 30-60 min depending. (Yes, some malleability of permissions risk there, mostly interested in contents tho). Problem is, if a file exists AT ALL, even out of date, a new copy is put overtop of it per the above man page decree.

Thus much more disk space is used. Recycling old backups this way accumulates many copies of the exact same file over time. Running pax -rpl over the copies before rsyncing to them works (and saves much space!), but takes a very long time as it traverses and compares 2 large backup trees thrashing the same device (in the order of 3-5x the rsync's time, 3-5 hrs for pax - hardlink(1) is far worse, I suspect some non-linear algorithm therein - it ran 3-5x slower than pax again).

I have detailed an example of this scenario at

http://unix.stackexchange.com/questions/193308/rsyncs-link-dest-option-does-not-link-identical-files-if-an-old-file-exists

which also indicates --delete-before and --whole-file do not help at all.

/kc
--
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.
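For readers following along, the recycling scheme described in this post might look roughly like the sketch below. It is only an illustration: the function names, the `DRY_RUN` switch, the `-a` flag, and every path and date are assumptions, not the poster's actual script. An absolute path is used for --link-dest because a relative one is interpreted relative to the destination directory.

```shell
#!/bin/sh
# Sketch of the "recycle an old snapshot" rotation described in the post.
# Function names, DRY_RUN and all paths are hypothetical.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recycle_backup SRC ROOT OLDEST PREVIOUS TODAY
recycle_backup() {
    src=$1
    root=$2
    oldest=$3
    previous=$4
    today=$5
    # Reuse the oldest tree as today's target instead of deleting it.
    run mv "$root/$oldest" "$root/$today"
    # --del removes files that vanished from the source; --link-dest points
    # at the most recent complete backup so unchanged files can be linked.
    run rsync -a --del --link-dest="$root/$previous" "$src/" "$root/$today/"
}
```

Calling `DRY_RUN=1; recycle_backup host:/srv /backups 2015-03-01 2015-04-05 2015-04-06` only prints the two commands, which is a cheap way to check the paths before touching a 10+M file tree.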
Kevin Korb
2015-Apr-06 16:07 UTC
rsync --link-dest won't link even if existing file is out of date
Since you are in an environment with millions of files I highly recommend that you move to ZFS storage and use ZFS's subvolume snapshots instead of --link-dest. It is much more space efficient, rsync run time efficient, and the old backups can be deleted in seconds. Rsync doesn't have to understand anything about ZFS. You just rsync to the same directory every time and have ZFS do a snapshot on that directory between runs.

--
Kevin Korb                  Phone:    (407) 252-6853
Systems Administrator       Internet:
FutureQuest, Inc.           Kevin at FutureQuest.net  (work)
Orlando, Florida            kmk at sanitarium.net (personal)
Web page:                   http://www.sanitarium.net/
PGP public key available on web site.
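A minimal sketch of the snapshot approach suggested in this reply, assuming a ZFS dataset mounted at the rsync target. The dataset name, mount point, snapshot labels, function names and the `DRY_RUN` switch are all hypothetical; only `rsync -a --delete`, `zfs snapshot` and `zfs destroy` are real commands.

```shell
#!/bin/sh
# Sketch: one rsync target directory, history kept as ZFS snapshots.
# Dataset name, mount point and labels are hypothetical.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_backup SRC DATASET MOUNTPOINT LABEL
snapshot_backup() {
    src=$1
    dataset=$2
    mountpoint=$3
    label=$4
    # Always sync into the same directory; no --link-dest is involved.
    run rsync -a --delete "$src/" "$mountpoint/"
    # Freeze the result as a snapshot named after the run.
    run zfs snapshot "$dataset@$label"
}

# expire_backup DATASET LABEL: drop one old snapshot.
expire_backup() {
    run zfs destroy "$1@$2"
}
```

With `DRY_RUN=1` set, `snapshot_backup host:/srv tank/backups/host /tank/backups/host 2015-04-06` prints the rsync and snapshot commands without running them.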
Ken Chase
2015-Apr-06 16:12 UTC
rsync --link-dest won't link even if existing file is out of date
This has been a consideration. But it pains me that a tiny change/addition to the rsync option set would save much time and space for other legit use cases.

We know rsync very well; we don't know ZFS very well (licensing kept the tech out of our linux-centric operations). We've been using it but we're not experts yet.

Thanks for the suggestion.

/kc
Wayne Davison
2015-Apr-12 16:32 UTC
rsync --link-dest won't link even if existing file is out of date
On Sun, Apr 5, 2015 at 10:51 PM, Ken Chase <rsync-list-m829 at sizone.org> wrote:
> Feature request: allow --link-dest dir to be linked to even if file exists
> in target.

From the release notes for 3.1.0:

- Improved the use of alt-dest options into an existing hierarchy of files: If a match is found in an alt-dir, it takes precedence over an existing file. (We'll need to wait for a future version before attribute-changes on otherwise unchanged files are safe when using an existing hierarchy.)

So, storage savings are realized, and things like mode changes affect all the hard-linked files.

..wayne..
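One way to check whether a given run actually linked files against the alt-dest tree is to compare inodes between the new and the previous backup. The sketch below is an illustration only; the function names and the tree layout are assumptions, and it relies on the `-ef` test operator, which common shells (bash, dash, ksh) support although it is not in older POSIX specifications.

```shell
#!/bin/sh
# is_linked FILE_A FILE_B: succeed when both names refer to the same
# device and inode, i.e. the data is stored once and hard-linked.
is_linked() {
    [ "$1" -ef "$2" ]
}

# report_links NEW_TREE OLD_TREE: for every regular file in NEW_TREE, say
# whether it shares an inode with the same relative path in OLD_TREE.
report_links() {
    new=$1
    old=$2
    ( cd "$new" && find . -type f ) | while IFS= read -r rel; do
        if is_linked "$new/$rel" "$old/$rel"; then
            echo "linked   $rel"
        else
            echo "separate $rel"
        fi
    done
}
```

Running `report_links /backups/2015-04-06 /backups/2015-04-05` (hypothetical paths) after a backup and counting the `separate` lines gives a rough idea of how many files were stored again rather than linked.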
Henri Shustak
2015-Apr-14 23:04 UTC
rsync --link-dest won't link even if existing file is out of date
Hi Ken,

You may wish to take a quick look at LBackup (disclaimer: I am a developer on this project), which is a wrapper around rsync designed for reliable user data backups.

LBackup always starts a new backup snapshot with an empty directory. I have been looking at extending the --link-dest options to scan beyond just the previous successful backup, to failed or older backups. However, there are all kinds of edge cases which are worth considering with such a change.

At present LBackup is focused on reliability; as such, this R&D is quite slow given limited resources. The current version of LBackup offers IMHO reliable backups of user data, and the scripting sub-system offers a high degree of flexibility.

Yes, every time you start a backup snapshot, a directory is re-populated from scratch and this takes time with LBackup. However, if you are seeking reliability then you may wish to check out the following URL: http://www.lbackup.org

If you are looking to speed up performance, then investing in faster hardware, additional file system caching or considering various file systems is well worthwhile.

Ideas and patches are welcome to improve the LBackup project.

--------------------------------------------------------------------
This email is protected by LBackup, an open source backup solution
http://www.lbackup.org
Henri Shustak
2015-Apr-15 03:35 UTC
rsync --link-dest won't link even if existing file is out of date
> Ill take a look but I imagine I cant backup the 80 Million files I need
> to in under the 5 hours i have for nightly maintenance/backups. Currently
> it's possible by recycling directories...

To cover that many files in that much time you will require a high speed system.

Just another thought. Perhaps splitting the backup onto multiple backup servers / storage systems would reduce the backup time so that it fits into your window?

Also, I strongly agree with the previous posts relating to file system snapshots. ZFS is just one file system which supports this kind of system.
Kevin Korb
2015-Apr-15 06:45 UTC
rsync --link-dest won't link even if existing file is out of date
On 04/14/2015 11:35 PM, Henri Shustak wrote:
>> Ill take a look but I imagine I cant backup the 80 Million files
>> I need to in under the 5 hours i have for nightly
>> maintenance/backups. Currently it's possible by recycling
>> directories...

I would expect that recycling directories actually makes this worse. With an empty target directory you don't even need the overhead of --delete (not as bad as it used to be thanks to --delete-during but it is still overhead). If your backup window is only 5 hours then that leaves you with 19 hours a day to do other things on your backup server(s) such as deleting off old backups. Get all those unlink() calls out of your backup window. Bad enough you need to do 80 million calls to stat().

> To cover that many files in that much time you will require a high
> speed system. Just another thought. Perhaps splitting the backup
> onto multiple backup servers / storage systems would reduce the
> backup time so that it fits into your window?

Agreed completely here. It is much easier to make more backup servers than it is to make one big one that can handle the entire load. We divide our backup load by server. IOW, each backup server has a list of production servers that it backs up.

> Also, I strongly agree with the previous posts relating to file
> system snapshots. ZFS is just one file system which supports this
> kind of system.

I have also attempted to use btrfs in Linux for this. I even wrote up a presentation for my local LUG about it: https://sanitarium.net/golug/rsync+btrfs_backups_2011.html

Unfortunately there was nothing but grief. The btrfs just wasn't stable enough and the btrfs-cleaner kernel thread drove performance into the ground. We eventually had to abandon it in favor of ZFS on TrueOS.

As far as "fast box" goes we decided on 8GB of RAM for most of the backup servers and essentially whatever CPU can handle that much RAM. Most of them are older AMD Athlon 64 X2 desktops. We do have one with a quad core CPU and 16GB of RAM. That is the only one running ZFS de-duplication as that is the big RAM hog.
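The advice to move deletions out of the backup window could be sketched as a separate pruning job run at another time of day. This is only an illustration under stated assumptions: the function name, the `DRY_RUN` switch and the YYYY-MM-DD directory naming are hypothetical, and the naming is what lets a plain lexical sort double as a chronological one.

```shell
#!/bin/sh
# prune_old ROOT KEEP: remove all but the newest KEEP snapshot directories
# under ROOT, or only list them when DRY_RUN=1.
# Assumes snapshot directories are named YYYY-MM-DD (hypothetical layout).
prune_old() {
    root=$1
    keep=$2
    total=$(cd "$root" && ls -d ????-??-?? 2>/dev/null | wc -l)
    excess=$((total - keep))
    # Nothing to do when there are KEEP or fewer snapshots.
    [ "$excess" -gt 0 ] || return 0
    ( cd "$root" && ls -d ????-??-?? | sort | head -n "$excess" ) |
    while IFS= read -r dir; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would remove $root/$dir"
        else
            # This is where the millions of unlink() calls happen.
            rm -rf -- "$root/$dir"
        fi
    done
}
```

Scheduling something like `prune_old /backups/host 30` from cron well away from the nightly rsync run keeps the unlink() load from competing with the stat() load of the backup itself.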