Hello fellas,

Firstly, I would like to apologize for the post being long, but I would appreciate it if someone with good knowledge of the subject gave their opinion.

I am working on a backup solution for a series of old and very specialized host computers in the aviation industry. I need to do monthly backups of one or two logical volumes (non-OS) on each of the computers, and keep the last 12 monthly and the last 5 yearly backups for records.

***Initial strategy***

The strategy I have worked out so far obviously uses rsync (over NFS, since that is the only means supported by all of the computers I am dealing with). The first backup is done to an empty destination. Then the destination is tar'ed and compressed locally on the Linux backup server for storage. For each following backup, the previous archive is extracted to a temporary directory, rsync'ed with the source, and re-tar'ed and compressed into another archive. The temp directory is then deleted. The monthly backups from one year ago and the yearly ones from 6 years ago are deleted. So over a five-year period, I will have about 17-18 (let's just say 20) .tar.gz archives in rotation.

The particularity of these backups is that a very large number of hard links exists in the filesystems being backed up, so I use the -aH flags of rsync.

***What I think of doing***

Now, I have read about another very neat way of doing rotating backups using the --link-dest flag. You can find the whole explanation here: http://www.mikerubel.org/computers/rsync_snapshots/#Incremental.

So what I am thinking of doing now is: the first backup is done to an empty destination. Then each following backup is done to a freshly created empty directory named with the current date, but with the --link-dest option pointing at the previous (latest) backup directory. This way, if I understand correctly, each new backup will be hard-linked to the previous backup, and only the new changes will be stored separately.

With this new method, I will save the time spent compressing and uncompressing archives, and supposedly the disk space used by the backups will be one full backup + the deltas of 17 other ones, instead of 17 compressed full backups as in the method I explained before.

Here are my questions:

1) Now, what happens if I use the --delete option with --link-dest, and some files are deleted on the source? Are they going to be deleted from all of the hard-linked previous backups (thus corrupting my previous backups), or will they somehow be omitted only from the new backup directory?

2) If I understand correctly, each new backup will increase the hard link count of all unchanged files by one. So after a five-year period I will have the hard link count increased by 17. What happens if I then restore the latest backup back to the source? Will the hard link count go back to what it was on the source when the backup was done, or will it stay the same (incremented by 17)?

Is this new method better than the one I first proposed, or does it just introduce unnecessary risks?

I thank you for having the patience to read through all of it, and I hope to hear an expert opinion soon.
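(For concreteness, here is a minimal sketch of the tar-based rotation described above. The paths /mnt/host-nfs/volume1 and /backups/host1 are hypothetical placeholders, and the pruning of old archives is left out.)

    #!/bin/sh
    # Sketch of the tar-based monthly rotation (paths are hypothetical).
    set -e

    SRC=/mnt/host-nfs/volume1/          # NFS mount of the host's logical volume
    ARCHIVES=/backups/host1             # where the .tar.gz archives live
    WORK=$(mktemp -d /backups/work.XXXXXX)
    STAMP=$(date +%Y-%m)

    # Extract the most recent archive (if any), so rsync only transfers deltas.
    LATEST=$(ls -1t "$ARCHIVES"/*.tar.gz 2>/dev/null | head -n1)
    if [ -n "$LATEST" ]; then
        tar -xpzf "$LATEST" -C "$WORK"
    fi

    # -aH preserves the many hard links in the source filesystem;
    # --delete removes files that have disappeared from the source.
    rsync -aH --delete "$SRC" "$WORK/"

    # Re-pack the refreshed tree into this month's archive (GNU tar
    # stores hard links within an archive by default).
    tar -cpzf "$ARCHIVES/backup-$STAMP.tar.gz" -C "$WORK" .

    rm -rf "$WORK"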
Sorry guys, I answered one of my questions, which I guess was stupid... Here is the updated question 1 (the rest of the post is unchanged from above):

1) Now, what happens if I use the --delete option with --link-dest, and some files are deleted on the source? Are they going to be deleted from all of the hard-linked previous backups (thus corrupting my previous backups), or will the files missing from the new backup simply not be hard-linked to the files existing in the previous backup (so they will just appear not to be there)?
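(Here is a minimal sketch of that --link-dest rotation, with hypothetical paths and the yearly retention left out. Note that --delete only shapes the new snapshot directory; removing a directory entry there does not affect the contents of earlier snapshots, since the other hard links still reference the same inode.)

    #!/bin/sh
    # Sketch of the proposed --link-dest rotation (paths are hypothetical).
    set -e

    SRC=/mnt/host-nfs/volume1/
    SNAPROOT=/backups/host1
    STAMP=$(date +%Y-%m)

    mkdir -p "$SNAPROOT"
    LATEST=$(ls -1d "$SNAPROOT"/20* 2>/dev/null | sort | tail -n1)

    if [ -n "$LATEST" ]; then
        # Unchanged files become hard links into the previous snapshot;
        # --delete and --link-dest only affect the new directory.
        rsync -aH --delete --link-dest="$LATEST" "$SRC" "$SNAPROOT/$STAMP/"
    else
        rsync -aH "$SRC" "$SNAPROOT/$STAMP/"
    fi

    # Prune: keep only the 12 most recent snapshots ("head -n -12" is
    # GNU coreutils; yearly retention is left out of this sketch).
    ls -1d "$SNAPROOT"/20* | sort | head -n -12 | while read -r old; do
        rm -rf "$old"
    done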
> The first backup is done to an empty destination. Then, each following
> backup will be done to a freshly created empty dir with the current date
> as name, but with the --link-dest option pointing at the previous
> (latest) backup directory.

LBackup (http://www.lbackup.org) also lets you quickly set up this kind of rotating hard-linked backup. With LBackup you have the option of receiving an email after each backup, or only when there are errors. There are also various other features you may find useful, such as built-in post-actions for syncing to remote systems.

If you end up going down the tar/untar path and you implement this system and it works well, please consider posting a link to the source code in this thread, since someone else may find such a system useful in the future.
I goofed up one part of this: since things are stored in a hierarchy, not flat, it's not the sum of all the full pathnames, but the sum of the names in each directory, and then the sum of all those sums. Adjust theory minimums accordingly. It can still use great gobs of space, though, which is easily overlooked. And at least on these filesystems, the cost lies in the total number of dirs/files and their length.

  {00..00}/{00..99}/{00..99}
    find . | wc:  10102 lines, 110807 chars
    MFS du -cks:  20204      ZFS du -cks:  85867

  {00000000000..00000000000}/{00000000000..00000000099}/{00000000000..00000000099}
    find . | wc:  10102 lines, 382616 chars
    MFS du -cks:  20406      ZFS du -cks:  61835

  {1x<255chars>}/{100*<255chars>}/{100*<255chars>}   # 0123456789 * 25.5
    find . | wc:  10102 lines, 7751660 chars
    MFS du -cks:  25052      ZFS du -cks:  89923

  {000..000}/{000..009}/{000..999}
    find . | wc:  10102 lines, 140108 chars
    MFS du -cks:  20124      ZFS du -cks:  62526

  {00000..31999}
    find . | wc:  32001 lines, 256002 chars
    MFS du -cks:  64528      ZFS du -cks:  276153

  {00000..31999}   # not mkdir, touch $i
    find . | wc:  32001 lines, 256002 chars
    MFS du -cks:  528        ZFS du -cks:  20153

  {00000..20000}   # not mkdir, touch <240 0 chars>$i
    find . | wc:  20002 lines, 4960250 chars
    MFS du -cks:  5024       ZFS du -cks:  14185

UFS: numbers same as MFS, only much slower
ZFS: seems to make some adjustments on subsequent runs
ALL: these FS's were quite full
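(For anyone who wants to repeat these measurements, here is a minimal sketch of the first case. It assumes a bash shell for brace expansion, and the scratch directory name is hypothetical; run it on the filesystem you want to measure.)

    #!/bin/bash
    # Reproduce the {00..00}/{00..99}/{00..99} directory-overhead case.
    set -e

    mkdir /tmp/dirtest && cd /tmp/dirtest

    # Create 1 x 100 x 100 nested directories via brace expansion.
    mkdir -p {00..00}/{00..99}/{00..99}

    # Count entries and total pathname characters, then the on-disk
    # space consumed by directory metadata alone (no file data).
    find . | wc
    du -cks .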
On Sun, 07 Feb 2010 13:00:04 -0600, <rsync-request at lists.samba.org> wrote:

> It's simply that rsync _can_ be made to do all this in one invocation.
> Since it has to look at and consider all three of source, prior and
> current anyways, it makes sense to enhance it with this printing
> capability.
>
> I don't have much use for userfriendly bloated scripts like dirvish/etc.
> Not to knock them, they're fine for those who use them. I just prefer
> putting only what I need into my own along with adding other bits.

I'm the author of "snap2", an rsync-based open source rotating snapshot backup shell script with (omygosh!) a GUI interface via gtkdialog. I've had to deal with the issue of reporting on files deleted between one snapshot and the previous one.

I don't think a snapshot should be deleted just because all it contains are hard links. After all, part of the idea of a snapshot backup is to document the state of the filesystem at a certain point in time. Still, I agree that it's useful to know which files were deleted between one snapshot and the next. Therefore, I'd also like to see a switch for rsync that works with --link-dest to make it report on missing files (compared to the hard-link reference).

In the meantime, you can consider a couple of other ways to get this information:

1. Use cp -al to create the hard links instead of --link-dest. Then when you run rsync, it will report files as "deleted". Of course, that introduces the ownership/permissions bug (hard-linked older snapshots get the ownership/permissions of the newest snapshot).

2. Run rsync "in reverse", in report-only mode (dry run). It will look something like this:

     rsync -vazn /path/to/previous/snapshot/* /path/to/current/snapshot/

   You will find this report quite fast to generate. This is the approach I took in the latest version of snap2 (http://www.linuxbackups.org).

Of course, rsync rocks!
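(Here is a minimal sketch of approach 2, with hypothetical snapshot paths. One caveat: a plain reverse dry run also lists files that merely changed between snapshots; adding rsync's --ignore-existing option skips everything already present in the current snapshot, so only the truly deleted files are listed.)

    #!/bin/sh
    # List files present in the previous snapshot but gone from the
    # current one, via a reverse dry run (paths are hypothetical).
    PREV=/backups/host1/2010-01
    CURR=/backups/host1/2010-02

    # -n means nothing is copied; rsync only reports what it *would*
    # transfer. --ignore-existing limits the report to files that are
    # absent from $CURR, i.e. files deleted since the last snapshot.
    rsync -vazn --ignore-existing "$PREV/" "$CURR/"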