Hey guys,

I tried to create a simple rsync script that should create daily backups from a ZFS storage and put them into a timestamp folder. After creating the initial full backup, the following backups should only contain "new data" and the rest will be referenced via hardlinks (--link-dest).

This was at least a simple enough scenario to achieve with my pathetic scripting skills. This is what I came up with:

#!/bin/sh

# rsync copy script for rsync pull from FreeNAS to BackupNAS for Buero dataset

# Set variables
EXPIRED=`date +"%d-%m-%Y" -d "14 days ago"`

# Copy previous timefile to timeold.txt if it exists
if [ -f "/volume1/rsync/Buero/timenow.txt" ]
then
    yes | cp /volume1/rsync/Buero/timenow.txt /volume1/rsync/Buero/timeold.txt
fi

# Create current timefile
echo `date +"%d-%m-%Y-%H%M"` > /volume1/rsync/Buero/timenow.txt

# rsync command
if [ -f "/volume1/rsync/Buero/timeold.txt" ]
then
    rsync -aqzh \
        --delete --stats --exclude-from=/volume1/rsync/Buero/exclude.txt \
        --log-file=/volume1/Backup_Test/logs/rsync-`date +"%d-%m-%Y-%H%M"`.log \
        --link-dest=/volume1/Backup_Test/`cat /volume1/rsync/Buero/timeold.txt` \
        Test@192.168.2.2::Test /volume1/Backup_Test/`date +"%d-%m-%Y-%H%M"`
else
    rsync -aqzh \
        --delete --stats --exclude-from=/volume1/rsync/Buero/exclude.txt \
        --log-file=/volume1/Backup_Buero/logs/rsync-`date +"%d-%m-%Y-%H%M"`.log \
        Test@192.168.2.2::Test /volume1/Backup_Test/`date +"%d-%m-%Y-%H%M"`
fi

# Delete expired snapshots (2 weeks old)
if [ -d /volume1/Backup_Buero/$EXPIRED-* ]
then
    rm -Rf /volume1/Backup_Buero/$EXPIRED-*
fi

Well, it works, but there is a huge flaw with this approach and I am not able to solve it on my own, unfortunately. As long as the backups finish properly, everything is fine, but as soon as one backup job can't be finished for some reason (like it gets aborted accidentally or a power cut occurs), the whole backup chain is messed up and usually the script creates a new full backup, which fills up my backup storage.

What I would like to achieve is to improve the script so that a backup run that wasn't finished properly will be resumed the next time the script triggers. Only if that was successful should the next incremental backup be created, so that the files that didn't change since the previous backup can be hardlinked properly.

I did a little bit of research, and I am not sure if I am on the right track here, but apparently this can be done with return codes. I honestly don't know how to do this, though. Thank you in advance for your help, and sorry if this question may seem foolish to most of you.

Regards

Dennis
Dennis Steinkamp <dennis at lightandshadow.tv> wrote:

> i tried to create a simple rsync script that should create daily backups from a ZFS storage and put them into a timestamp folder.
> After creating the initial full backup, the following backups should only contain "new data" and the rest will be referenced via hardlinks (--link-dest)
> ...
> Well, it works, but there is a huge flaw with this approach and i am not able to solve it on my own, unfortunately.
> As long as the backups finish properly, everything is fine, but as soon as one backup job can't be finished for some reason (like it gets aborted accidentally or a power cut occurs),
> the whole backup chain is messed up and usually the script creates a new full backup, which fills up my backup storage.

Yes indeed, this is a typical flaw with many systems - you often need to throw away the partial backup.

One option that comes to mind is this: create the new backup in a directory called (for example) "new" or "in-progress". If, and only if, the backup completes, rename it to a timestamp. When you start a new backup and the in-progress folder already exists, use that, and it will be freshened to the current source state.

Also, have you looked at StoreBackup? http://storebackup.org
It does most of this automagically, keeps a definable history (e.g. one/day for 14 days, one/week for x weeks, one/30d for y years), plus it keeps file hashes so it can detect bit-rot in your backups.
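[Editor's sketch] A minimal illustration of that "in-progress" idea, reusing the backup root and rsync module from Dennis's script. The in-progress directory name and the "last" symlink are illustrative choices, the flag set is trimmed down, and it assumes your ln supports -sfn:

#!/bin/sh
# Always rsync into a fixed "in-progress" directory; only a run that finishes
# cleanly gets renamed to a timestamp and becomes the new --link-dest base.
BACKUPROOT=/volume1/Backup_Test
INPROGRESS=$BACKUPROOT/in-progress
LAST=$BACKUPROOT/last        # symlink pointing at the newest finished backup

# On the very first run the symlink does not exist yet; rsync just warns
# about the missing --link-dest directory and does a full copy.
rsync -aqzh --delete \
    --link-dest="$LAST" \
    Test@192.168.2.2::Test "$INPROGRESS"

if [ $? -eq 0 ]
then
    STAMP=`date +"%d-%m-%Y-%H%M"`
    mv "$INPROGRESS" "$BACKUPROOT/$STAMP"      # a finished backup gets its timestamp
    ln -sfn "$BACKUPROOT/$STAMP" "$LAST"       # next run hardlinks against this one
fi
# On failure, in-progress is simply left in place and reused (freshened) next run.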
On 19.06.2016 at 19:27, Simon Hobson wrote:
> Dennis Steinkamp <dennis at lightandshadow.tv> wrote:
>
>> i tried to create a simple rsync script that should create daily backups from a ZFS storage and put them into a timestamp folder.
>> After creating the initial full backup, the following backups should only contain "new data" and the rest will be referenced via hardlinks (--link-dest)
>> ...
>> Well, it works, but there is a huge flaw with this approach and i am not able to solve it on my own, unfortunately.
>> As long as the backups finish properly, everything is fine, but as soon as one backup job can't be finished for some reason (like it gets aborted accidentally or a power cut occurs),
>> the whole backup chain is messed up and usually the script creates a new full backup, which fills up my backup storage.
>
> Yes indeed, this is a typical flaw with many systems - you often need to throw away the partial backup.
>
> One option that comes to mind is this: create the new backup in a directory called (for example) "new" or "in-progress". If, and only if, the backup completes, rename it to a timestamp. When you start a new backup and the in-progress folder already exists, use that, and it will be freshened to the current source state.
>
> Also, have you looked at StoreBackup? http://storebackup.org
> It does most of this automagically, keeps a definable history (e.g. one/day for 14 days, one/week for x weeks, one/30d for y years), plus it keeps file hashes so it can detect bit-rot in your backups.

Thank you for taking the time to answer me. Your suggestion is what I also had in mind, but I wasn't sure whether it would be "best practice".

To build this idea into my script, I probably need to hardcode the target directory rsync writes to (e.g. "new" or "in-progress") and rename that directory to a timestamp only after rsync returns a code of 0, am I correct? (Or return codes 0 and 24?)

As for StoreBackup, it does sound nice, but I have to do all of this from the console of a 2-bay Synology NAS, so it's not that easy to use third-party software that may have dependencies the Synology system doesn't meet.
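[Editor's sketch] For the return-code part, the usual pattern is to capture rsync's exit status right after the call and only rename the in-progress directory when that status is acceptable. Exit code 0 is a clean run; 24 means some source files vanished while the transfer was running, which many setups also treat as success. The paths and the in-progress name below are the same illustrative ones as in the earlier sketch, and the flag set is trimmed:

rsync -aqzh --delete \
    --link-dest=/volume1/Backup_Test/last \
    Test@192.168.2.2::Test /volume1/Backup_Test/in-progress
RC=$?

case $RC in
    0|24)
        # 0 = success, 24 = some source files vanished during the transfer
        mv /volume1/Backup_Test/in-progress \
           /volume1/Backup_Test/`date +"%d-%m-%Y-%H%M"`
        ;;
    *)
        echo "rsync exited with code $RC - keeping in-progress for the next run" >&2
        ;;
esac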
Rely on the other answers here as to how to do it right. I just want to mention a few things in your script.

> yes | cp /volume1/rsync/Buero/timenow.txt /volume1/rsync/Buero/timeold.txt

yes is a program which puts out "y" (or whatever you tell it to) forever - not what you want - and cp does not accept input from a pipe unless the first argument is "-" or some similar fancier construction. You can probably just leave off the "yes | " and have the statement work exactly as it does now.

It looks like your EXPIRED logic will only find a directory which *exactly* matches that date. You might look at using something like a find command to find directories older than 14 days. Some find options which might help (a worked example follows after the quoted message below):

-ctime +14 specifies finding things whose status changed more than 14 days ago (without the "+" it matches exactly 14 days)
-type d specifies finding only directories
-maxdepth 1 specifies finding things only one level below the path find starts at
-exec ls -l {} \; specifies running a command on every result which is returned - in this case, an ls which can't hurt anything. You can replace ls with something like rm -rf {} when you're *very* sure the command is finding *exactly* what you want it to.

I didn't put the whole command together because until you understand how it works, you don't want to try something that might delete a bunch of things beyond what you actually want deleted.

Joe

On 06/19/2016 08:22 AM, Dennis Steinkamp wrote:
> Hey guys,
>
> i tried to create a simple rsync script that should create daily backups from a ZFS storage and put them into a timestamp folder.
> After creating the initial full backup, the following backups should only contain "new data" and the rest will be referenced via hardlinks (--link-dest)
>
> This was at least a simple enough scenario to achieve with my pathetic scripting skills. This is what i came up with:
>
> #!/bin/sh
>
> # rsync copy script for rsync pull from FreeNAS to BackupNAS for Buero dataset
>
> # Set variables
> EXPIRED=`date +"%d-%m-%Y" -d "14 days ago"`
>
> # Copy previous timefile to timeold.txt if it exists
> if [ -f "/volume1/rsync/Buero/timenow.txt" ]
> then
>     yes | cp /volume1/rsync/Buero/timenow.txt /volume1/rsync/Buero/timeold.txt
> fi
>
> # Create current timefile
> echo `date +"%d-%m-%Y-%H%M"` > /volume1/rsync/Buero/timenow.txt
>
> # rsync command
> if [ -f "/volume1/rsync/Buero/timeold.txt" ]
> then
>     rsync -aqzh \
>         --delete --stats --exclude-from=/volume1/rsync/Buero/exclude.txt \
>         --log-file=/volume1/Backup_Test/logs/rsync-`date +"%d-%m-%Y-%H%M"`.log \
>         --link-dest=/volume1/Backup_Test/`cat /volume1/rsync/Buero/timeold.txt` \
>         Test@192.168.2.2::Test /volume1/Backup_Test/`date +"%d-%m-%Y-%H%M"`
> else
>     rsync -aqzh \
>         --delete --stats --exclude-from=/volume1/rsync/Buero/exclude.txt \
>         --log-file=/volume1/Backup_Buero/logs/rsync-`date +"%d-%m-%Y-%H%M"`.log \
>         Test@192.168.2.2::Test /volume1/Backup_Test/`date +"%d-%m-%Y-%H%M"`
> fi
>
> # Delete expired snapshots (2 weeks old)
> if [ -d /volume1/Backup_Buero/$EXPIRED-* ]
> then
>     rm -Rf /volume1/Backup_Buero/$EXPIRED-*
> fi
>
> Well, it works, but there is a huge flaw with this approach and i am not able to solve it on my own, unfortunately.
> As long as the backups finish properly, everything is fine, but as soon as one backup job can't be finished for some reason (like it gets aborted accidentally or a power cut occurs),
> the whole backup chain is messed up and usually the script creates a new full backup, which fills up my backup storage.
>
> What I would like to achieve is to improve the script so that a backup run that wasn't finished properly will be resumed the next time the script triggers.
> Only if that was successful should the next incremental backup be created, so that the files that didn't change since the previous backup can be hardlinked properly.
>
> I did a little bit of research, and I am not sure if I am on the right track here, but apparently this can be done with return codes. I honestly don't know how to do this, though.
> Thank you in advance for your help, and sorry if this question may seem foolish to most of you.
>
> Regards
>
> Dennis
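[Editor's sketch] Putting Joe's options together, here is what the expiry step could look like with the backup path from Dennis's script. It adds -mindepth 1 so the backup root itself can never match; note that the stripped-down BusyBox find on a Synology may not support every option, and, as Joe says, run the ls version first and only switch to rm once the output is exactly what you expect:

# List the candidate directories first - nothing is deleted yet
find /volume1/Backup_Buero -mindepth 1 -maxdepth 1 -type d -ctime +14 -exec ls -ld {} \;

# Once the listing shows only expired backups, the destructive version would be:
# find /volume1/Backup_Buero -mindepth 1 -maxdepth 1 -type d -ctime +14 -exec rm -rf {} \;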
The scripts I use analyze the rsync log after it completes and then sftp a summary to the root of the just-completed rsync. If no summary is found, or the summary says it failed, the folder rotation for that set is skipped and that folder is re-used on the subsequent rsync. The key here is that the folder rotation script runs separately from the rsync script(s).

For each entity I want to rsync, I create a named folder to identify it, and the rsync'd data is held in sub-folders: daily.[1-7] and monthly.[1-3]. When I rsync, I rsync into daily.0 using daily.1 as the link-dest. Then the rotation script checks daily.0/rsync.summary - and if it worked, it removes daily.7 and renames the daily folders. On the first of the month, the rotation script removes monthly.3, renames the other 2 and makes a complete hard-link copy of daily.1 to monthly.1.

It's been running now for about 4 years and, in my environment, the 10 copies take about 4 times the space of a single copy. (We do complete copies of Linux servers - starting from /.)

If there's a good spot to post the scripts, I'd be glad to put them up.

--
Larry Irwin
Cell: 864-525-1322
Email: lrirwin at alum.wustl.edu
Skype: larry_irwin
About: http://about.me/larry_irwin

On 06/19/2016 01:27 PM, Simon Hobson wrote:
> Dennis Steinkamp <dennis at lightandshadow.tv> wrote:
>
>> i tried to create a simple rsync script that should create daily backups from a ZFS storage and put them into a timestamp folder.
>> After creating the initial full backup, the following backups should only contain "new data" and the rest will be referenced via hardlinks (--link-dest)
>> ...
>> Well, it works, but there is a huge flaw with this approach and i am not able to solve it on my own, unfortunately.
>> As long as the backups finish properly, everything is fine, but as soon as one backup job can't be finished for some reason (like it gets aborted accidentally or a power cut occurs),
>> the whole backup chain is messed up and usually the script creates a new full backup, which fills up my backup storage.
>
> Yes indeed, this is a typical flaw with many systems - you often need to throw away the partial backup.
>
> One option that comes to mind is this: create the new backup in a directory called (for example) "new" or "in-progress". If, and only if, the backup completes, rename it to a timestamp. When you start a new backup and the in-progress folder already exists, use that, and it will be freshened to the current source state.
>
> Also, have you looked at StoreBackup? http://storebackup.org
> It does most of this automagically, keeps a definable history (e.g. one/day for 14 days, one/week for x weeks, one/30d for y years), plus it keeps file hashes so it can detect bit-rot in your backups.
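[Editor's sketch] Larry offered to post his actual scripts separately; purely as an illustration of the rotation half of his scheme, a standalone version might look roughly like this. The per-entity folder path, the success marker inside rsync.summary, and the GNU "cp -al" hard-link copy are all assumptions:

#!/bin/sh
# Sketch of a daily.0 -> daily.1..7 rotation, gated on a success marker.
SET=/backups/myserver                  # assumed per-entity folder

# Only rotate if the last rsync into daily.0 reported success (assumed marker word).
grep -q "SUCCESS" "$SET/daily.0/rsync.summary" 2>/dev/null || exit 0

# Drop the oldest daily copy, shift the rest up, and promote daily.0 to daily.1.
rm -rf "$SET/daily.7"
for i in 6 5 4 3 2 1; do
    [ -d "$SET/daily.$i" ] && mv "$SET/daily.$i" "$SET/daily.$((i+1))"
done
mv "$SET/daily.0" "$SET/daily.1"

# On the 1st of the month, keep a hard-linked copy of daily.1 as monthly.1.
if [ "`date +%d`" = "01" ]; then
    rm -rf "$SET/monthly.3"
    [ -d "$SET/monthly.2" ] && mv "$SET/monthly.2" "$SET/monthly.3"
    [ -d "$SET/monthly.1" ] && mv "$SET/monthly.1" "$SET/monthly.2"
    cp -al "$SET/daily.1" "$SET/monthly.1"     # hard-link copy, needs GNU cp
fi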
On 19 June 2016 at 10:27, Simon Hobson <linux at thehobsons.co.uk> wrote:
> Dennis Steinkamp <dennis at lightandshadow.tv> wrote:
>
>> i tried to create a simple rsync script that should create daily backups from a ZFS storage and put them into a timestamp folder.
>> After creating the initial full backup, the following backups should only contain "new data" and the rest will be referenced via hardlinks (--link-dest)
>> ...
>> Well, it works, but there is a huge flaw with this approach and i am not able to solve it on my own, unfortunately.
>> As long as the backups finish properly, everything is fine, but as soon as one backup job can't be finished for some reason (like it gets aborted accidentally or a power cut occurs),
>> the whole backup chain is messed up and usually the script creates a new full backup, which fills up my backup storage.
>
> Yes indeed, this is a typical flaw with many systems - you often need to throw away the partial backup.
>
> One option that comes to mind is this: create the new backup in a directory called (for example) "new" or "in-progress". If, and only if, the backup completes, rename it to a timestamp. When you start a new backup and the in-progress folder already exists, use that, and it will be freshened to the current source state.

I have an extremely similar script for my backups, and that's exactly what I do to deal with backups that are stopped mid-way, either by power failures or by me. I rsync to a .tmp-$target directory, where $target is what I'm backing up. I have separate backups for my rootfs and /home. I also start the whole thing under ionice so that my computer doesn't get slow from all this I/O.

Lastly, before renaming the .tmp-$target to the final directory I do a `sync -f`, because rsync doesn't seem to call fsync() when copying files, and you can end up with a failed backup if a power failure happens right after the rename().

Here is my script:

#!/bin/bash

set -o errexit
set -o pipefail

target=$1

case "$target" in
    home) source=/home ;;
    root) source=/ ;;
esac

PATHTOBACKUP=/root/backup
date=$(date --utc "+%Y-%m-%dT%H:%M:%S")

ionice --class 3 rsync \
    --archive \
    --verbose \
    --one-file-system \
    --sparse \
    --delete \
    --compress \
    --log-file=$PATHTOBACKUP/.tmp-$target.log \
    --link-dest=$PATHTOBACKUP/$target-current \
    $source $PATHTOBACKUP/.tmp-$target

sync -f $PATHTOBACKUP/.tmp-$target

mv $PATHTOBACKUP/.tmp-$target.log $PATHTOBACKUP/$target-$date.log
mv $PATHTOBACKUP/.tmp-$target $PATHTOBACKUP/$target-$date
ln --symbolic --force --no-dereference $target-$date $PATHTOBACKUP/$target-current
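[Editor's note] A script like this is typically driven from cron; the entries below are purely illustrative (the script path, name, and times are assumptions, not part of the original message):

# illustrative root crontab entries - adjust the path and times to taste
30 2 * * * /root/bin/backup.sh root
45 2 * * * /root/bin/backup.sh home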