Wayne Piekarski
2019-Dec-18 23:47 UTC
Skip creating files in --backup-dir if content has not changed
I am using rsync with --backup --backup-dir to keep copies of files which have changed as part of an incremental backup system. However, if only the timestamp has changed, it creates a copy of the file in --backup-dir, and if thousands of large files have their timestamps changed, this can waste a lot of disk space on something which hasn't really changed.

Interestingly, if you use --checksum, rsync will not create a file in --backup-dir unless the contents are truly different, but it will fix up the timestamp on the remote end to match. This is what I want, but I just don't want to pay the performance penalty of running --checksum all the time.

Here is an example that shows the problem:

    mkdir ./SRC
    echo hello > ./SRC/a
    echo hello > ./SRC/b
    rsync -av ./SRC/ ./DEST/
    touch ./SRC/*
    ls -al --full-time ./SRC/ ./DEST/
    # Creates copies in BACKUP, even though contents are the same
    rsync -av --backup-dir=`pwd`/BACKUP/ ./SRC/ ./DEST/

After this run, the BACKUP directory will contain copies of both a and b even though neither actually changed. If you add --checksum, then it avoids creating a copy, but still syncs the timestamps correctly.

    touch ./SRC/*
    # Does not create any copies in BACKUP since nothing changed
    rsync -av --checksum --backup-dir=`pwd`/BACKUP/ ./SRC/ ./DEST/

The problem with --checksum is that for hundreds of gigabytes of data, it can be very slow to run over every file, especially if the timestamps are mostly actually the same. But without it, the delta algorithm in rsync has already decided to make a backup copy before it realizes later that nothing has changed.

Is there a flag I can add to rsync that will tell it to only create a backup file if something actually changed, saving lots of wasted backup space?

thanks,
Wayne
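The difference between the two runs comes down to how "unchanged" is decided. By default rsync's quick check looks only at size and modification time, while --checksum compares a whole-file checksum. A rough sketch of the two kinds of test in shell, assuming GNU stat (for the -c option); the function names are made up for illustration and are not part of rsync:

```shell
# Sketch only: two ways of deciding whether a file "changed".
# Assumes GNU stat (-c); both function names are hypothetical.

# Metadata test: same size and same modification time (whole seconds).
# Cheap, because neither file is opened.
same_by_quick_check() {
    [ "$(stat -c '%s %Y' -- "$1")" = "$(stat -c '%s %Y' -- "$2")" ]
}

# Content test: byte-for-byte comparison, which has to read both files.
same_by_content() {
    cmp -s -- "$1" "$2"
}
```

After the `touch ./SRC/*` step in the example, the first test fails for SRC/a versus DEST/a while the second still succeeds, which is the case where a backup copy is made for no real change.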
Kevin Korb
2019-Dec-19 00:39 UTC
Skip creating files in --backup-dir if content has not changed
The reason for that is pretty simple... Rsync isn't reading the existing file to find out that it is the same. Doing so would normally be a waste of rsync's time.

A better question is why are your files changing timestamps when the data is the same.
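One possible workaround, sketched here and not something proposed in the thread: let rsync run without --checksum, then prune BACKUP afterwards by comparing each backup copy against the file now in DEST. Only the files that landed in BACKUP get read, not the whole tree. prune_identical_backups is a hypothetical helper name, and the sketch assumes BACKUP mirrors the directory layout of DEST, as in the example above.

```shell
# Sketch only: list (or remove) backup copies whose content matches the
# file currently in DEST. Hypothetical helper, not an rsync feature.
# Usage: prune_identical_backups BACKUP_DIR DEST_DIR [--delete]
# Without --delete it only prints the redundant backups.
prune_identical_backups() (
    backup_dir=${1%/}   # drop a trailing slash so relative paths line up
    dest_dir=${2%/}
    mode=${3:-}
    # find passes the file names as arguments, so odd names stay intact.
    find "$backup_dir" -type f -exec sh -c '
        backup_dir=$1; dest_dir=$2; mode=$3; shift 3
        for f in "$@"; do
            rel=${f#"$backup_dir"/}
            # Redundant if the file now in DEST has identical bytes.
            if [ -f "$dest_dir/$rel" ] && cmp -s -- "$f" "$dest_dir/$rel"; then
                printf "%s\n" "$rel"
                if [ "$mode" = "--delete" ]; then
                    rm -f -- "$f"
                fi
            fi
        done
    ' sh "$backup_dir" "$dest_dir" "$mode" {} +
)
```

Running it as `prune_identical_backups ./BACKUP ./DEST` lists the redundant copies; adding `--delete` as a third argument removes them. Note that deleting a copy also discards the only record of the file's old timestamp.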
Possibly Parallel Threads
- Skip based on checksum not worked as expected when using with complex filter rules.
- --compare-dest -- copy ONLY files with content-differences between 2 directories... to a third
- Solution For Rsync and Cygwin Daylight Savings Timezone Problems
- How can the --backup-dir be set to remote machine?