Edwin Olson
2004-Apr-14 23:21 UTC
Feature proposal and implementation plan: --delete-delay
Hi folks,

One feature I've wanted in rsync is the ability to delete files that no longer exist in the source *after some specified grace period.*

The functionality I'm looking for is a backup system that won't actually delete files until a week or two after the user does. This would:

1. Protect against accidental file deletions; the user would have some time to realize the mistake and retrieve a backup.
2. Keep the backup machine tidier, and reflect more closely the source directory's structure.

It seems to me the difficulty is in noticing when the file was "first" deleted and then keeping track of how much time has elapsed.

Here is a proposed method of doing this:

1. When a file is first discovered to be obsolete due to the source file having been deleted, rename the local file "originalfilename__RSYNC_YYYYMMDD".
2. Whenever we are asked to delete a file (in delete_one), check the filename. If it is in the form above, check whether the required amount of time has elapsed (set via a --delete-delay=NDAYS argument) and delete the file if it has. If the filename isn't in that form, the file only became obsolete just now; rename it with the current date.
3. These modified filenames would be exposed to the user. (Yes, a bit ugly.)

An alternative strategy which would preserve the filenames would be to have an auxiliary database file that stored the timestamps. However, I can imagine a thousand things that could go wrong here.

I also considered encoding the timeout information in the modified-time/access-time fields, but that seems very hackish and, again, I can imagine a thousand things that could go horribly wrong.

What do you think? I'm willing to do the work, but I'd like to implement the feature in the best possible way (any improvements?), and I was wondering whether, provided the implementation is adequately robust and clean, you would be willing to add such a patch to CVS. (Or is it "philosophically" the incorrect behavior and thus undesirable regardless of its implementation?)

Thanks,

Ed
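The rename-then-delete decision in the proposed method could be sketched roughly as follows. This is only an illustration in Python, not rsync's C code and not a patch: the function name plan_obsolete and its return values are hypothetical, and the only things taken from the proposal are the "__RSYNC_YYYYMMDD" suffix and the NDAYS grace period.

```python
import re
from datetime import date, datetime, timedelta

# Suffix format from the proposal: originalfilename__RSYNC_YYYYMMDD
STAMP_RE = re.compile(r"^(?P<base>.+)__RSYNC_(?P<stamp>\d{8})$")


def plan_obsolete(name, today, ndays):
    """Decide what to do with a destination file that is gone from the source.

    Hypothetical helper: returns ("rename", new_name), ("keep", name),
    or ("delete", name).
    """
    m = STAMP_RE.match(name)
    if m:
        try:
            marked = datetime.strptime(m.group("stamp"), "%Y%m%d").date()
        except ValueError:
            marked = None  # eight digits, but not a real calendar date
        if marked is not None:
            # Already marked on an earlier run: delete only once the
            # grace period has fully elapsed.
            if today - marked >= timedelta(days=ndays):
                return ("delete", name)
            return ("keep", name)
    # Not marked yet: the file only became obsolete now, so stamp it.
    return ("rename", f"{name}__RSYNC_{today:%Y%m%d}")


if __name__ == "__main__":
    print(plan_obsolete("deadfile.txt", date(2004, 4, 14), 14))
    print(plan_obsolete("deadfile.txt__RSYNC_20040414", date(2004, 4, 28), 14))
```

One weakness the sketch makes visible: a file whose real name happens to end in "__RSYNC_" plus eight digits would be treated as already marked, so a real implementation would need some way to tell the two cases apart.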
Just a note: I do something similar to what you describe, using a Perl script to invoke rsync with the --backup-dir= option.

I back up system drives to a considerably larger archive volume, using the --backup-dir= option to shunt old versions of files (and deleted files) into hierarchies named after the dates they were originally synced across. Before the daily backup runs, though, my Perl script "preens" the backup filesystem by checking for old archived versions of files that have reached "retirement age" and deleting them.

For example, on /backup, there is /backup/data and /backup/archive. If on 2004-04-14 I run my sync script and rsync needs to delete /backup/data/deadfile.txt, it instead moves it, using --backup-dir=, to /backup/archive/2004-04-14/deadfile.txt. When I run my sync script again a month later, it first deletes the /backup/archive/2004-04-14 tree in its entirety before proceeding to the actual rsync run.

In point of fact, my preening is triggered by drive capacity and usage levels, not by the age of the archives, and once triggered it eliminates archives from oldest to newest until usage is back down to a desired percentage. But you get the idea.

Jim Salter
JRS Systems
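The age-based preening step described in the reply might look something like the sketch below. It is written in Python rather than Perl and is not the script from the reply: the function name expired_archives and the max_age_days parameter are made up for illustration. It assumes only that archive directories are named YYYY-MM-DD, as in the /backup/archive/2004-04-14 example.

```python
from datetime import date, datetime, timedelta


def expired_archives(dir_names, today, max_age_days):
    """Return dated archive directory names older than max_age_days.

    Hypothetical helper: names that do not parse as YYYY-MM-DD are
    skipped, and results are ordered oldest first.
    """
    expired = []
    for name in dir_names:
        try:
            stamp = datetime.strptime(name, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a dated archive directory; leave it alone
        if today - stamp > timedelta(days=max_age_days):
            expired.append((stamp, name))
    return [name for _, name in sorted(expired)]


if __name__ == "__main__":
    names = ["2004-04-14", "2004-05-10", "lost+found", "2004-03-01"]
    print(expired_archives(names, date(2004, 5, 15), 30))
```

A caller would then remove each returned directory under the archive root (for example with shutil.rmtree) before starting the rsync run. Because the list is ordered oldest first, the same ordering could serve a capacity-driven variant that stops deleting once disk usage falls below a target.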