thr3ads.net - rsync - Feature proposal and implementation plan: --delete-delay [Apr 2004]

Edwin Olson

2004-Apr-14 23:21 UTC

Feature proposal and implementation plan: --delete-delay

Hi folks,

One feature I've wanted in rsync is the ability to delete files that no 
longer exist in the source *after some specified grace period.*

The functionality I'm looking for is a backup system that won't actually
delete files until a week or two after the user does. This would:

    1. Protect against accidental file deletions; the user would have 
some time to realize the mistake and retrieve a backup
    2. Keep the backup machine tidier, and reflect more closely the 
source directory's structure.

It seems to me the difficulty is in noticing when the file was "first"
deleted and then keeping track of how much time has elapsed.

Here is a proposed method of doing this:

    1. When a file is first discovered to be obsolete due to the source 
file having been deleted, rename the local file 
"originalfilename__RSYNC_YYYYMMDD".
    2. Whenever we are asked to delete a file (in delete_one), check the 
filename. If it is in the form above, check to see if the required 
amount of time has elapsed (via a --delete-delay=NDAYS argument). Delete 
as appropriate. If the file isn't in the correct form, the file only 
became obsolete just now; rename the file with the current date.
    3. These modified filenames would be exposed to the user. (Yes, a 
big ugly.)
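
The rename-then-expire steps above can be sketched outside rsync as follows. This is only an illustration in Python, not a patch against rsync's C source: the names `decide` and `handle_obsolete` are hypothetical, the delay is taken as a whole number of days, and only regular files are handled (directories would need separate treatment).

```python
import os
import re
from datetime import date, datetime

# Suffix format from the proposal: originalfilename__RSYNC_YYYYMMDD
MARK_RE = re.compile(r"^(?P<orig>.+)__RSYNC_(?P<stamp>\d{8})$")


def decide(name, today, delay_days):
    """Return ("rename", new_name), ("keep", None) or ("delete", None).

    Hypothetical helper: pure decision logic, no filesystem access.
    """
    m = MARK_RE.match(name)
    if m:
        try:
            marked = datetime.strptime(m.group("stamp"), "%Y%m%d").date()
        except ValueError:
            marked = None  # eight digits that are not a real date: treat as unmarked
        if marked is not None:
            # Already marked: delete only once the grace period has elapsed.
            age = (today - marked).days
            return ("delete", None) if age >= delay_days else ("keep", None)
    # Not marked yet: the file only became obsolete now, so stamp it with today's date.
    return ("rename", f"{name}__RSYNC_{today:%Y%m%d}")


def handle_obsolete(path, delay_days, today=None):
    """Apply the decision to a file that no longer exists in the source."""
    today = today or date.today()
    directory, name = os.path.split(path)
    action, new_name = decide(name, today, delay_days)
    if action == "rename":
        os.rename(path, os.path.join(directory, new_name))
    elif action == "delete":
        os.remove(path)
    return action
```

One thing the sketch makes visible: a source file whose real name already ends in `__RSYNC_` plus eight digits would be mistaken for a marked file, which is part of the ugliness conceded in step 3.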

An alternative strategy which would preserve the filenames would be to 
have an auxiliary database file that stored the timestamps. However, I 
can imagine a thousand things that could go wrong here.

I also considered encoding the timeout information in the 
modified-time/access-time fields, but that seems very hackish and, 
again, I can imagine a thousand things that could go horribly wrong.

What do you think? I'm willing to do the work, but I'd like to implement
the feature in the best possible way (any improvements?) and I was 
wondering (provided the implementation is adequately robust/clean) if 
you would be willing to add such a patch to CVS. (Or is it 
"philosophically" the incorrect behavior and thus undesirable 
regardless of its implementation?)

Thanks,

Ed

Jim Salter

2004-Apr-14 23:40 UTC

Feature proposal and implementation plan: --delete-delay

Just a note:

I do something similar to what you describe, using a Perl script to
invoke rsync with the --backup-dir= option.  I back up system drives to
a considerably larger archive volume using the backup-dir= option to
shunt old versions of files (and deleted files) into hierarchies named
after the dates they were originally synch'ed across.  Before the daily
backup runs, though, my Perl script "preens" the backup filesystem by
checking for old archived versions of files that have reached
"retirement age" and then deletes them.

For example, on /backup, there is /backup/data and /backup/archive.  If
on 2004-04-14 I run my synch script and rsync needs to delete
/backup/data/deadfile.txt, it instead moves it using backup-dir= to
/backup/archive/2004-04-14/deadfile.txt.  When I run my synch script
again a month later, it first deletes the /backup/archive/2004-04-14
tree in its entirety before proceeding to the actual rsync'ing.
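
A rough sketch of that age-based variant, written in Python rather than the Perl I actually use. The function names, the 30-day default and the source path are assumptions for illustration only; `-a`, `--delete`, `--backup` and `--backup-dir` are standard rsync options.

```python
import os
import shutil
from datetime import date, datetime


def rsync_command(source, today, dest="/backup/data/", archive_root="/backup/archive"):
    """Build an rsync invocation that shunts replaced and deleted files
    into a directory named after today's date (hypothetical helper)."""
    return [
        "rsync", "-a", "--delete", "--backup",
        f"--backup-dir={archive_root}/{today:%Y-%m-%d}",
        source, dest,
    ]


def expired(names, today, max_age_days):
    """Return the names that parse as YYYY-MM-DD and are at least
    max_age_days old, oldest first. Other names are ignored."""
    dated = []
    for name in names:
        try:
            stamp = datetime.strptime(name, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a dated archive directory; leave it alone
        if (today - stamp).days >= max_age_days:
            dated.append((stamp, name))
    return [name for _stamp, name in sorted(dated)]


def preen(archive_root, max_age_days=30, today=None):
    """Delete expired archive trees before the day's sync runs."""
    today = today or date.today()
    for name in expired(os.listdir(archive_root), today, max_age_days):
        shutil.rmtree(os.path.join(archive_root, name))
```

The command list from `rsync_command` could be handed to `subprocess.run` after `preen` has finished.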

In actual point of fact, my preening is triggered by drive capacity and
usage levels, not age of archives, and once triggered it eliminates
archives from older to newer until usage is back down to a desired
percentage - but you get the idea.
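
The capacity-driven version might look something like the following Python sketch. It is an approximation under stated assumptions: the function names and the single target percentage are made up for illustration, and it relies on the archive directories being named YYYY-MM-DD so that sorting the names also sorts them oldest first.

```python
import os
import shutil


def usage_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def preen_by_usage(archive_root, target_percent,
                   usage=usage_percent, remove=shutil.rmtree):
    """Remove archive trees, oldest first, until usage is at or below
    target_percent. Returns the names removed. `usage` and `remove` are
    injectable so the logic can be tried without touching a real disk."""
    removed = []
    # YYYY-MM-DD names sort chronologically as plain strings.
    for name in sorted(os.listdir(archive_root)):
        full = os.path.join(archive_root, name)
        if not os.path.isdir(full):
            continue  # only whole archive trees are candidates
        if usage(archive_root) <= target_percent:
            break  # back under the desired level; keep the newer archives
        remove(full)
        removed.append(name)
    return removed
```

For example, `preen_by_usage("/backup/archive", 80)` would trim the oldest dated trees until the volume is no more than 80% full.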

Jim Salter
JRS Systems
