I would like to propose some improvements to rsync's filters.

(1) Add a notation that makes a pattern match both a file and everything under it if it happens to be a directory. One possibility: ending a path in //. "+ mydir//" would be equivalent to "+ mydir" "+ mydir/**". Just some syntactic sugar.

(2) Add a third sender behavior to "hide" and "show": "traverse" (T). If a directory is to be "traversed", the sender scans it for files that are "shown" and includes any such files in the file list. If --no-implied-dirs is not given, the sender also sends their parent directories. With "traverse", it would be really easy to include certain trees, which seems to be a very common desire:

    S /foo/bar/wanted-tree-1//
    S /foo/baz/wanted-tree-2//
    T /**

(3) The "protect" (P) receiver behavior is a misnomer, since the receiver still allows a sender file to overwrite a protected file. Add a new receiver behavior, "lock" (L). If the receiver is asked to change a locked file in any way or create a file at a locked path (i.e. if a file existed there it would be locked), the receiver complains and does nothing. Essentially, the receiver behaves as if paths matching a "lock" rule were illegal on its filesystem and no files existed at any of those paths. (I think "protect" should really have been "keep" and "lock" is the true sense of "protect", but it's too late now.)

(4) "Protect" and "lock" rules should accept a modifier specifying that it's OK to delete a protected or locked file because its parent is being deleted. When sending files to a Subversion working copy, one might want to let a directory's Subversion metadata perish with the directory itself using something like "Ld .svn//" (where "d" is the new modifier). If this modifier is not specified, rsync simply leaves the parent directories lying around as it does now.

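
As an illustration of (1) only, the "//" sugar amounts to a purely textual rewrite of a rule. The following Python sketch shows that rewrite under stated assumptions: it is not rsync code, the function names expand_double_slash and expand_rules are hypothetical, and a rule is assumed to have the form "<prefix> <pattern>" with a single space between the two parts.

```python
def expand_double_slash(rule):
    """Expand the proposed trailing "//" into the two rules it abbreviates.

    Hypothetical helper: rsync does not implement "//". This only rewrites
    rule strings of the assumed form "<prefix> <pattern>".
    """
    prefix, sep, pattern = rule.partition(" ")
    if not sep or not pattern.endswith("//"):
        return [rule]  # no "//" suffix, so the rule is passed through as-is
    base = pattern[:-2]  # strip the trailing "//"
    # One rule for the path itself, one for everything underneath it.
    return [f"{prefix} {base}", f"{prefix} {base}/**"]


def expand_rules(rules):
    """Apply expand_double_slash to each rule, preserving rule order."""
    expanded = []
    for rule in rules:
        expanded.extend(expand_double_slash(rule))
    return expanded


print(expand_rules(["+ mydir//", "- *"]))
# prints ['+ mydir', '+ mydir/**', '- *']
```

Under the same rewrite, "Ld .svn//" from (4) would become "Ld .svn" followed by "Ld .svn/**", and a rule such as "T /**" would be left untouched.
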
It turns out that "--backup" interacts in two interesting ways with receiver filters:

(1) If one runs "rsync --backup --del a/ b/" and b/ contains a file that a/ doesn't, rsync will back the file up; if one runs the command again, rsync will delete the backup file because it is not matched in a/. This is not good! I feel that "--backup" should automatically generate a filter "P *~" (or maybe "L *~"), where ~ stands for the backup suffix, to stop this from happening.

(2) Suppose "--backup" were changed so that, when rsync needs to delete a directory, it backs up the entire directory as a unit (by renaming or moving it) instead of backing up individual files inside it. I'm not saying this behavior is a good idea. Incidentally, the way a NetWare file server keeps "deleted" directories around in case the user asks to salvage them corresponds to this behavior, not rsync's current behavior. Before backing up a directory, should rsync scan it for locked or protected files? If such a file is found, should it remain at its original path (which would require splitting the directory in two), or should it remain in place inside the directory as the directory is moved? The second behavior is probably appropriate if the applicable filter had the new "d" modifier, but I'm not sure what is best in other cases.

-- 
Matt McCutchen, ``hashproduct''
hashproduct@verizon.net -- http://hashproduct.metaesthetics.net/

On Tue, Dec 06, 2005 at 06:13:14PM -0500, Matt McCutchen wrote:
> "+ mydir//" would be equivalent to "+ mydir" "+ mydir/**".

Hmm. We do allow "**" to be empty in the pattern "+ **/foo" (matching /foo). So really, "+ mydir/**" should be enough to also match "mydir/" by itself. I don't see a downside to that change (since the rule would currently be ineffective unless mydir was also included, its presence in the rules indicates a desire to have that directory included).

> (2) Add a third sender behavior to "hide" and "show": "traverse" (T).

That might be better as a command-line option, which would withhold the adding of all directories to the file-list until a non-directory required them to be present. Perhaps --exclude-empty-dirs.

> S /foo/bar/wanted-tree-1//
> S /foo/baz/wanted-tree-2//
> T /**

That example would be better solved by using -R or --files-from. The case where too many directories get sent is usually due to something like "+ */", "+ *.zip", "- *". Your solution could handle that, but so would a command-line option.

> (3) The "protect" (P) receiver behavior is a misnomer, since the
> receiver still allows a sender file to overwrite a protected file.

The docs specifically say that it only protects against deletion, but "keep" may indeed have been a better word choice. One of the deciding factors was that I wanted each set of opposite rules (in this case, risk and protect) to be intuitively associated, and I'm not sure what I would have chosen for an opposite of "keep" (though "risk" probably works OK).

> Add a new receiver behavior, "lock" (L). If the receiver is asked to
> change a locked file in any way or create a file at a locked path
> (i.e. if a file existed there it would be locked), the receiver
> complains and does nothing.

That sounds like a useful addition.

> (4) "Protect" and "lock" rules should accept a modifier specifying
> that it's OK to delete a protected or locked file because its parent
> is being deleted.

Another nice idea.
The Subversion problem you mentioned recently came up on the list, and this would provide a good way to solve it.

> I feel that "--backup" should automatically generate a filter "P *~"
> (or maybe "L *~"), where ~ stands for the backup suffix, to stop this
> from happening.

I've considered this in the past without being swayed, but am now coming around to thinking that this would be an improvement.

> (2) Suppose "--backup" were changed so that, when rsync needs to
> delete a directory, it backs up the entire directory as a unit

A problem comes about if there have already been backed-up files from that hierarchy in the backup area (the directory will already exist) and the backup area might be on a different drive, so we'd need to support the current algorithm for when the new algorithm failed. Thus, I think I'll just leave it as it is.

> Before backing up a directory, should rsync scan it for locked or
> protected files?

Yes, if files are being protected (without your suggested parental-delete modifier), they need to remain unaffected in the hierarchy.

..wayne..