As threatened a while back, there are some exclude/include bugs that I'd
like to see fixed in rsync.  Here is the patch:

    http://www.blorf.net/rsync-exclude.patch

This fixes the following bugs:

 - A non-anchored, slash-including pattern with a wildcard needs to be
   matched at the end of the path (e.g. "CVS/R*" should match throughout
   the tree, not just at /CVS/R*).

 - A leading "**/" should match at the root of the transferred tree
   (e.g. "**/foo" should also match "/foo").

 - A path that includes an infix or trailing "**" but no slash should
   match the entire path, not just the trailing name (e.g. "foo**bar"
   should match "/some/foo/bar").  The docs need to mention this.

The remaining bug I didn't fix (yet):

 - The presence of a "**" turns all "*"s into "**"s.

Fixing this last bug would require either switching to a new
wildcard-matching routine (like the one I posted a while back), or
switching to using regular-expression matching and converting wildcard
strings into regex strings.  I'm thinking that it might be nice to go
the regex route and add an option to allow the user to specify all
exclude/include strings as regular expressions instead of wildcard
strings.

Comments?

..wayne..
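The unfixed bug above comes down to keeping "*" and "**" distinct when a pattern is turned into a regex. The following is a minimal sketch of that idea in Python, not rsync's code and not the patch. The function names `wildcard_to_regex` and `matches` are hypothetical, and the sketch assumes anchored, whole-path matching only, so it does not model the non-anchored cases from the bug list.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a regex string.

    Assumed semantics for this sketch (not taken from rsync's source):
      **  matches any run of characters, including '/'
      *   matches any run of characters except '/'
      ?   matches exactly one character except '/'
    All other characters are matched literally.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")        # may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # stays within one path component
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def matches(pattern, path):
    """Return True if the whole path matches the wildcard pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), path) is not None


# "*" stops at a slash, "**" does not.
print(matches("/src/*.c", "/src/main.c"))
print(matches("/src/*.c", "/src/lib/util.c"))
print(matches("/src/**.c", "/src/lib/util.c"))
# A "**" elsewhere in the pattern does not widen the single "*".
print(matches("/*/obj/**", "/x/y/obj/a.o"))
```

Because each token is translated on its own, a "**" later in the pattern has no effect on how an earlier "*" behaves, which is the property the current matcher lacks.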
On Tue, Apr 22, 2003 at 11:48:57AM -0700, Wayne Davison wrote:
> As threatened a while back, there are some exclude/include bugs that I'd
> like to see fixed in rsync.  Here is the patch:
>
>     http://www.blorf.net/rsync-exclude.patch
>
> This fixes the following bugs:
>
>  - A non-anchored, slash-including pattern with a wildcard needs to be
>    matched at the end of the path (e.g. "CVS/R*" should match throughout
>    the tree, not just at /CVS/R*).
>
>  - A leading "**/" should match at the root of the transferred tree
>    (e.g. "**/foo" should also match "/foo").
>
>  - A path that includes an infix or trailing "**" but no slash should
>    match the entire path, not just the trailing name (e.g. "foo**bar"
>    should match "/some/foo/bar").  The docs need to mention this.
>
> The remaining bug I didn't fix (yet):
>
>  - The presence of a "**" turns all "*"s into "**"s.
>
> Fixing this last bug would require either switching to a new wildcard-
> matching routine (like the one I posted a while back), or switching to
> using regular-expression matching and converting wildcard strings into
> regex strings.  I'm thinking that it might be nice to go the regex route
> and add an option to allow the user to specify all exclude/include
> strings as regular expressions instead of wildcard strings.
>
> Comments?

Fixing any bugs is good, go for it.  Haven't looked at the patch but
yours are usually pretty good.

As for regex... I would love to go regex with one caveat.  Slash needs
special treatment.  It shouldn't be in the dot character class.
Patterns should not be implicitly anchored, but directories have a
trailing slash.  Paths should have a leading slash, unless ^ assumes an
optional one, so that /foo/ will still match a top-level directory of
that name.
Something like:

    SPECIAL REGEX       NORMAL REGEX
    foo/bar             foo/bar
    foo.*bar            foo[^/]*bar
    foo/*bar            foo/.*/bar || foo/bar
    foo/{2}bar          foo/[^/]*/bar
    foo/{3}bar          foo/[^/]*/[^/]*/bar
    /bar$               .*/bar$
    ^/{3}               ^/[^/]*/[^/]*/[^/]*/
    ^/bar/$             ^/bar/$

That way, for instance, this could work:

    rsync -a --includex='^/{,3}.*' --excludex=. . dest
    for d in */*/*
    do
        [ -d "$d" ] || continue
        rsync -a "$d" "dest/$d"
    done

--
________________________________________________________________
    J.W. Schultz            Pegasystems Technologies
    email address:          jw@pegasys.ws

        Remember Cernan and Schmitt
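The "special regex" table above can be read as a small rewriting step applied before handing the pattern to an ordinary regex engine. Below is a rough Python sketch of that reading. The function name `special_to_normal` is hypothetical, and the rules are inferred from the table rows only: "." stops matching "/", "/{n}" means n slashes with one "[^/]*" component between each pair, and "/*" means a slash optionally followed by any number of intermediate directories. The "^/{3}" row in the table expands to one more component than the mid-pattern "/{n}" rows do, so this sketch follows the mid-pattern rows and does not reproduce that row. A literal "]" inside a bracket expression is not handled.

```python
import re

# "/{n}" with n >= 1, as used in the table rows "foo/{2}bar" and "foo/{3}bar".
_SLASH_COUNT = re.compile(r"/\{([1-9][0-9]*)\}")


def special_to_normal(pattern):
    """Rewrite a slash-aware "special regex" into an ordinary regex string."""
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        counted = _SLASH_COUNT.match(pattern, i)
        if ch == "\\" and i + 1 < n:
            out.append(pattern[i:i + 2])     # keep escapes such as "\." as-is
            i += 2
        elif ch == "[" and pattern.find("]", i + 1) != -1:
            j = pattern.find("]", i + 1)     # copy bracket expression verbatim
            out.append(pattern[i:j + 1])
            i = j + 1
        elif counted:
            count = int(counted.group(1))
            out.append("/" + "[^/]*/" * (count - 1))
            i = counted.end()
        elif pattern.startswith("/*", i):
            out.append("/(?:.*/)?")          # "foo/.*/bar || foo/bar"
            i += 2
        elif ch == ".":
            out.append("[^/]")               # dot no longer matches a slash
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


for special in ["foo/bar", "foo.*bar", "foo/*bar", "foo/{2}bar", "foo/{3}bar"]:
    print(special.ljust(12), special_to_normal(special))
```

Patterns are left unanchored, so they would be used with `re.search`; under that convention "/bar$" already behaves like the table's ".*/bar$" without any rewriting.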
On Wed, 2003-04-23 at 04:48, Wayne Davison wrote:
[...]
> The remaining bug I didn't fix (yet):
>
>  - The presence of a "**" turns all "*"s into "**"s.
>
> Fixing this last bug would require either switching to a new wildcard-
> matching routine (like the one I posted a while back), or switching to
> using regular-expression matching and converting wildcard strings into

I did exactly this in Python for something I was working on.  There are
a few tricks to watch out for, but the attached Python implementation
should translate into C fairly easily.

I also have Python implementations of efficient directory scanning and
filename matching against include/exclude lists if anyone is interested.

> regex strings.  I'm thinking that it might be nice to go the regex route
> and add an option to allow the user to specify all exclude/include
> strings as regular expressions instead of wildcard strings.

> Comments?

I like the "extended shell pattern" matching.  It is a logical extension
of shell patterns that is simple but quite powerful for the specific
task of filename matching.

I also love regex's because you can do damn near anything with them,
particularly perl-style regex's.  However, I think regex's are overkill
for most filename matching, and they are not a perfect fit for the
application.  Common chars like '.' have a special meaning, and common
desired matches are verbose to express, i.e. '[^/]' for "all but slash".

You could modify the regex syntax to make it more "filename friendly",
but you are making work for everyone implementing, supporting,
documenting and learning the differences.

Changing the matching implementation to use regex's does make it simple
to add the option of using regex's at the command line, but I would keep
the efnmatch matching as the default.
--
----------------------------------------------------------------
Donovan Baarda                http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: efnmatch.py
Type: text/x-python
Size: 2796 bytes
Desc: not available
Url : http://lists.samba.org/archive/rsync/attachments/20030423/d4ab242a/efnmatch.py
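To make the verbosity point concrete, here is a small Python sketch of an include/exclude list written directly as regular expressions. It is not the scrubbed efnmatch.py attachment, whose contents are not shown here. The rule list `RULES` and the function `is_included` are hypothetical. The sketch assumes rsync's documented ordering, where the first matching rule decides and a path that matches no rule is included.

```python
import re

# Hypothetical rule list: ("+" to include, "-" to exclude, regex to search for).
# The wildcard "/src/*.c" needs "[^/]*" and an escaped dot once it is a regex.
RULES = [
    ("+", r"^/src/[^/]*\.c$"),   # C sources directly under /src
    ("-", r"^/src/"),            # everything else under /src
    ("-", r"\.o$"),              # object files anywhere in the tree
]


def is_included(path, rules=RULES):
    """Return True if the path survives the rule list (first match wins)."""
    for action, regex in rules:
        if re.search(regex, path):
            return action == "+"
    return True                  # no rule matched: included by default


for path in ["/src/main.c", "/src/lib/util.c", "/build/a.o", "/README"]:
    print(path, is_included(path))
```

The first rule is the regex spelling of "/src/*.c", which illustrates why a wildcard front end remains the more convenient default even if regexes do the matching underneath.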