thr3ads.net - rsync - New wildmatch code in CVS [Jul 2003]

Wayne Davison

2003-Jul-06 09:12 UTC

New wildmatch code in CVS

If you've been watching CVS, you may have noticed that I checked in some
new files named wildmatch.c and wildmatch.h.  This code implements the
shell-style wildcard matching with rsync's extension that "**" matches a
"/" but "*" and "?" do not.  I have also checked in a new test module
which has allowed me to test a few things on all the machines in our
build farm.  One thing I discovered is that the various fnmatch() calls
all seem to handle the character-class boundary cases in conflicting
ways (things like "[]-]").  So, one benefit for rsync of switching from
fnmatch to wildmatch will be to make the callers of this function behave
consistently on all platforms (which primarily affects the exclude
code).
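
To make the matching rules above concrete, here is a minimal Python sketch
of a matcher with the same stated semantics.  It is not the code in
wildmatch.c: the function names are hypothetical, the backtracking is
unoptimized, and the choice that a character class never matches "/" is
an assumption of this sketch.

```python
def _class_end(p, pi):
    """Index of the ']' closing the class that opens at p[pi], or -1."""
    j = pi + 1
    if j < len(p) and p[j] in "!^":
        j += 1
    if j < len(p) and p[j] == "]":   # a leading ']' is a literal member
        j += 1
    while j < len(p) and p[j] != "]":
        j += 1
    return j if j < len(p) else -1


def _in_class(body, ch):
    """True if ch is accepted by the class body (text between the brackets)."""
    negate = body[:1] in ("!", "^")
    if negate:
        body = body[1:]
    found, i = False, 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":   # range such as a-z
            found = found or body[i] <= ch <= body[i + 2]
            i += 3
        else:                                          # single literal
            found = found or body[i] == ch
            i += 1
    # Assumption of this sketch: a class never matches "/".
    return ch != "/" and found != negate


def wildmatch(p, t, pi=0, ti=0):
    """Match text t against pattern p; only '**' may match across '/'."""
    while pi < len(p):
        c = p[pi]
        if c == "*":
            start = pi
            while pi < len(p) and p[pi] == "*":
                pi += 1
            cross_slash = pi - start >= 2
            k = ti
            while True:               # try every length the star could take
                if wildmatch(p, t, pi, k):
                    return True
                if k >= len(t) or (t[k] == "/" and not cross_slash):
                    return False
                k += 1
        if ti >= len(t):
            return False
        if c == "?":
            if t[ti] == "/":
                return False
        elif c == "[" and _class_end(p, pi) >= 0:
            end = _class_end(p, pi)
            if not _in_class(p[pi + 1:end], t[ti]):
                return False
            pi = end
        elif t[ti] != c:              # ordinary character (or unclosed '[')
            return False
        pi += 1
        ti += 1
    return ti == len(t)
```

With this sketch, "*.c" matches "foo.c" but not "src/foo.c", while "**.c"
matches both, and "[]-]" is read as a class containing "]" and "-".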

I've currently got the wildmatch code fully implemented and optimized,
and it is looking good so far.  Use of it in rsync itself has not yet
been checked into CVS, but I'm using it on my systems.

Anyone have any concerns or comments on switching over to this new code?

Also, if anyone has some good test cases for wildcard matching, it would
be good to make the wildmatch test suite even more comprehensive than it
already is.

..wayne..

jw schultz

2003-Jul-06 11:15 UTC

New wildmatch code in CVS

On Sat, Jul 05, 2003 at 04:12:15PM -0700, Wayne Davison wrote:
> If you've been watching CVS, you may have noticed that I checked in some
> new files named wildmatch.c and wildmatch.h.  This code implements the
> shell-style wildcard matching with rsync's extension that "**" matches a
> "/" but "*" and "?" do not.  I have also checked in a new test module
> which has allowed me to test a few things on all the machines in our
> build farm.  One thing I discovered is that the various fnmatch() calls
> all seem to handle the character-class boundary cases in conflicting
> ways (things like "[]-]").  So, one benefit for rsync of switching from
> fnmatch to wildmatch will be to make the callers of this function behave
> consistently on all platforms (which primarily affects the exclude
> code).

If i may ask, why this change?  That seems to be missing
from your description.

Does this introduce any changes in behavior of patterns?
> I've currently got the wildmatch code fully implemented and optimized,
> and it is looking good so far.  Use of it in rsync itself has not yet
> been checked into CVS, but I'm using it on my systems.
> 
> Anyone have any concerns or comments on switching over to this new code?
> 
> Also, if anyone has some good test cases for wildcard matching, it would
> be good to make the wildmatch test suite even more comprehensive than it
> already is.
-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

Donovan Baarda

2003-Jul-09 22:02 UTC

New wildmatch code in CVS

Quoting Wayne Davison <wayned@samba.org>:
> If you've been watching CVS, you may have noticed that I checked in some
> new files named wildmatch.c and wildmatch.h.  This code implements the
> shell-style wildcard matching with rsync's extension that "**" matches
> [...]
> build farm.  One thing I discovered is that the various fnmatch() calls
> all seem to handle the character-class boundary cases in conflicting
> ways (things like "[]-]").  So, one benefit for rsync of switching from
> fnmatch to wildmatch will be to make the callers of this function behave
> consistently on all platforms (which primarily affects the exclude
> code).
> [...]

This might explain why Python implements its own fnmatch.py using regexes.

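
For reference, the regex-based approach in Python's standard fnmatch
module can be observed directly.  The short sketch below only uses
fnmatch.translate() and fnmatch.fnmatchcase() from the standard library;
the variable names are illustrative.  It shows two things relevant to this
thread: the stock "*" is translated to a regex that also matches "/", and
"[]-]" is read as a class containing "]" and "-".

```python
import fnmatch
import re

# fnmatch.translate() turns a shell-style pattern into a regex string.
regex_source = fnmatch.translate("*.c")
compiled = re.compile(regex_source)

# The standard module has no special handling of "/": "*" crosses it.
star_crosses_slash = compiled.match("src/foo.c") is not None

# Boundary case from the original post: "]" first in a class is a literal,
# and a trailing "-" is a literal as well.
bracket_results = [fnmatch.fnmatchcase(s, "[]-]") for s in ("]", "-", "a")]
```

Because "*" crosses "/" here, rsync-style matching needs either a
modified translation or a dedicated matcher.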
> Anyone have any concerns or comments on switching over to this new
> code?
Only one concern, a few questions, and a maybe suggestion:

The concern:

Why the name "wildmatch"? It seems a bit too arbitrary to me. I have used
the name "efnmatch" (extended fnmatch) for it in my Python implementations.
The name "wildmatch" is too generic, whereas "efnmatch" clearly indicates
it is an extension to the standard fnmatch. A silly concern I know, but it
will make my life easier when I start making Python extension modules out
of your code to use in mine :-)

Some Questions:

How did you implement it (I know, I should just look in CVS, but while I'm
typing...)? Does it use regexes or a modified implementation of fnmatch?
How does it compare performance-wise with a regex-based implementation?

The reason I'm curious is that Python, for whatever reason, implements
fnmatch in Python using regexes rather than using a C extension (possibly
to avoid the fnmatch variations you identified). I'm wondering if it would
be worth re-implementing fnmatch (and efnmatch) as C extension modules.

The maybe suggestion:

I found that by implementing efnmatch using regexes, it was painless to add
the ability to use regexes in include/exclude lists. This meant
include/exclude lists could be built using either efnmatch wildcards or
regexes, as they would all be converted, compiled, and matched as regexes
anyway.
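
A minimal sketch of that idea, assuming a hypothetical "re:" prefix to
mark raw regex rules (the prefix and the function names efn_to_regex and
compile_rule are illustrative, not taken from any existing
implementation).  Character classes are left out to keep it short, so "["
is treated as a literal here.

```python
import re


def efn_to_regex(pattern: str) -> str:
    """Translate an extended-fnmatch wildcard pattern into regex source."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "*":
            j = i
            while j < n and pattern[j] == "*":
                j += 1
            # Two or more stars may cross "/"; a single star may not.
            out.append(".*" if j - i >= 2 else "[^/]*")
            i = j
            continue
        if c == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def compile_rule(rule: str) -> "re.Pattern[str]":
    """Compile one include/exclude rule, wildcard or raw regex."""
    if rule.startswith("re:"):        # hypothetical marker for raw regexes
        return re.compile(rule[3:], re.DOTALL)
    return re.compile(efn_to_regex(rule), re.DOTALL)


def rule_matches(rule: str, path: str) -> bool:
    """True if the whole path is matched by the rule."""
    return compile_rule(rule).fullmatch(path) is not None
```

Both kinds of rule end up as compiled regex objects, so the matching code
does not need to know which syntax a rule was written in.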

I don't know how regex matching compares to fnmatch matching
performance-wise. I'm also aware that people have expressed concerns about
linking in/against largish regex libs. However, if the option of using
regexes for include/excludes is ever going to happen, then it might be an
idea to use them for this.

Personally, I feel the efnmatch functionality is flexible enough to never 
require regex's, but I've seen a few enquiries in the past..

ABO

jw schultz

2003-Jul-31 19:40 UTC

New wildmatch code in CVS

On Sat, Jul 05, 2003 at 04:12:15PM -0700, Wayne Davison wrote:
> If you've been watching CVS, you may have noticed that I checked in some
> new files named wildmatch.c and wildmatch.h.  This code implements the
> shell-style wildcard matching with rsync's extension that "**" matches a
> "/" but "*" and "?" do not.  I have also checked in a new test module
> which has allowed me to test a few things on all the machines in our
> build farm.  One thing I discovered is that the various fnmatch() calls
> all seem to handle the character-class boundary cases in conflicting
> ways (things like "[]-]").  So, one benefit for rsync of switching from
> fnmatch to wildmatch will be to make the callers of this function behave
> consistently on all platforms (which primarily affects the exclude
> code).
> 
> I've currently got the wildmatch code fully implemented and optimized,
> and it is looking good so far.  Use of it in rsync itself has not yet
> been checked into CVS, but I'm using it on my systems.
> 
> Anyone have any concerns or comments on switching over to this new code?
> 
> Also, if anyone has some good test cases for wildcard matching, it would
> be good to make the wildmatch test suite even more comprehensive than it
> already is.

I see you have made the transition.

I've built it and will start using it today.  So far it
looks good.

The only negative i have at this point is that i don't like
the indentation.  Don't let that trouble you, i can
understand why you went with a 4 char shiftwidth.  I only
noticed it because you were the last person to touch
backup.c which had converted tabs into spaces and had
indentation errors and other whitespace (or lack thereof)
yuckiness and i thought it might have been an editor thing.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt
