If you've been watching CVS, you may have noticed that I checked in some new files named wildmatch.c and wildmatch.h. This code implements shell-style wildcard matching with rsync's extension that "**" matches a "/" but "*" and "?" do not. I have also checked in a new test module, which has allowed me to test a few things on all the machines in our build farm. One thing I discovered is that the various fnmatch() implementations all seem to handle the character-class boundary cases in conflicting ways (things like "[]-]"). So one benefit for rsync of switching from fnmatch to wildmatch will be to make the callers of this function behave consistently on all platforms (this primarily affects the exclude code).

I've currently got the wildmatch code fully implemented and optimized, and it is looking good so far. Use of it in rsync itself has not yet been checked into CVS, but I'm using it on my systems.

Anyone have any concerns or comments on switching over to this new code?

Also, if anyone has some good test cases for wildcard matching, it would be good to make the wildmatch test suite even more comprehensive than it already is.

..wayne..
On Sat, Jul 05, 2003 at 04:12:15PM -0700, Wayne Davison wrote:
> If you've been watching CVS, you may have noticed that I checked in some
> new files named wildmatch.c and wildmatch.h. This code implements the
> shell-style wildcard matching with rsync's extension that "**" matches a
> "/" but "*" and "?" does not. I have also checked in a new test module
> which has allowed me to test a few things on all the machines in our
> build farm. One thing I discovered is that the various fnmatch() calls
> all seem to handle the character-class boundary cases in conflicting
> ways (things like "[]-]"). So, one benefit for rsync of switching from
> fnmatch to wildmatch will be to make the callers of this function behave
> consistently on all platforms (which primarily affects the exclude
> code).

If I may ask, why this change? That seems to be missing from your description. Does this introduce any changes in the behavior of patterns?

-- 
________________________________________________________________
J.W. Schultz            Pegasystems Technologies
email address:          jw@pegasys.ws

Remember Cernan and Schmitt
Quoting Wayne Davison <wayned@samba.org>:
> If you've been watching CVS, you may have noticed that I checked in some
> new files named wildmatch.c and wildmatch.h. This code implements the
> shell-style wildcard matching with rsync's extension that "**" matches
[...]
> build farm. One thing I discovered is that the various fnmatch() calls
> all seem to handle the character-class boundary cases in conflicting
> ways (things like "[]-]"). So, one benefit for rsync of switching from
> fnmatch to wildmatch will be to make the callers of this function behave
> consistently on all platforms (which primarily affects the exclude
> code).
[...]

This might explain why Python implements its own fnmatch.py using regexes.

> Anyone have any concerns or comments on switching over to this new code?

Only one concern, a few questions, and a maybe-suggestion.

The concern: Why the name "wildmatch"? It seems a bit too arbitrary to me. I have used the name "efnmatch" (extended fnmatch) for it in my Python implementations. The name "wildmatch" is too generic, whereas "efnmatch" clearly indicates that it is an extension to the standard fnmatch. A silly concern, I know, but it will make my life easier when I start making Python extension modules out of your code to use in mine :-)

Some questions: How did you implement it (I know, I should just look in CVS, but while I'm typing...)? Does it use regexes or a modified implementation of fnmatch? How does it compare performance-wise with a regex-based implementation? The reason I'm curious is that Python, for whatever reason, implements fnmatch in Python using regexes rather than as a C Python extension (possibly to avoid the fnmatch variations you identified). I'm wondering if it would be worth re-implementing fnmatch (and efnmatch) as C extension modules.

The maybe-suggestion: When I implemented efnmatch using regexes, I found it was painless to add the ability to use regexes in include/exclude lists.
This meant include/exclude lists could be built using either efnmatch wildcards or regexes, as they would all be converted, compiled, and matched as regexes anyway. I don't know how regex matching compares to fnmatch matching performance-wise. I'm also aware that people have expressed concerns about linking in/against largish regex libraries. However, if the option of using regexes for includes/excludes is ever going to happen, then it might be an idea to use them for this. Personally, I feel the efnmatch functionality is flexible enough to never require regexes, but I've seen a few enquiries in the past.

ABO
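The regex approach described above can be sketched in C with the POSIX regcomp()/regexec() API: translate the wildcard pattern into an anchored extended regular expression, then compile and match it. The helper names glob_to_ere() and efn_match() are hypothetical, and the translation rules ("**" to ".*", "*" to "[^/]*", "?" to "[^/]") are assumptions chosen to mirror the rsync semantics discussed in this thread. Bracket classes are copied through almost verbatim (a leading "!" becomes "^"), so unlike "*" and "?" they can match "/", and a backslash inside a class is taken literally.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Append s to out, failing if it would not fit (keeping room for NUL). */
static bool append(char *out, size_t outsz, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n + 1 > outsz)
        return false;
    memcpy(out + *len, s, n + 1);        /* copies the NUL too */
    *len += n;
    return true;
}

static bool append_char(char *out, size_t outsz, size_t *len, char c)
{
    char s[2] = { c, '\0' };
    return append(out, outsz, len, s);
}

/* Hypothetical: translate a wildcard pattern into an anchored POSIX ERE. */
static bool glob_to_ere(const char *p, char *out, size_t outsz)
{
    size_t len = 0;

    if (!append(out, outsz, &len, "^"))
        return false;
    while (*p) {
        const char *emit = NULL;          /* ERE text for a wildcard */
        if (p[0] == '*' && p[1] == '*') {
            while (*p == '*')
                p++;
            emit = ".*";                  /* "**" may cross "/" */
        } else if (*p == '*') {
            p++;
            emit = "[^/]*";               /* "*" stops at "/" */
        } else if (*p == '?') {
            p++;
            emit = "[^/]";                /* "?" is one non-"/" char */
        } else if (*p == '[') {
            const char *q = p + 1;
            if (*q == '!' || *q == '^')
                q++;
            if (*q == ']')
                q++;                      /* a leading "]" is a member */
            q = strchr(q, ']');
            if (q) {                      /* copy class, "!" becomes "^" */
                if (!append_char(out, outsz, &len, '['))
                    return false;
                p++;
                if (*p == '!' || *p == '^') {
                    if (!append_char(out, outsz, &len, '^'))
                        return false;
                    p++;
                }
                for (; p <= q; p++)
                    if (!append_char(out, outsz, &len, *p))
                        return false;
                continue;
            }
            /* No closing "]": treat "[" as a literal below. */
        }
        if (emit) {
            if (!append(out, outsz, &len, emit))
                return false;
            continue;
        }
        if (*p == '\\' && p[1])
            p++;                          /* "\x" means a literal x */
        if (strchr(".[\\()*+?{|^$", *p) && !append_char(out, outsz, &len, '\\'))
            return false;                 /* escape ERE metacharacters */
        if (!append_char(out, outsz, &len, *p))
            return false;
        p++;
    }
    return append(out, outsz, &len, "$");
}

/* Hypothetical: 1 on match, 0 on no match, -1 if translation/compile fails. */
static int efn_match(const char *pat, const char *text)
{
    char ere[512];
    regex_t re;
    int rc;

    assert(pat != NULL && text != NULL);
    if (!glob_to_ere(pat, ere, sizeof ere))
        return -1;
    if (regcomp(&re, ere, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, "*.c" translates to the ERE ^[^/]*\.c$. Because every pattern passes through regcomp() anyway, accepting raw regexes in include/exclude lists would amount to skipping the translation step. A real implementation would also compile each pattern once and reuse the regex_t, rather than recompiling on every call as this sketch does.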
On Sat, Jul 05, 2003 at 04:12:15PM -0700, Wayne Davison wrote:
> If you've been watching CVS, you may have noticed that I checked in some
> new files named wildmatch.c and wildmatch.h.
[...]
> Anyone have any concerns or comments on switching over to this new code?
[...]

I see you have made the transition. I've built it and will start using it today. So far it looks good.

The only negative I have at this point is that I don't like the indentation. Don't let that trouble you; I can understand why you went with a 4-char shiftwidth. I only noticed it because you were the last person to touch backup.c, which had tabs converted into spaces, indentation errors, and other whitespace (or lack thereof) yuckiness, and I thought it might have been an editor thing.