2009/8/25 Török Edwin <edwintorok at gmail.com>:> On 2009-08-25 21:18, Daniel Dunbar wrote: >> Woot! Thanks a bunch Edwin! >> >> Some comments on the patch: >> -- >> I'm not sure if it makes sense to import the man pages, if we only >> expose Regex.h. >> > > I'd like to keep re_format.7, it describes the format of the regex as > accepted by this implementation. > I'll remove regex.3 since its not exposed.Ok.>> Can you make these doxyments? Also, I'd prefer more LLVM style and less Unix >> style enum names, IgnoreCase for example. >> >> Does NOSPEC actually ever make sense to use? Wouldn't clients just use string >> compare? >> > > Indeed, it is worthless for LLVM. > There's also REG_PEND (non-POSIX, but this impl. supports it) that > allows matching patterns > with embedded NULs. Would that be needed/useful for LLVM?Maybe? If its already there and we are fine using this version of the lib always then we might as well include it.>>> + // notbol: The match-beginning-of-line operator always fails to match, >>> + // unless regex was compiled with NEWLINE flag. >>> + // noteol: The match-end-of-line operator always fails to match. >>> + // unless regex was compiled with NEWLINE flag. >>> + >>> + bool matches(const char *string, bool notbol=false, bool noteol=false); >>> + bool match_sub(const char *string, llvm_regmatch_t pmatch[], >>> + unsigned nmatch, bool notbol=false, bool noteol=false); >>> >> >> These should be doxymented, and I don't understand the flags. Does this mean >> that if I *don't* compile with NEWLINE, but *do* want ^/$ support, then >> notbol/noteol need to be false? Do we care about use cases that care about this? >> Could we just always have them as false? >> > > They are meant for cases where you call matches() on portions of a > larger string. > When you call it at offset 0, you want notbol=false, but when you call > it at offset > 0 you want notbol=true. > Ditto for noteol, when the portion of string passed to matches() doesn't > contain the real end of the string. > > If we are always passing the entire string we intend to match to > matches() then we can remove those bools.Ah, that makes sense. For the time being, though, I'd say lets drop them until some client actually cares.>> This won't work on Windows as stands (Regex.h == regex.h), but I think we should >> just keep regex.h private which solves the problem (still nice to rename it to >> avoid confusion, llvm_regex.h would be consistent). >> > > Ok, I'll try to make it private.Ok! I have one more request; since the implementation ends up being spread into a number of files, could we prefix them with something so they stand out inside the Support directory? Thanks again! - Daniel
On 2009-08-27 09:06, Daniel Dunbar wrote:> 2009/8/25 Török Edwin <edwintorok at gmail.com>: > >> On 2009-08-25 21:18, Daniel Dunbar wrote: >> >>> Woot! Thanks a bunch Edwin! >>> >>> Some comments on the patch: >>> -- >>> I'm not sure if it makes sense to import the man pages, if we only >>> expose Regex.h. >>> >>> >> I'd like to keep re_format.7, it describes the format of the regex as >> accepted by this implementation. >> I'll remove regex.3 since its not exposed. >> > > Ok. > > >>> Can you make these doxyments? Also, I'd prefer more LLVM style and less Unix >>> style enum names, IgnoreCase for example. >>> >>> Does NOSPEC actually ever make sense to use? Wouldn't clients just use string >>> compare? >>> >>> >> Indeed, it is worthless for LLVM. >> There's also REG_PEND (non-POSIX, but this impl. supports it) that >> allows matching patterns >> with embedded NULs. Would that be needed/useful for LLVM? >> > > Maybe? If its already there and we are fine using this version of the > lib always then we might as well include it. >I changed the constructor to take StringRef, and now I always use REG_PEND internally, so it works both with char* and std::string (which may not be null-terminated, or may include embedded nuls). You can add a Twine& constructor if needed ;)> >>>> + // notbol: The match-beginning-of-line operator always fails to match, >>>> + // unless regex was compiled with NEWLINE flag. >>>> + // noteol: The match-end-of-line operator always fails to match. >>>> + // unless regex was compiled with NEWLINE flag. >>>> + >>>> + bool matches(const char *string, bool notbol=false, bool noteol=false); >>>> + bool match_sub(const char *string, llvm_regmatch_t pmatch[], >>>> + unsigned nmatch, bool notbol=false, bool noteol=false); >>>> >>>> >>> These should be doxymented, and I don't understand the flags. Does this mean >>> that if I *don't* compile with NEWLINE, but *do* want ^/$ support, then >>> notbol/noteol need to be false? Do we care about use cases that care about this? >>> Could we just always have them as false? >>> >>> >> They are meant for cases where you call matches() on portions of a >> larger string. >> When you call it at offset 0, you want notbol=false, but when you call >> it at offset > 0 you want notbol=true. >> Ditto for noteol, when the portion of string passed to matches() doesn't >> contain the real end of the string. >> >> If we are always passing the entire string we intend to match to >> matches() then we can remove those bools. >> > > Ah, that makes sense. For the time being, though, I'd say lets drop > them until some client actually cares. >Dropped.> >>> This won't work on Windows as stands (Regex.h == regex.h), but I think we should >>> just keep regex.h private which solves the problem (still nice to rename it to >>> avoid confusion, llvm_regex.h would be consistent). >>> >>> >> Ok, I'll try to make it private. >> > > Ok! > > I have one more request; since the implementation ends up being spread > into a number of files, could we prefix them with something so they > stand out inside the Support directory? >They are now all lib/Support/reg* Attached new patch. Best regards, --Edwin -------------- next part -------------- A non-text attachment was scrubbed... Name: regex-support.patch.gz Type: application/gzip Size: 33735 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090827/612fe731/attachment.bin>
Nice! This looks good to me but probably Chris or someone else should sign off on it. There were two minor warnings on Darwin: -- In file included from /Volumes/Data/Users/ddunbar/llvm/lib/Support/regexec.c:81: /Volumes/Data/Users/ddunbar/llvm/lib/Support/regengine.inc: In function 'sbackref': /Volumes/Data/Users/ddunbar/llvm/lib/Support/regengine.inc:665: warning: control reaches end of non-void function In file included from /Volumes/Data/Users/ddunbar/llvm/lib/Support/regexec.c:130: /Volumes/Data/Users/ddunbar/llvm/lib/Support/regengine.inc: In function 'lbackref': /Volumes/Data/Users/ddunbar/llvm/lib/Support/regengine.inc:665: warning: control reaches end of non-void function /Volumes/Data/Users/ddunbar/llvm/lib/Support/regexec.c: In function 'llvm_regexec': /Volumes/Data/Users/ddunbar/llvm/lib/Support/regexec.c:157: warning: comparison between signed and unsigned -- And one in the unittest: -- /Volumes/Data/Users/ddunbar/llvm/utils/unittest/googletest/include/gtest/gtest.h: In function 'testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = int, T2 = long unsigned int]': /Volumes/Data/Users/ddunbar/llvm/utils/unittest/googletest/include/gtest/gtest.h:650: instantiated from 'static testing::AssertionResult testing::internal::EqHelper<lhs_is_null_literal>::Compare(const char*, const char*, const T1&, const T2&) [with T1 = int, T2 = size_t, bool lhs_is_null_literal = false]' /Volumes/Data/Users/ddunbar/llvm/unittests/Support/RegexTest.cpp:29: instantiated from here /Volumes/Data/Users/ddunbar/llvm/utils/unittest/googletest/include/gtest/gtest.h:617: warning: comparison between signed and unsigned integer expressions -- - Daniel 2009/8/27 Török Edwin <edwintorok at gmail.com>:> On 2009-08-27 09:06, Daniel Dunbar wrote: >> 2009/8/25 Török Edwin <edwintorok at gmail.com>: >> >>> On 2009-08-25 21:18, Daniel Dunbar wrote: >>> >>>> Woot! Thanks a bunch Edwin! >>>> >>>> Some comments on the patch: >>>> -- >>>> I'm not sure if it makes sense to import the man pages, if we only >>>> expose Regex.h. >>>> >>>> >>> I'd like to keep re_format.7, it describes the format of the regex as >>> accepted by this implementation. >>> I'll remove regex.3 since its not exposed. >>> >> >> Ok. >> >> >>>> Can you make these doxyments? Also, I'd prefer more LLVM style and less Unix >>>> style enum names, IgnoreCase for example. >>>> >>>> Does NOSPEC actually ever make sense to use? Wouldn't clients just use string >>>> compare? >>>> >>>> >>> Indeed, it is worthless for LLVM. >>> There's also REG_PEND (non-POSIX, but this impl. supports it) that >>> allows matching patterns >>> with embedded NULs. Would that be needed/useful for LLVM? >>> >> >> Maybe? If its already there and we are fine using this version of the >> lib always then we might as well include it. >> > > I changed the constructor to take StringRef, and now I always use REG_PEND > internally, so it works both with char* and std::string (which may not > be null-terminated, or may include embedded nuls). > You can add a Twine& constructor if needed ;) > >> >>>>> + // notbol: The match-beginning-of-line operator always fails to match, >>>>> + // unless regex was compiled with NEWLINE flag. >>>>> + // noteol: The match-end-of-line operator always fails to match. >>>>> + // unless regex was compiled with NEWLINE flag. >>>>> + >>>>> + bool matches(const char *string, bool notbol=false, bool noteol=false); >>>>> + bool match_sub(const char *string, llvm_regmatch_t pmatch[], >>>>> + unsigned nmatch, bool notbol=false, bool noteol=false); >>>>> >>>>> >>>> These should be doxymented, and I don't understand the flags. Does this mean >>>> that if I *don't* compile with NEWLINE, but *do* want ^/$ support, then >>>> notbol/noteol need to be false? Do we care about use cases that care about this? >>>> Could we just always have them as false? >>>> >>>> >>> They are meant for cases where you call matches() on portions of a >>> larger string. >>> When you call it at offset 0, you want notbol=false, but when you call >>> it at offset > 0 you want notbol=true. >>> Ditto for noteol, when the portion of string passed to matches() doesn't >>> contain the real end of the string. >>> >>> If we are always passing the entire string we intend to match to >>> matches() then we can remove those bools. >>> >> >> Ah, that makes sense. For the time being, though, I'd say lets drop >> them until some client actually cares. >> > > Dropped. > >> >>>> This won't work on Windows as stands (Regex.h == regex.h), but I think we should >>>> just keep regex.h private which solves the problem (still nice to rename it to >>>> avoid confusion, llvm_regex.h would be consistent). >>>> >>>> >>> Ok, I'll try to make it private. >>> >> >> Ok! >> >> I have one more request; since the implementation ends up being spread >> into a number of files, could we prefix them with something so they >> stand out inside the Support directory? >> > > They are now all lib/Support/reg* > > Attached new patch. > > Best regards, > --Edwin > >