R-help,

Sorry if this is more of a regex question than an R question. However, help would be appreciated on my use of the regexpr function.

In the first example below, I ask for all characters (a-z) in 'abc123'; regexpr returns a 3-character match beginning at the first character.

> regexpr("[[:alpha:]]*", "abc123")
[1] 1
attr(,"match.length")
[1] 3

However, when the text is flipped and I ask for a match of all characters in '123abc', regexpr returns a zero-character match beginning at the first character. Can someone explain what a zero-length match means (i.e. why not return -1), and why the result isn't 4, match.length=3?

> regexpr("[[:alpha:]]*", "123abc")
[1] 1
attr(,"match.length")
[1] 0

Thanks,
Robert

> R.version
               _
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status         Patched
major          2
minor          11.0
year           2010
month          05
day            11
svn rev        51984
language       R
version.string R version 2.11.0 Patched (2010-05-11 r51984)

McGehee, Robert wrote:
> R-help,
> Sorry if this is more of a regex question than an R question. However,
> help would be appreciated on my use of the regexpr function.
>
> In the first example below, I ask for all characters (a-z) in 'abc123';
> regexpr returns a 3-character match beginning at the first character.
>
>> regexpr("[[:alpha:]]*", "abc123")
> [1] 1
> attr(,"match.length")
> [1] 3
>
> However, when the text is flipped and I ask for a match of all
> characters in '123abc', regexpr returns a zero-character match beginning
> at the first character. Can someone explain what a zero-length match
> means (i.e. why not return -1), and why the result isn't 4,
> match.length=3?

It means the pattern matched 0 characters, which is a valid match because you use *, which means "match 0 or more occurrences" of the preceding expression. regexpr reports only the first match it finds, and at position 1 an empty match already succeeds, so the search never gets as far as position 4. It sounds like you want + instead of *. Also see gregexpr.

>
>> regexpr("[[:alpha:]]*", "123abc")
> [1] 1
> attr(,"match.length")
> [1] 0
>
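
To make the difference between * and + concrete, here is a minimal sketch using only base R (regexpr, gregexpr, substring). The helper name first_alpha_run is hypothetical and does not come from the thread; it is just one way the "+ instead of *" suggestion might be applied.

```r
x <- "123abc"

# '*' allows zero repetitions, so an empty match succeeds at position 1.
m_star <- regexpr("[[:alpha:]]*", x)

# '+' requires at least one letter, so the first match is the run "abc"
# starting at position 4 with match.length 3.
m_plus <- regexpr("[[:alpha:]]+", x)

# With '+', a string containing no letters yields no match at all (-1).
m_none <- regexpr("[[:alpha:]]+", "123")

# Hypothetical helper: return the first run of letters, or NA if there is none.
first_alpha_run <- function(s) {
  m <- regexpr("[[:alpha:]]+", s)
  start <- as.integer(m)
  if (start == -1L) return(NA_character_)
  substring(s, start, start + attr(m, "match.length") - 1L)
}

# gregexpr reports every match in the string rather than only the first.
g <- gregexpr("[[:alpha:]]+", "ab12cd")[[1]]
```

With these definitions, first_alpha_run("123abc") gives "abc", and g holds the start positions 1 and 5, each with match.length 2.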