xiao.gang.fan1 at libertysurf.fr
2007-Jan-03 17:55 UTC
[Rd] pb in regular expression with the character "-" (PR#9437)
Full_Name: FAN
Version: 2.4.0
OS: Windows
Submission from: (NULL) (159.50.101.9)

These are expected:

> grep("[\-|c]", c("a-a","b"))
[1] 1
> gsub("[\-|c]", "&", c("a-a","b"))
[1] "a&a" "b"

but these are strange:

> grep("[d|\-|c]", c("a-a","b"))
integer(0)
> gsub("[d|\-|c]", "&", c("a-a","b"))
[1] "a-a" "b"

Thanks
Prof Brian Ripley
2007-Jan-04 10:42 UTC
[Rd] pb in regular expression with the character "-" (PR#9437)
Why do you think this is a bug in R? You have not told us what you expected, but the character range |-| contains only | . Not agreeing with your expectations (unstated or otherwise) is not a bug in R.

\- is the same as -, and - is special in character classes. (If it is first or last it is treated literally.) And | is not a metacharacter inside a character class. Also,

> grep("[d\\-c]", c("a-a","b"))
[1] 1 2
> grep("[d\\-c]", c("a-a","b"), perl=TRUE)
[1] 1

shows that escaping - works only in perl (which you will find from the background references mentioned, e.g.

    The interpretation of an ordinary character preceded by a backslash ('\') is undefined.

.)

This is all carefully documented in ?regexp, e.g.

    Patterns are described here as they would be printed by 'cat': do remember that backslashes need to be doubled in entering R character strings from the keyboard.

This is not the first time you have wasted our resources with false bug reports, so please show more respect for the R developers' time. You were also explicitly asked not to report on obsolete versions of R.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
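The rules described above (a hyphen is literal only when first or last in a character class, and | has no special meaning inside one) can be checked with a short sketch. The variable names are illustrative only, and the default-engine calls deliberately avoid a backslash inside the brackets, since that case is the implementation-dependent one.

```r
x <- c("a-a", "b")

# A hyphen placed first or last in a bracket expression is a literal hyphen.
hyphen_first <- grep("[-dc]", x)
hyphen_last  <- grep("[dc-]", x)

# In the middle, "-" forms a range. "|-|" is the one-character range "|",
# so this class contains only d, | and c, and nothing in x matches.
hyphen_middle <- grep("[d|-|c]", x)

# With perl = TRUE a backslash-escaped hyphen is a literal hyphen.
# The backslash is doubled because it is typed inside an R string.
hyphen_escaped <- grep("[d\\-c]", x, perl = TRUE)

print(hyphen_first)    # 1
print(hyphen_last)     # 1
print(hyphen_middle)   # integer(0)
print(hyphen_escaped)  # 1
```

Putting the hyphen first or last is the form that should behave the same under both the default engine and perl = TRUE.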
maechler at stat.math.ethz.ch
2007-Jan-04 21:18 UTC
[Rd] pb in regular expression with the character "-" (PR#9437)
>>>>> "FanX" == Xiao Gang Fan <xiao.gang.fan1 at libertysurf.fr>
>>>>>     on Thu, 04 Jan 2007 21:52:07 +0100 writes:

    FanX> Let me detail a bit my bug report: the two commands
    FanX> ("expected" vs "strange") should return the same
    FanX> result, the objective of the commands is to test the
    FanX> presence of several characters, '-' included.

    FanX> The order in which we specify the different characters
    FanX> must not be an issue, i.e., to test the presence of
    FanX> several characters, including say char_1, the regular
    FanX> expressions [char_1|char_2|char_3] and
    FanX> [char_2|char_1|char_3] should play the same
    FanX> role. Other software works just like this.

    FanX> What's reported is that R actually returns different
    FanX> result for the character "-" (\- in a RE) regarding
    FanX> its position in the regular expression, and the
    FanX> "perl" option would not be relevant.

    FanX> ______________________________________________
    FanX> R-devel at r-project.org mailing list
    FanX> https://stat.ethz.ch/mailman/listinfo/r-devel

Fan, it seems you haven't understood what Brian Ripley explained to you. Let me try to spell it out for you:

"\-" is *NOT* what you seem still to be thinking it is:

> "\-"
[1] "-"
> identical("\-", "-")
[1] TRUE
>

This is all in the R-FAQ entry

>>> 7.37 Why does backslash behave strangely inside strings?
=======================================================

and in several other places, and yes, please do read the R FAQ and maybe more documentation about R and "bug reporting" before your next bug report.

Consider my guesstimate: For 99% of all R users, the amount of time they need working pretty intensely with R before they find a bug in it, is nowadays more than three years, and maybe even much more -- such as their lifetime :-)

Martin Maechler, ETH Zurich
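To make the string-versus-pattern distinction above concrete, here is a small sketch. It uses the doubled backslash throughout, because current versions of R, as far as I know, reject an unrecognized escape such as "\-" when parsing the string instead of silently dropping the backslash as R 2.4.0 did. The variable names are illustrative only.

```r
# The string typed as "[d\\-c]" holds six characters: [ d \ - c ]
pattern <- "[d\\-c]"
n_chars <- nchar(pattern)

# cat() shows the pattern as the regex engine receives it.
cat(pattern, "\n")

# A doubled backslash followed by a hyphen is two characters,
# so it is not the same string as a bare hyphen.
same_as_hyphen <- identical("\\-", "-")

print(n_chars)         # 6
print(same_as_hyphen)  # FALSE
```

Printing a pattern with cat() before passing it to grep() is a quick way to see what the regex engine will actually be given.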
ripley at stats.ox.ac.uk
2007-Jan-04 23:06 UTC
[Rd] pb in regular expression with the character "-" (PR#9437)
Both Solaris 8 grep and GNU grep 2.5.1 give

gannet% cat > foo.txt
a-a
b
gannet% egrep '[d|-|c]' foo.txt
gannet% egrep '[-|c]' foo.txt
a-a

agreeing exactly with R (and the POSIX standard) and contradicting 'Fan'.

On Thu, 4 Jan 2007, Fan wrote:

> Let me detail a bit my bug report:
>
> the two commands ("expected" vs "strange") should return the
> same result, the objective of the commands is to test the presence
> of several characters, '-' included.
>
> The order in which we specify the different characters must not be
> an issue, i.e., to test the presence of several characters, including
> say char_1, the regular expressions [char_1|char_2|char_3] and
> [char_2|char_1|char_3] should play the same role. Other software
> works just like this.
>
> What's reported is that R actually returns different result for the
> character "-" (\- in a RE) regarding its position in the regular
> expression, and the "perl" option would not be relevant.

As described in the relevant international standard and R's own documentation.
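For the stated goal of testing for several characters regardless of the order in which they are listed, one possible approach is to build the character class programmatically and always put the hyphen last. The helper below, char_class, is hypothetical (it is not part of R) and is only a sketch: it assumes single-character inputs and does not handle ], ^ or backslash.

```r
# Hypothetical helper, not part of R: build a bracket expression from
# single characters, moving a literal hyphen to the end so that it
# cannot form a range. Does NOT handle "]", "^" or "\\".
char_class <- function(chars) {
  has_hyphen <- "-" %in% chars
  rest <- setdiff(chars, "-")
  paste0("[", paste(rest, collapse = ""), if (has_hyphen) "-", "]")
}

x <- c("a-a", "b")

# The hyphen ends up last whatever its position in the input.
class_a <- char_class(c("d", "-", "c"))
class_b <- char_class(c("-", "d", "c"))

print(class_a)           # "[dc-]"
print(class_b)           # "[dc-]"
print(grep(class_a, x))  # 1
```

No | separators are used between the characters, since | would simply be one more member of the class.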