>>>>> "David" == David Forrest <drf5n@maplepark.com>
>>>>>     on Tue, 22 Mar 2005 15:02:20 -0600 (CST) writes:

    David> According to help(sub), the ^ should match the
    David> zero-length string at the beginning of a string:

yes, indeed.

    David> sub('^','var',1:3)  # "1" "2" "3"
    David> sub('$','var',1:3)  # "1var" "2var" "3var"

    David> # This generates what I expected from the first case:
    David> sub('^.','var',11:13)  # "var1" "var2" "var3"

there are even more fishy things here:

1) In your cases, the integer 'x' argument is auto-coerced to
   character, however that fails as soon as 'perl = TRUE' is used.

   > sub('^','v_', 1:3, perl=TRUE)
   Error in sub.perl(pattern, replacement, x, ignore.case) :
           invalid argument

   {one can argue that this is not a bug, since the help file asks
    for 'x' to be a character vector; OTOH, we have
    as.character(.) magic in many other places, i.e. quite
    naturally here;
    at least perl=TRUE and perl=FALSE should behave consistently.}

2) The 'perl=TRUE' case behaves even more problematically here:

   > sub('^','v_', LETTERS[1:3], perl=TRUE)
   [1] "A\0e" "B\0J" "C\0S"
   > sub('^','v_', LETTERS[1:3], perl=TRUE)
   [1] "A\0J" "B\0P" "C\0J"
   > sub('^','v_', LETTERS[1:3], perl=TRUE)
   [1] "A\0\0" "B\0\0" "C\0m"
   >

   i.e., the result is random nonsense.

Note that this happens both for R-patched (2.0.1) and R-devel (2.1.0 alpha).

==> "forwarded" as bug report to R-bugs

On Wed, 23 Mar 2005 maechler@stat.math.ethz.ch wrote:

> 1) In your cases, the integer 'x' argument is auto-coerced to
>    character, however that fails as soon as 'perl = TRUE' is used.
>
>    > sub('^','v_', 1:3, perl=TRUE)
>    Error in sub.perl(pattern, replacement, x, ignore.case) :
>            invalid argument
>
>    {one can argue that this is not a bug, since the help file asks
>     for 'x' to be a character vector; OTOH, we have
>     as.character(.) magic in many other places, i.e. quite
>     naturally here;
>     at least perl=TRUE and perl=FALSE should behave consistently.}

I believe the bug is in the PERL=FALSE case.  This coercion is
undocumented, and e.g.

> grep('^1', 1:3, perl=TRUE)
Error in grep.perl(pattern, x, ignore.case, value, useBytes) :
         invalid argument
> grep('^1', 1:3, perl=FALSE)
Error in grep(pattern, x, ignore.case, extended, value, fixed, useBytes) :
         invalid argument

do not accept non-character arguments.  The only one that does AFAICS is
[g]sub(perl=FALSE), and the other functions like tolower, substr,
strsplit, chartr, agrep do not.

The consistent thing to do seems to be to remove the anomalous coercion.
Otherwise we need to at least change grep and regexpr.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
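
Whichever way the coercion question is settled, a caller can avoid depending on it by converting 'x' explicitly and then checking that both engines return the same thing. A minimal sketch, assuming only base R; the helper name compare_sub_engines is hypothetical and does not come from the messages above:

```r
# Hypothetical helper, not part of R: apply sub() with both regex engines
# to an explicitly coerced character vector and report whether they agree.
compare_sub_engines <- function(pattern, replacement, x) {
  # Explicit coercion, so neither call relies on sub() converting 'x' itself.
  x_chr <- as.character(x)
  default_result <- sub(pattern, replacement, x_chr)
  perl_result <- sub(pattern, replacement, x_chr, perl = TRUE)
  list(
    default = default_result,
    perl    = perl_result,
    agree   = identical(default_result, perl_result)
  )
}

# The pattern from the original report that already behaved as expected.
res <- compare_sub_engines("^.", "var", 11:13)
print(res$default)
print(res$agree)
```

The same helper can be pointed at the '^' and '$' patterns from the report to see whether a given R build still shows the discrepancy.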

<maechler <at> stat.math.ethz.ch> writes:

:
: >>>>> "David" == David Forrest <drf5n <at> maplepark.com>
: >>>>>     on Tue, 22 Mar 2005 15:02:20 -0600 (CST) writes:
:
:     David> According to help(sub), the ^ should match the
:     David> zero-length string at the beginning of a string:
:
: yes, indeed.
:
:     David> sub('^','var',1:3)  # "1" "2" "3"
:     David> sub('$','var',1:3)  # "1var" "2var" "3var"
:
:     David> # This generates what I expected from the first case:
:     David> sub('^.','var',11:13)  # "var1" "var2" "var3"
:
: there are even more fishy things here:
:
: 1) In your cases, the integer 'x' argument is auto-coerced to
:    character, however that fails as soon as 'perl = TRUE' is used.
:
:    > sub('^','v_', 1:3, perl=TRUE)
:    Error in sub.perl(pattern, replacement, x, ignore.case) :
:            invalid argument
:
:    {one can argue that this is not a bug, since the help file asks
:     for 'x' to be a character vector; OTOH, we have
:     as.character(.) magic in many other places, i.e. quite
:     naturally here;
:     at least perl=TRUE and perl=FALSE should behave consistently.}
:
: 2) The 'perl=TRUE' case behaves even more problematically here:
:
:    > sub('^','v_', LETTERS[1:3], perl=TRUE)
:    [1] "A\0e" "B\0J" "C\0S"
:    > sub('^','v_', LETTERS[1:3], perl=TRUE)
:    [1] "A\0J" "B\0P" "C\0J"
:    > sub('^','v_', LETTERS[1:3], perl=TRUE)
:    [1] "A\0\0" "B\0\0" "C\0m"
:    >
:
:    i.e., the result is random nonsense.
:
: Note that this happens both for R-patched (2.0.1) and R-devel (2.1.0 alpha).
:
: ==> "forwarded" as bug report to R-bugs

Also consider the following which may be related.  #1 does not place
an X before the first word and #2 causes R to hang.

R> R.version.string # Windows XP
[1] "R version 2.1.0, 2005-03-17"
R> gsub("\\b", "X", "The quick brown fox") # 1
[1] "The Xquick Xbrown Xfox"
R> gsub("\\b", "X", "The quick brown fox", perl = TRUE) # 2
... hangs ...
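
If the aim in #1 is an X in front of every word, one way to sidestep zero-length matches such as "\\b" altogether is to match the words themselves and put them back with a backreference. This is only a sketch of a possible workaround, not something proposed in the messages above, and the variable names are illustrative:

```r
s <- "The quick brown fox"

# Match each run of letters (never an empty string) and re-emit it
# with an "X" in front via the \\1 backreference.
prefixed <- gsub("([[:alpha:]]+)", "X\\1", s)
print(prefixed)
```

Because every match consumes at least one character, this form does not rely on how the engine advances after an empty match.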