amiransk at uwo.ca
2009-Jan-19 18:25 UTC
[Rd] sub and gsub treat \\ incorrectly (PR#13454)
Sub and gsub treat \\ replacement pattern incorrectly

I expect
sub("a","\\", "a", perl=T)
to produce
[1] "\"
instead it generates
[1] ""

On the other hand, if I run
sub("a","\\\\", "a", perl=T)
it correctly outputs
[1] "\\"

The same issue applies to gsub.

--please do not edit the information below--

Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status =
major = 2
minor = 8.1
year = 2008
month = 12
day = 22
svn rev = 47281
language = R
version.string = R version 2.8.1 (2008-12-22)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

Search Path:
.GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base

--
Sincerely,
Andriy
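One likely source of the confusion is that print() escapes backslashes, so a string holding a single backslash is displayed as "\\", not "\". The sketch below assumes a current R session rather than the 2.8.1 build above, and the variable names are illustrative only. It shows that the output of the second call is already a one-character string:

```r
# In R source code, "\\" denotes a one-character string: a single backslash.
one_backslash <- "\\"
nchar(one_backslash)      # 1
print(one_backslash)      # [1] "\\"   (print() escapes the backslash)
cat(one_backslash, "\n")  # writes the raw character, a single backslash

# "\\\\" reaches sub() as two backslashes, i.e. an escaped backslash in the
# replacement syntax, so the result is one literal backslash.
res <- sub("a", "\\\\", "a", perl = TRUE)
nchar(res)                     # 1
identical(res, one_backslash)  # TRUE

# "\\" reaches sub() as a lone backslash with nothing after it; the report
# above shows R 2.8.1 dropping it and returning "".
sub("a", "\\", "a", perl = TRUE)
```

Under this reading, the doubled form is needed because the replacement passes through two layers of escaping: first R's parser, then sub()'s own replacement syntax.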
> -----Original Message-----
> From: r-devel-bounces at r-project.org
> [mailto:r-devel-bounces at r-project.org] On Behalf Of amiransk at uwo.ca
> Sent: Monday, January 19, 2009 10:25 AM
> To: r-devel at stat.math.ethz.ch
> Cc: R-bugs at r-project.org
> Subject: [Rd] sub and gsub treat \\ incorrectly (PR#13454)
>
> Sub and gsub treat \\ replacement pattern incorrectly
>
> I expect
> sub("a","\\", "a", perl=T)
> to produce
> [1] "\"
> instead it generates
> [1] ""
>
> On the other hand, if I run
> sub("a","\\\\", "a", perl=T)
> it correctly outputs
> [1] "\\"

The replacement pattern may include \\digit, which means to put the
digit'th parenthesized subexpression into the replacement. E.g.

> sub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1", "One two three four five")
[1] "two One three four five"
> gsub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1", "One two three four five")
[1] "two One four three five"

To support this without ambiguity or surprises, \\ is expected to be
followed by a digit (or L or U when perl=TRUE). When fixed=TRUE there is
no possibility of a parenthesized subexpression, so \\2 is taken literally.

help(gsub) is not explicit about this behavior. Because I initially made
the same mistake, when I wrote the S+ versions of gsub and sub I included
a warning when the replacement included a \\ not followed by a digit:

> gsub("([[:alpha:]]+) +([[:alpha:]]+)", "\\ \\", "One two three four five")
[1] " five"
Warning messages:
  backslash in replacement argument of substituteString(fixed=F) is not
  followed by backslash or digit, hence backslash is omitted in:
  substituteString(pattern = pattern, replacement = replacement, x = x, extended ....

> The same issue applies to gsub.
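The replacement rules described in the reply can be checked in a current R session with the sketch below. `check_replacement()` is a hypothetical helper, not part of base R or S+, and its warning text is made up; it only imitates the idea of the S+ warning. Following help(sub), it treats \\1 to \\9 as backreferences and, for perl = TRUE only, also accepts \\U, \\L and \\E; `x` and `pat` are illustrative names.

```r
x   <- "One two three four five"
pat <- "([[:alpha:]]+) +([[:alpha:]]+)"

# \\1 ... \\9 insert the corresponding parenthesized subexpression.
sub(pat, "\\2 \\1", x)    # [1] "two One three four five"
gsub(pat, "\\2 \\1", x)   # [1] "two One four three five"

# With fixed = TRUE the replacement is copied literally: no backreferences.
sub("a", "\\2", "xay", fixed = TRUE)   # [1] "x\\2y"

# With perl = TRUE, \\U upper-cases the rest of the replacement.
gsub("(\\w+)", "\\U\\1", "ab cd", perl = TRUE)   # [1] "AB CD"

# Hypothetical helper (not in base R), modelled on the S+ warning: returns FALSE
# and warns when a backslash in `replacement` would be silently dropped.
check_replacement <- function(replacement, perl = FALSE) {
  # Escape sequences consumed as a unit: backslash followed by a backslash, a
  # digit 1-9, or (perl = TRUE only) U, L or E. gsub() removes them left to
  # right, so an escaped backslash is consumed before its second character
  # can start a new pair.
  valid <- if (perl) "\\\\(\\\\|[1-9ULE])" else "\\\\(\\\\|[1-9])"
  leftover <- gsub(valid, "", replacement, perl = TRUE)
  # Any backslash still present is not part of a recognised escape.
  ok <- !grepl("\\", leftover, fixed = TRUE)
  if (!ok) {
    warning("backslash in replacement is not followed by a backslash or ",
            "digit; sub()/gsub() will omit it")
  }
  ok
}

check_replacement("\\2 \\1")                   # TRUE
suppressWarnings(check_replacement("\\ \\"))   # FALSE
```

Such a check costs one extra gsub() per call, so it is better suited to interactive use or argument validation than to tight loops.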