Markus Elze
2012-Mar-07 11:54 UTC
[R] gsub: replacing double backslashes with single backslash
Hello everybody,

this might be a trivial question, but I have been unable to find this using Google. I am trying to replace double backslashes with single backslashes using gsub. There seems to be some unexpected behaviour with regards to the replacement string "\\". The following example uses the string C:\\ which should be converted to C:\ .

> gsub("\\\\", "\\", "C:\\")
[1] "C:"
> gsub("\\\\", "Test", "C:\\")
[1] "C:Test"
> gsub("\\\\", "\\\\", "C:\\")
[1] "C:\\"

I have observed similar behaviour for fixed=TRUE and perl=TRUE. I use R 2.14.1 64-bit on Windows 7.

Markus

--
Markus Elze
Department of Statistics
University of Warwick
Coventry CV4 7AL
David Winsemius
2012-Mar-07 14:57 UTC
[R] gsub: replacing double backslashes with single backslash
On Mar 7, 2012, at 6:54 AM, Markus Elze wrote:

> Hello everybody,
> this might be a trivial question, but I have been unable to find
> this using Google. I am trying to replace double backslashes with
> single backslashes using gsub.

Actually you don't have double backslashes in the argument you are presenting to gsub. The string entered at the console as "C:\\" only has a single backslash.

> nchar("C:\\")
[1] 3

> There seems to be some unexpected behaviour with regards to the
> replacement string "\\". The following example uses the string C:\\
> which should be converted to C:\ .
>
> > gsub("\\\\", "\\", "C:\\")
> [1] "C:"

But I do not understand that returned value, either. I thought that the 'repl' argument (which I think I have demonstrated is a single backslash) would get put back in the returned value.

> > gsub("\\\\", "Test", "C:\\")
> [1] "C:Test"
> > gsub("\\\\", "\\\\", "C:\\")
> [1] "C:\\"

I thought the parsing rules for 'replacement' were different than the rules for 'patt'. So I'm puzzled, too. Maybe something changed in 2.14?

> sub("\\\\", "\\", "C:\\", fixed=TRUE)
[1] "C:\\"
> sub("\\\\", "\\", "C:\\")
[1] "C:"
> sub("([\\])", "\\1", "C:\\")
[1] "C:\\"

The NEWS file does say that there is a new regular expression implementation and that the help file for regex should be consulted. And presumably we should study this:

http://laurikari.net/tre/documentation/regex-syntax/

In the 'replacement' argument, the "\\" is used to back-reference a numbered sub-pattern, so perhaps "\\" is now getting handled as the "null subpattern"? I don't see that mentioned in the regex help page, but it is a big "page". I also didn't see "\\" referenced in the TRE documentation, but then again I don't think that "\\" in console or source() input is a double backslash. The TRE document says that "A \ cannot be the last character of an ERE." I cannot tell whether that rule gets applied to the 'replacement'.

> I have observed similar behaviour for fixed=TRUE and perl=TRUE. I
> use R 2.14.1 64-bit on Windows 7.

--
David Winsemius, MD
West Hartford, CT
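
The outputs in this thread are consistent with a backslash in 'replacement' acting as an escape character when fixed = FALSE: a lone "\\" has nothing to escape and produces nothing, while "\\\\" produces one literal backslash. The following is only a sketch under that assumption, not a statement of what the regex engine does internally. The variable names (one, two, res_regex, res_fixed) are made up for illustration, and the input is a string that really does hold two backslashes, unlike "C:\\".

```r
# "C:\\" is typed with two backslashes but stored with one;
# "C:\\\\" is typed with four and stored with two.
one <- "C:\\"
two <- "C:\\\\"

nchar(one)       # 3
nchar(two)       # 4
writeLines(two)  # prints the stored characters, without print()'s escaping

# Regex route (assumption: backslash escapes in 'replacement').
# The pattern needs four stored backslashes to match two literal ones,
# and the replacement needs two stored backslashes to emit one.
res_regex <- gsub("\\\\\\\\", "\\\\", two)

# Fixed route: pattern and replacement are taken literally, so only
# R's own string escaping applies (two stored -> one stored).
res_fixed <- gsub("\\\\", "\\", two, fixed = TRUE)

identical(res_regex, one)
identical(res_fixed, one)
```

Comparing with identical() or inspecting with writeLines() avoids the confusion caused by print(), which shows every stored backslash doubled.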