I noticed this issue in stringr::str_replace, but it also affects sub() in base R.

If the pattern in a call to one of these needs to be a regular expression, then backslashes in the replacement text are treated specially.

For example,

    gsub("a|b", "\\", "abcdef")

gives "cdef", not "\\\\cdef" as I wanted. To get the latter, I need to escape the replacement backslashes, e.g.

    gsub("a|b", "\\\\", "abcdef")

which gives "\\\\cdef".

I have two questions:

1. Is there a variant on sub or str_replace which allows the pattern to be declared as a regular expression, but the replacement to be declared as fixed?

2. To get what I want, I can double the backslashes in the replacement text. This would do that:

    replacement <- gsub("\\\\", "\\\\\\\\", replacement)

Are there any other special characters to worry about besides backslashes?

Duncan Murdoch
Backslashes in regular expressions in R are maddening, but they make sense.

R string handling interprets your replacement string "\\" as just one backslash. Your string is received by gsub as "\" - that is, just the control backslash, NOT the character backslash. gsub is expecting to see \0, \1, \2, or some other control sequence starting with a backslash.

If you want gsub to replace with a backslash character, gsub has to receive two backslashes (\\). To get two backslash characters into an R string, you have to double them ALL: "\\\\".

The output is printed as an R string, with each backslash escaped by a backslash, so "\\\\" really means two backslashes.

There are lots of special characters in the search string, but only one in the replacement string: backslash.

My favorite resource on this topic is https://www.regular-expressions.info/replacecharacters.html

On 4/11/24 10:35, Duncan Murdoch wrote:
> [...]
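The two layers of escaping described in this reply (first the R parser, then gsub) can be checked directly. This is a minimal sketch using only base R; the variable names are illustrative and do not come from the thread.

```r
# One backslash character: the R parser reads the literal "\\" as a single "\".
one <- "\\"
# Two backslash characters.
two <- "\\\\"

# nchar() counts the characters actually stored, not the escaped literal.
nchar(one)
nchar(two)

# A lone backslash in the replacement contributes nothing to the output here.
res_one <- gsub("a|b", one, "abcdef")
# An escaped pair in the replacement yields one literal backslash per match.
res_two <- gsub("a|b", two, "abcdef")

# print() shows the escaped form; writeLines() shows the stored characters.
print(res_two)
writeLines(res_two)
```

Comparing the print() and writeLines() output is usually the quickest way to tell whether a doubled backslash is really in the string or is only an artifact of how R displays it.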
Hi Duncan,

I only know about sub() and gsub().

There is no way to have pattern be a regular expression and replacement be a fixed string.

Backslash is the only special character in replacement. If you need a reference, see this file:

https://github.com/wch/r-source/blob/04650eddd6d844963b6d7aac02bd8d13cbf440d4/src/main/grep.c

particularly functions R_pcre_string_adj and wstring_adj.

So just double the backslashes in replacement and you'll be good to go.

On Thu, Apr 11, 2024, 12:36 Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> [...]
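Since the advice is to double the backslashes, that step can be wrapped in a small helper so the replacement behaves as literal text. The sketch below assumes the default regex engine; escape_replacement() and gsub_literal() are hypothetical names, not functions from base R or stringr.

```r
# Hypothetical helper (not part of base R or stringr): double every
# backslash so that gsub()/sub() treat the replacement as literal text.
escape_replacement <- function(replacement) {
  gsub("\\\\", "\\\\\\\\", replacement)
}

# Hypothetical wrapper: regular-expression pattern, literal replacement.
gsub_literal <- function(pattern, replacement, x) {
  gsub(pattern, escape_replacement(replacement), x)
}

# The replacement is a single backslash; each match becomes one backslash.
out1 <- gsub_literal("a|b", "\\", "abcdef")
writeLines(out1)

# "\\1" would normally be a backreference; here it is kept as plain text.
out2 <- gsub_literal("(a)", "\\1", "abc")
writeLines(out2)
```

The escaping itself is the gsub() call from the original question; the wrapper only makes sure it is applied once, immediately before the replacement is used.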
On Thu, 11 Apr 2024, Duncan Murdoch writes:

> [...]
> 1. Is there a variant on sub or str_replace which
> allows the pattern to be declared as a regular
> expression, but the replacement to be declared as
> fixed?

I realize that this reply is late, but you can use raw strings for the replacement:

    gsub("a|b", r"(\\)", "abcdef")
    ## [1] "\\\\cdef"

which might be easier to read, sometimes.

[...]

--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net
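A short check of what the raw string changes, as a sketch with illustrative variable names. Raw strings need R 4.0.0 or later. They remove only the R parser's layer of escaping, so gsub still receives two backslashes and still reduces them to one per match.

```r
# Raw string syntax r"(...)" requires R >= 4.0.0.
raw_repl <- r"(\\)"   # two backslash characters, written without escaping
esc_repl <- "\\\\"    # the same two characters, written with escaping

# Both literals produce the same two-character string.
identical(raw_repl, esc_repl)
nchar(raw_repl)

# gsub() still treats the pair as one escaped backslash per match.
result <- gsub("a|b", raw_repl, "abcdef")
print(result)
writeLines(result)
```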