I'm trying to write a gsub() call that takes a string and escapes all the unescaped quote marks in it. So the string

  \"

would be left unchanged, but

  \\"

would be changed to

  \\\"

because the double backslash doesn't act as an escape for the quote, the first just escapes the second. I have the usual problems of writing regular expressions involving backslashes which make everything I write completely unreadable, so I'm going to change the problem for this post: I will define E to be the escape character, and q to be the quote; the gsub() call would leave

  Eq

unchanged, but would change

  EEq

to EEEq, etc.

The expression I have come up with after this change is

  gsub( "((^|[^E])(EE)*)q", "\\1Eq", x)

i.e. "(start of line, or non-escape, followed by an even number of escapes), all of which we call expression 1, followed by a quote, is replaced by expression 1 followed by an escape and a quote".

This works sometimes, but not always:

  > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "Eq")
  [1] "Eq"
  > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "EEq")
  [1] "EEEq"
  > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qaq")
  [1] "EqaEq"
  > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qq")
  [1] "qEq"

Notice that in the final example, the first quote doesn't get escaped. Why not?

Duncan Murdoch
Gabor Grothendieck
2008-Jul-06 21:27 UTC
[R] Regular expressions: bug or misunderstanding?
Try adding perl = TRUE
On 06-Jul-08 21:17:04, Duncan Murdoch wrote:
> [...]
> Notice that in the final example, the first quote doesn't get escaped.
> Why not????

I think (without having done the "experimental diagnostics") that it's because in "qq" the first q matches (^|[^E]) by way of [^E] (i.e. it is a "non-escape"); since it is followed by q, it is the second q which gets the escape.

Possibly you need to include "^q" as an additional alternative match at the start of the line.

Ted.

E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Date: 06-Jul-08  Time: 22:37:10
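
If the explanation above is right, the trouble is that the character before the quote has to be consumed by the match, so a quote can be swallowed as the "non-escape" for the quote that follows it. One possible way around this, which nobody in the thread proposed, is to check the preceding character with a negative lookbehind, which does not consume anything. The sketch below assumes perl = TRUE (lookbehind is PCRE syntax) and keeps the E and q placeholders from the original post; escape_quotes is a hypothetical helper name.

```r
# Hypothetical helper (an assumption, not from the thread): escape every q
# that is preceded by an even number (including zero) of E characters.
# (?<!E) is a negative lookbehind: it requires that the previous character
# is not E, but it does not make that character part of the match.
escape_quotes <- function(x) {
  gsub("(?<!E)((EE)*)q", "\\1Eq", x, perl = TRUE)
}

escape_quotes(c("Eq", "EEq", "qaq", "qq"))
```

On the four test strings from the original post this should return "Eq", "EEEq", "EqaEq" and "EqEq", so both quotes in "qq" get escaped. Translating E and q back to a real backslash and double quote brings back the usual doubling of backslashes inside the R string literal.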