Mark Kimpel
2010-Jan-07 19:47 UTC
[R] regex question on escaping "." (and a couple other regex questions as well)
I have an example where escaping "." does not seem to be behaving
consistently, but perhaps it is due to my misunderstanding. Could someone
explain to me why the below produces the output it does?
It seems to me that in the second example, where I am being more precise
about specifying that a "." (dot) should be between the numbers, should
produce the same output as the first example, but it does not.
As an aside, is there a document or help page that specifies which
characters need to be escaped to form regex's in R? I can't find one.
Finally, how does one grep for the escape character? I've tried
grep ("\\", vector)
grep ("\\\", vector)
grep("\\\\", vector)
all without success.
Thanks, Mark
a <- "160.15.05.00"
grep("[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+", a)
# [1]
grep("[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+", a)
# integer(0)
> sessionInfo()
R version 2.11.0 Under development (unstable) (2009-12-28 r50849)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices datasets utils methods
[8] base
other attached packages:
[1] Rgraphviz_1.25.1 graph_1.25.4
loaded via a namespace (and not attached):
[1] tools_2.11.0
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please
David Winsemius
2010-Jan-07 20:07 UTC
[R] regex question on escaping "." (and a couple other regex questions as well)
On Jan 7, 2010, at 2:47 PM, Mark Kimpel wrote:

> I have an example where escaping "." does not seem to be behaving
> consistently, but perhaps it is due to my misunderstanding. Could someone
> explain to me why the below produces the output it does?
>
> It seems to me that in the second example, where I am being more precise
> about specifying that a "." (dot) should be between the numbers, should
> produce the same output as the first example, but it does not.

There is an intervening "0" between the matching 1-9 group and the first
period, causing a pattern failure for the match of "160." with "[1-9]+\\.".

> As an aside, is there a document or help page that specifies which
> characters need to be escaped to form regex's in R? I can't find one.

?regex # what else?

> Finally, how does one grep for the escape character? I've tried
> grep ("\\", vector)
> grep ("\\\", vector)
> grep("\\\\", vector)

Where, or perhaps what, is "vector"? Why should we think it has an "escape
character" in it? In some sense I think you have an epistemological
problem. There can be backslashes in strings, but they are not escape
characters at that point.

> all without success.
>
> Thanks, Mark
>
> a <- "160.15.05.00"
> grep("[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+", a)
> # [1]
> grep("[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+", a)
> # integer(0)

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
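
The two answers in this reply can be checked with a short sketch. It assumes an R version recent enough to provide regmatches(), which is newer than the R used in the thread. The vector `paths` and the anchored pattern stored in `fixed_pattern` are illustrative assumptions and do not come from the original posts.

```r
a <- "160.15.05.00"

# Unescaped "." matches any character. Here "[1-9]+" takes "1", "." takes "6",
# and "[0-9]+" takes "0", so the pattern still finds a match.
loose_pattern <- "[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+"
loose <- grep(loose_pattern, a)

# Show which substring the loose pattern actually matched.
matched <- regmatches(a, regexpr(loose_pattern, a))

# With "\\." a literal dot must directly follow a run of digits 1-9.
# In "160." the "0" sits between "16" and the dot, so nothing matches.
strict <- grep("[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+", a)

# Assumed intent: four dot-separated groups of digits. Allowing 0-9 in every
# group and anchoring both ends matches the whole string.
fixed_pattern <- "^[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+$"
anchored <- grep(fixed_pattern, a)

# Backslashes: the R string "\\" holds a single backslash character.
one_char <- nchar("\\")

# Hypothetical vector; only the first element contains a backslash.
paths <- c("C:\\temp", "no backslash here")

# As a regex, a literal backslash must itself be escaped: regex \\ is the
# R string "\\\\".
by_regex <- grep("\\\\", paths)

# With fixed = TRUE the pattern is a plain string, so one backslash suffices.
by_fixed <- grep("\\", paths, fixed = TRUE)
```

Here `loose` and `anchored` are both 1, `strict` is integer(0), and `matched` is "160.15.0", which shows that the unescaped dot consumed the "6". Both `by_regex` and `by_fixed` return 1, so grep("\\\\", vector) only comes back empty when no element of the vector contains a backslash.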