Mark Kimpel
2010-Jan-07 19:47 UTC
[R] regex question on escaping "." (and a couple other regex questions as well)
I have an example where escaping "." does not seem to be behaving consistently, but perhaps it is due to my misunderstanding. Could someone explain to me why the below produces the output it does?

It seems to me that in the second example, where I am being more precise about specifying that a "." (dot) should be between the numbers, should produce the same output as the first example, but it does not.

As an aside, is there a document or help page that specifies which characters need to be escaped to form regex's in R? I can't find one.

Finally, how does one grep for the escape character? I've tried

grep ("\\", vector)
grep ("\\\", vector)
grep("\\\\", vector)

all without success.

Thanks, Mark

a <- "160.15.05.00"
grep("[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+", a)
# [1]
grep("[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+", a)
# integer(0)

> sessionInfo()
R version 2.11.0 Under development (unstable) (2009-12-28 r50849)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices datasets  utils     methods
[8] base

other attached packages:
[1] Rgraphviz_1.25.1 graph_1.25.4

loaded via a namespace (and not attached):
[1] tools_2.11.0

Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN 46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Skype No Voicemail please
David Winsemius
2010-Jan-07 20:07 UTC
[R] regex question on escaping "." (and a couple other regex questions as well)
On Jan 7, 2010, at 2:47 PM, Mark Kimpel wrote:

> I have an example where escaping "." does not seem to be behaving
> consistently, but perhaps it is due to my misunderstanding. Could someone
> explain to me why the below produces the output it does?
>
> It seems to me that in the second example, where I am being more precise
> about specifying that a "." (dot) should be between the numbers, should
> produce the same output as the first example, but it does not.

There is an intervening "0" between the matching 1-9 group and the first period, causing a pattern failure for the match of "160." with "[1-9]+\\.".

> As an aside, is there a document or help page that specifies which
> characters need to be escaped to form regex's in R? I can't find one.

?regex # what else?

> Finally, how does one grep for the escape character? I've tried
> grep ("\\", vector)
> grep ("\\\", vector)
> grep("\\\\", vector)

Where or perhaps what is "vector"? Why should we think it has an "escape character" in it? In some sense I think you have an epistemological problem. There can be back-slashes in strings, but they are not escape characters at that point.

> all without success.
>
> Thanks, Mark
>
> a <- "160.15.05.00"
> grep("[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+", a)
> # [1]
> grep("[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+", a)
> # integer(0)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
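
For readers following the thread, the points above can be checked with a short sketch. It reuses the `a` and the two patterns from the original post. The anchored pattern `dotted` and the test vector `x` are illustrative assumptions, not something either poster wrote; in particular, reading `[0-0]` as a typo for `[0-9]` is a guess at the intent.

```r
a <- "160.15.05.00"

# Second pattern from the post: [1-9]+ must end directly before a literal
# dot, but the digit before the first dot in "160." is "0", which the
# class [1-9] excludes. No other starting position works either.
grep("[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+", a)   # integer(0)

# First pattern from the post: the unescaped "." matches any character,
# so it can absorb a digit ("1" for [1-9]+, "6" for ".", "0" for [0-9]+).
grep("[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+", a)     # 1

# Assumed intent (hypothetical): four dot-separated digit groups, anchored
# at both ends, with [0-9] in every position.
dotted <- "^[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+$"
grepl(dotted, a)                               # TRUE

# Matching a literal backslash. "a\\b" in R source is the three characters
# a, \, b; the doubled backslash is only how the string is typed.
x <- c("a\\b", "ab")                           # hypothetical test vector
nchar(x[1])                                    # 3

# As a regex, the backslash is escaped once for the regex engine and once
# more for the R string parser, giving four backslashes in source.
grep("\\\\", x)                                # 1

# With fixed = TRUE no regex escaping is needed, so one backslash
# (typed as two in R source) is enough.
grep("\\", x, fixed = TRUE)                    # 1
```

This suggests that `grep("\\\\", vector)` was the right form for a regex search, and that it finds nothing only when `vector` holds no backslash characters, which is consistent with the question raised in the reply about what `vector` contained.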