Can someone show me how to use a regular expression to break the string at the bottom up into its three components:

(-0.791,-0.263]
(-38,-1.24]
(0.96,2.43]

I tried to use strsplit because of my regexpitis (it's not curable. I've been to many doctors all over NYC. They tell me there's no cure), but it doesn't work because there are also dots inside the brackets. Thanks.

(-0.791,-0.263].(-38,-1.24].(0.96,2.43]
G'day Mark,

On Tue, 03 Mar 2009 00:16:34 -0600 (CST) markleeds at verizon.net wrote:

> can someone show me how to use a regular expression to break the
> string at the bottom up into its three components :
>
> (-0.791,-0.263]
> (-38,-1.24]
> (0.96,2.43]
>
> I tried to use strplit because of my regexpitis ( it's not curable.
> i've been to many doctors all over NYC. they tell me there's no
> cure ) but it doesn't work because there also dots inside the
> brackets. Thanks.
>
> (-0.791,-0.263].(-38,-1.24].(0.96,2.43]

Probably you will get better answers from regexp experts, but here we go:

The problem seems to be that strsplit() throws away the part that is matched when deciding where to split. Thus, I guess the aim would be to replace the `.' on which you want to split by something else and then use strsplit(). For example you could do:

R> str <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"
R> (uu <- gsub("(\\([^]]*\\])(\\.)", "\\1RIsGreat", str))
[1] "(-0.791,-0.263]RIsGreat(-38,-1.24]RIsGreat(0.96,2.43]"
R> strsplit(uu, "RIsGreat")
[[1]]
[1] "(-0.791,-0.263]" "(-38,-1.24]" "(0.96,2.43]"

Though the following works too.

R> (uu <- gsub("(\\([^]]*\\])(\\.)", "\\1?", str))
[1] "(-0.791,-0.263]?(-38,-1.24]?(0.96,2.43]"
R> strsplit(uu, "\\?")
[[1]]
[1] "(-0.791,-0.263]" "(-38,-1.24]" "(0.96,2.43]"

To explain the gsub() command: it says look for an opening round bracket ("\\("), followed by any number of characters that are not a closing square bracket ("[^]]*"), followed by a closing square bracket ("\\]"), which is followed by a dot ("\\."). Call the part that is made up of the first three pieces group 1 and the dot group 2 (that's what the open/close round brackets in the regexp are for):

(\\([^]]*\\])(\\.)
^^^^^^^^^^^^^        group 1
             ^^^^^   group 2

Hopefully that explains the regexp used in the first part. The second part then says to replace this pattern by repeating the first group ("\\1") and by replacing the second group with "RIsGreat" or, respectively, "?".

HTH.
Cheers,

Berwin

=========================== Full address ============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
Singapore 117546                    http://www.stat.nus.edu.sg/~statba
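To see the two groups from the gsub() explanation above in isolation, here is a minimal sketch using base R's regexec() and regmatches(), which return the full match followed by each parenthesised group. The names pattern and m are illustrative and not from the original post, and only the first occurrence of the pattern in the string is inspected.

```r
str <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"

# Same pattern as in the gsub() call: group 1 is the bracketed interval,
# group 2 is the dot that follows it.
pattern <- "(\\([^]]*\\])(\\.)"

# regexec() locates the first match and its groups;
# regmatches() extracts the corresponding substrings.
m <- regmatches(str, regexec(pattern, str))[[1]]

m[1]  # full match: interval plus the trailing dot
m[2]  # group 1: the interval only
m[3]  # group 2: the dot only
```

Because the replacement string keeps "\\1" and swaps only group 2, the intervals survive the gsub() call and only the separating dots are rewritten.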
Here's one way to get a matrix of numeric values:

text = "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"
# split on all delimiters, drop empty strings, convert to numbers,
# and fill a two-column matrix row by row
values = matrix(ncol=2, byrow=TRUE,
    as.numeric(
        grep(pattern='.', value=TRUE,
            x=strsplit(x=text, split=']\\.\\(|\\(|]|,')[[1]])))

Modify any of the steps according to your needs.

vQ
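The nested call above can be unpacked step by step. The following is only a sketch of the same idea with intermediate variables (pieces and nonempty are names introduced here for illustration), and it uses nzchar() in place of the grep(pattern='.') filter to drop the empty string that strsplit() produces when the input starts with a delimiter.

```r
text <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"

# Split on "].(" between intervals, on a lone "(" or "]", and on ",".
pieces <- strsplit(text, split = "]\\.\\(|\\(|]|,")[[1]]

# The string begins with "(", so the first piece is the empty string.
nonempty <- pieces[nzchar(pieces)]

# Six numbers, filled row by row: one row per interval (lower, upper).
values <- matrix(as.numeric(nonempty), ncol = 2, byrow = TRUE)
values
```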
Here are two solutions using the gsubfn package. strapply works by matching what you want rather than what you don't want, which may make it easier in this case. The two solutions are the same except we use \\ escapes in the first and [ ... ] in the second, i.e. \\( has the same effect as [(]. In each case we first match the (, then a sequence of characters that are not ], and finally we match the terminating ].

> library(gsubfn)
> x <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"

> strapply(x, "\\([^]]+[]]")[[1]]
[1] "(-0.791,-0.263]" "(-38,-1.24]" "(0.96,2.43]"

> strapply(x, "[(][^]]+[]]")[[1]]
[1] "(-0.791,-0.263]" "(-38,-1.24]" "(0.96,2.43]"

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
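If installing gsubfn is not an option, the same "match what you want" idea can probably be sketched with base R alone. This is a swapped-in technique, not the strapply approach itself: gregexpr() finds every match of the pattern and regmatches() extracts them. The name parts is introduced here for illustration.

```r
x <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"

# Same pattern as the second strapply call: "(", then one or more
# characters that are not "]", then the closing "]".
hits <- gregexpr("[(][^]]+[]]", x)

# regmatches() returns a list with one element per input string.
parts <- regmatches(x, hits)[[1]]
parts
```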
Here is another approach that still uses strsplit, if you want to stay with that:

> tmp <- '(-0.791,-0.263].(-38,-1.24].(0.96,2.43]'
> strsplit(tmp, '\\.(?=\\()', perl=TRUE)[[1]]
[1] "(-0.791,-0.263]" "(-38,-1.24]" "(0.96,2.43]"

This uses the Perl 'look-ahead' indicator to say only match on a period that is followed by a '(', but don't include the '(' in the match.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
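To make the effect of the look-ahead concrete, the sketch below compares a plain split on every period with the look-ahead split from the message above. The names naive and lookahead are illustrative only.

```r
tmp <- "(-0.791,-0.263].(-38,-1.24].(0.96,2.43]"

# Splitting on every period also cuts through the decimal points,
# so the intervals are broken into fragments.
naive <- strsplit(tmp, "\\.")[[1]]
length(naive)

# With perl = TRUE, (?=\\() requires a "(" right after the period
# but does not consume it, so only the separating periods match.
lookahead <- strsplit(tmp, "\\.(?=\\()", perl = TRUE)[[1]]
lookahead
```

The string contains seven periods in total, and only the two that are directly followed by "(" act as separators in the second call.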