For the regexp aficionados out there:

I need a regular expression to extract either everything within some brackets, or everything outside the brackets, in a string.

This would be the test string:

  "A1{0}~B0{1} CO{a2}NN{12}"

Everything outside the brackets would be:

  "A1 ~B0 CO NN"

and everything inside the brackets would be:

  "0 1 a2 12"

I have a working solution involving strsplit(), but I wonder if there is a more direct way.

Thanks in advance for any hint,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania
It depends on the complexity of your expression. If you are sure you don't have nested brackets, and that every opening bracket has a matching closing bracket, this will replace each bracketed group with a space and leave everything outside the brackets:

  str <- "A1{0}~B0{1} CO{a2}NN{12}"
  gsub("\\{[^}]*\\}", " ", str)

Philippe Grosjean

> On 11 Dec 2015, at 14:50, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
>
> I need a regular expression to extract either everything within some
> brackets, or everything outside the brackets, in a string.
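The call above leaves a doubled space and a trailing space in the result, because the test string already has a space after "{1}" and ends with a bracketed group. A minimal sketch of one way to tidy that up, assuming non-nested, balanced braces as stated above; the function name outside_braces is hypothetical and not part of the thread:

```r
# Hypothetical helper (name is an assumption, not from the thread).
outside_braces <- function(s) {
  # Replace each {...} group (no nesting assumed) with a single space.
  out <- gsub("\\{[^}]*\\}", " ", s)
  # Collapse runs of spaces left behind by the replacement.
  out <- gsub(" +", " ", out)
  # Drop leading and trailing whitespace (trimws() needs R >= 3.2.0).
  trimws(out)
}

outside_braces("A1{0}~B0{1} CO{a2}NN{12}")
# [1] "A1 ~B0 CO NN"
```

Collapsing the spaces also removes any intentional double spaces in the input, so this is only appropriate when single-space separation is what is wanted.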
On Dec 11, 2015, at 7:50 AM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
>
> I need a regular expression to extract either everything within some
> brackets, or everything outside the brackets, in a string.

  x <- "A1{0}~B0{1} CO{a2}NN{12}"

The first is a bit easier:

  > gsub("\\{[[:alnum:]]*\\}", " ", x)
  [1] "A1 ~B0  CO NN "

The second, at least using standard functions, is a bit more cumbersome, given the repeated sequences:

  > gsub("\\{|\\}", "", regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))[[1]])
  [1] "0"  "1"  "a2" "12"

Note that a multi-element vector is returned.

In the above:

  > gregexpr("\\{[[:alnum:]]+\\}", x)[[1]]
  [1]  3  9 15 21
  attr(,"match.length")
  [1] 3 3 4 4
  attr(,"useBytes")
  [1] TRUE

returns the starting positions and lengths of the matches, which are passed to regmatches() to get the actual values:

  > regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))[[1]]
  [1] "{0}"  "{1}"  "{a2}" "{12}"

The gsub() call then strips the braces from the matched values.

You could invert the result of regmatches() to get the text outside the braces:

  > regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x), invert = TRUE)[[1]]
  [1] "A1"  "~B0" " CO" "NN"  ""

Of course, this presumes non-nesting of braces, etc.

Regards,

Marc Schwartz
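As a possible variation on the second step, the outer gsub() can be avoided by letting the pattern itself exclude the braces. The sketch below assumes perl = TRUE (PCRE lookbehind and lookahead) and, like the examples above, non-nested braces; it uses [^{}]* rather than [[:alnum:]]+, so it also accepts non-alphanumeric content between the braces. The variable names are illustrative only:

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# (?<=\\{) requires a "{" just before the match and (?=\\}) a "}" just after,
# so only the text between the braces is returned.
inside <- regmatches(x, gregexpr("(?<=\\{)[^{}]*(?=\\})", x, perl = TRUE))[[1]]
inside
# [1] "0"  "1"  "a2" "12"

# Collapse to the single string asked for in the original question.
inside_string <- paste(inside, collapse = " ")
inside_string
# [1] "0 1 a2 12"
```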
The gsub function is your friend.

  s <- "A1{0}~B0{1} CO{a2}NN{12}"
  gsub( "([^{}]*)\\{([^{}]*)\\}", "\\1 ", s )
  gsub( "([^{}]*)\\{([^{}]*)\\}", "\\2 ", s )

but keep in mind that there are many resources on the Internet for learning about regular expressions... they are hardly R-specific.

On December 11, 2015 5:50:28 AM PST, "Adrian Dușa" <dusa.adrian at unibuc.ro> wrote:
>
>I need a regular expression to extract either everything within some
>brackets, or everything outside the brackets, in a string.
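For reference, a short sketch of what the two backreference calls return on the test string, with the whitespace tidied so the results match the strings in the original question. The variable names and the extra input "A1{0}B" are illustrative assumptions, not part of the thread:

```r
s   <- "A1{0}~B0{1} CO{a2}NN{12}"
pat <- "([^{}]*)\\{([^{}]*)\\}"

# \\1 keeps the text before each brace group; collapse and trim the spaces.
outside <- trimws(gsub(" +", " ", gsub(pat, "\\1 ", s)))
outside
# [1] "A1 ~B0 CO NN"

# \\2 keeps the text inside each brace group.
inside <- trimws(gsub(pat, "\\2 ", s))
inside
# [1] "0 1 a2 12"

# Caveat: text after the last "}" is not part of any match, so it is left
# in place by both calls.
tail_case <- gsub(pat, "\\2 ", "A1{0}B")
tail_case
# [1] "0 B"
```

The last case suggests this pattern fits best when the string ends with a bracketed group, as the test string does.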
Thanks very much, Marc and Jeff.

Jeff's solutions seem to be simple one-liners. I really need to learn these things; they are too powerful to ignore.

Thank you very much,
Adrian

On Fri, Dec 11, 2015 at 5:05 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> The gsub function is your friend.
>
> s <- "A1{0}~B0{1} CO{a2}NN{12}"
> gsub( "([^{}]*)\\{([^{}]*)\\}", "\\1 ", s )
> gsub( "([^{}]*)\\{([^{}]*)\\}", "\\2 ", s )