For the regexp aficionados out there:
I need a regular expression to extract either everything within some
brackets, or everything outside the brackets, in a string.
This would be the test string:
"A1{0}~B0{1} CO{a2}NN{12}"
Everything outside the brackets would be:
"A1 ~B0 CO NN"
and everything inside the brackets would be:
"0 1 a2 12"
I have a working solution involving strsplit(), but I wonder if there is a
more direct way.
Thanks in advance for any hint,
Adrian
--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania
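The strsplit() solution mentioned in the question is not shown in the thread. As a point of comparison only, here is a minimal sketch of what such an approach might look like. It is an assumption, not the poster's actual code, and it presumes balanced, non-nested braces. The variable names (parts, outside, inside) are illustrative.

```r
# Hypothetical strsplit()-based sketch; not necessarily the poster's own code.
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# Split on either brace; the pieces then alternate outside, inside, outside, ...
parts <- strsplit(x, "[{}]")[[1]]

# Odd positions hold text outside the braces, even positions text inside.
outside <- trimws(parts[c(TRUE, FALSE)])
inside  <- parts[c(FALSE, TRUE)]

paste(outside, collapse = " ")   # "A1 ~B0 CO NN"
paste(inside, collapse = " ")    # "0 1 a2 12"
```

The alternation relies on every opening brace having a matching closing brace; with unbalanced input the odd/even positions would no longer line up.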
It depends on the complexity of your expression. If you are sure you don't have
nested brackets, and pairs of brackets always match, this will take everything
outside the brackets:
str <- "A1{0}~B0{1} CO{a2}NN{12}"
gsub("\\{[^}]*\\}", " ", str)
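Because each bracketed group is replaced by a space, the result keeps a doubled space where a space already preceded a group, plus a trailing space. If the exact string from the question is wanted, one possible cleanup (a sketch; the names raw and tidy are illustrative) is to collapse runs of whitespace and trim the ends:

```r
str <- "A1{0}~B0{1} CO{a2}NN{12}"

# Replace every {...} group with a single space, as in the reply above.
raw <- gsub("\\{[^}]*\\}", " ", str)

# Collapse runs of whitespace to one space, then trim both ends.
tidy <- trimws(gsub("[[:space:]]+", " ", raw))

raw    # "A1 ~B0  CO NN "
tidy   # "A1 ~B0 CO NN"
```

Note that trimws() requires R 3.2.0 or later.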
Philippe Grosjean
> On 11 Dec 2015, at 14:50, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
x <- "A1{0}~B0{1} CO{a2}NN{12}"

The first is a bit easier:

> gsub("\\{[[:alnum:]]*\\}", " ", x)
[1] "A1 ~B0  CO NN "

The second, at least using standard functions, is a bit more cumbersome,
given the repeated sequences:

> gsub("\\{|\\}", "", regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))[[1]])
[1] "0"  "1"  "a2" "12"

Note that a multi-element vector is returned.

In the above:

> gregexpr("\\{[[:alnum:]]+\\}", x)
[[1]]
[1]  3  9 15 21
attr(,"match.length")
[1] 3 3 4 4
attr(,"useBytes")
[1] TRUE

returns the starting positions of the matches, which are passed to
regmatches() to get the actual values:

> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x))
[[1]]
[1] "{0}"  "{1}"  "{a2}" "{12}"

The gsub() replaces the returned braces.

You could invert the result of regmatches() to get:

> regmatches(x, gregexpr("\\{[[:alnum:]]+\\}", x), invert = TRUE)[[1]]
[1] "A1"  "~B0" " CO" "NN"  ""

Of course, this presumes non-nesting of braces, etc.

Regards,

Marc Schwartz
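As a possible variation on the regmatches() approach, the separate gsub() step for stripping braces can be avoided with lookaround assertions, which in R require perl = TRUE. This is a sketch under the same non-nesting assumption, not something proposed in the thread; the names m and inside are illustrative.

```r
x <- "A1{0}~B0{1} CO{a2}NN{12}"

# (?<=\\{) asserts a "{" just before the match and (?=\\}) a "}" just after,
# so the braces themselves are not part of the matched text.
m <- gregexpr("(?<=\\{)[^{}]+(?=\\})", x, perl = TRUE)

inside <- regmatches(x, m)[[1]]
inside   # "0" "1" "a2" "12"
```

The `+` quantifier means empty groups such as `{}` are skipped rather than returned as empty strings.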
The gsub function is your friend.
s <- "A1{0}~B0{1} CO{a2}NN{12}"
gsub( "([^{}]*)\\{([^{}]*)\\}", "\\1 ", s )
gsub( "([^{}]*)\\{([^{}]*)\\}", "\\2 ", s )
but keep in mind that there are many resources on the Internet for learning
about regular expressions... they are hardly R-specific.
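For reference, a short sketch of what the two calls return on the test string, plus one caveat that seems worth knowing. The pattern only matches text that is followed by a bracketed group, so anything after the last closing brace is left in place by both replacements. The second input string ("A1{0}tail") is made up for illustration.

```r
s <- "A1{0}~B0{1} CO{a2}NN{12}"
pat <- "([^{}]*)\\{([^{}]*)\\}"

# \\1 keeps the text before each group, \\2 keeps the text inside it.
outside <- gsub(pat, "\\1 ", s)   # "A1 ~B0  CO NN "
inside  <- gsub(pat, "\\2 ", s)   # "0 1 a2 12 "

# Caveat: text after the final "}" is not matched, so it survives untouched.
trailing <- gsub(pat, "\\2 ", "A1{0}tail")   # "0 tail"
```

Both results carry a trailing space, which trimws() can remove if needed.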
--
Sent from my phone. Please excuse my brevity.
Thanks very much, Marc and Jeff.
Jeff's solutions seem to be simple one liners.
I really need to learn these things, too powerful to ignore.
Thank you very much,
Adrian

On Fri, Dec 11, 2015 at 5:05 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote: