Michael Young
2011-Jun-05 19:59 UTC
[R] Negating two identical characters with regular expressions
Hello all,
Let's say I have a character string
"Race-ethnicity-----coding information"
I want to extract all text before the multiple dashes, including the word
"ethnicity."
I wrote a handy function to extract the first matched text:
grepcut <- function(pattern, x) {
  start.and.length <- regexpr(pattern, x)
  substring(x, start.and.length,
            start.and.length + attr(start.and.length, "match.length") - 1)
}
grepcut("^[^-]+","Race-ethnicity-----coding information")
The above grepcut, of course, returns only the string "Race". What I really
want is to create a class of two dashes in a row and then negate that. Is
it possible to create a class of repeated characters? If so, it might be
further complicated by the fact that "-" is a special character in brackets
and can only go first or last.
Can anyone help me out?
Thanks,
Michael Young
matto in cor
2011-Jun-06 08:32 UTC
[R] Negating two identical characters with regular expressions
Hello Michael,
Try strsplit("aa-bb-----cc dd", "-{2,}"). This function returns a list
whose single element is a character vector of the strings separated by
runs of dashes (at least two).
Alternatively, if you want the first string only, try this:
sub("(.*?)--.*", "\\1", "aa-bb----cc dd")
(note the reluctant quantifier *?).
Hope this helps
Marco
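
As a rough sketch, here are the two suggestions above applied to the string from the original question, plus a third variant that is closer to "negating" a pair of dashes. The third variant uses a negative lookahead and is an addition, not something proposed in the thread. The variable names are made up for illustration, and perl = TRUE is also an addition so that the lazy quantifier and the lookahead follow PCRE semantics.

```r
x <- "Race-ethnicity-----coding information"

# 1. Split on runs of two or more dashes. strsplit() returns a list with
#    one character vector per input string, so take the first element.
parts <- strsplit(x, "-{2,}")[[1]]
before.split <- parts[1]

# 2. Capture lazily up to the first pair of dashes and keep only the
#    captured group. perl = TRUE is an assumption added for this sketch.
before.sub <- sub("(.*?)--.*", "\\1", x, perl = TRUE)

# 3. Negative lookahead (not from the thread): consume one character at a
#    time, but only while "--" does not start at the current position.
m <- regexpr("^(?:(?!--).)+", x, perl = TRUE)
before.lookahead <- regmatches(x, m)

print(before.split)
print(before.sub)
print(before.lookahead)
```

All three approaches leave single dashes inside the text alone and stop only where two dashes occur in a row. The lookahead form can be handy when the pattern has to be passed to regexpr(), as grepcut does, although grepcut would need a perl = TRUE argument for that.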