Michael Young
2011-Jun-05 19:59 UTC
[R] Negating two identical characters with regular expressions
Hello all,

Let's say I have a character string

    "Race-ethnicity-----coding information"

I want to extract all text before the multiple dashes, including the word "ethnicity."

I wrote a handy function to extract the first matched text:

    grepcut <- function(pattern, x) {
      start.and.length <- regexpr(pattern, x)
      substring(x, start.and.length,
                start.and.length + attr(start.and.length, "match.length") - 1)
    }

    grepcut("^[^-]+", "Race-ethnicity-----coding information")

The above grepcut, of course, returns only the string "Race". What I really want is to create a class of two dashes in a row and then negate that. Is it possible to create a class of repeated characters? If so, it might be further complicated by the fact that "-" is a special character in brackets and can only go first or last.

Can anyone help me out?

Thanks,
Michael Young
matto in cor
2011-Jun-06 08:32 UTC
[R] Negating two identical characters with regular expressions
Hello Michael,

try

    strsplit("aa-bb-----cc dd", "-{2,}")

This function returns a list whose single element is a character vector of all the strings separated by multiple dashes (at least two).

Alternatively, if you want the first string only, try this:

    sub("(.*?)--.*", "\\1", "aa-bb----cc dd")

(note the reluctant quantifier *?).

Hope this helps,
Marco
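
For reference, here is a minimal sketch applying both suggestions to the string from the original question. The third pattern is not from the thread: it is one possible way to express "anything except two dashes in a row" without a negated class, by matching either a non-dash character or a single dash followed by a non-dash. The variable names are illustrative only.

```r
x <- "Race-ethnicity-----coding information"

# Suggestion 1: split on runs of two or more dashes and keep the first piece.
# strsplit() returns a list with one element per input string.
first_split <- strsplit(x, "-{2,}")[[1]][1]

# Suggestion 2: capture lazily up to the first pair of dashes and
# replace the whole string with that captured group.
first_sub <- sub("(.*?)--.*", "\\1", x)

# Assumed alternative (not from the thread): consume non-dash characters,
# or a single dash as long as it is followed by a non-dash.
# regmatches() extracts the text located by regexpr().
pattern <- "^([^-]|-[^-])*"
first_match <- regmatches(x, regexpr(pattern, x))

print(c(first_split, first_sub, first_match))
```

The third pattern can also be passed straight to the grepcut function from the question, since it needs no Perl-style extensions. One caveat of that pattern: a single dash at the very end of the string is not included in the match.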