Hi,

this should be an easy one, but I can't figure it out. I have a vector of tests, with their units between brackets (if they have units), e.g.

    tests <- c("pH", "Assay (%)", "Impurity A(%)", "content (mg/ml)")

Now I would like to have a function that takes a test as input and returns its units, like:

    f <- function (x) sub("\\)", "", sub("\\(", "",sub("[[:alnum:]]+","",x)))

This should give "", "%", "%", "mg/ml", but it doesn't do the job quite well.

After searching in the manual and on the help lists, I can't find the answer.

Anyone?

Bart

--
View this message in context: http://r.789695.n4.nabble.com/Regular-expression-to-find-value-between-brackets-tp2994166p2994166.html
Sent from the R help mailing list archive at Nabble.com.
Henrique Dallazuanna
2010-Oct-13 18:34 UTC
[R] Regular expression to find value between brackets
Try this:

    replace(gsub(".*\\((.*)\\)$", "\\1", tests), !grepl("\\(.*\\)", tests), "")

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
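For anyone who wants this as the function asked for in the question, here is a minimal sketch that wraps the one-liner above. The name get_units is made up for illustration, and the sketch assumes the units always sit in parentheses at the very end of the string.

```r
tests <- c("pH", "Assay (%)", "Impurity A(%)", "content (mg/ml)")

# get_units is a hypothetical helper name, not part of the original reply.
get_units <- function(x) {
  # Replace the whole string by the text inside the parentheses that end it.
  units <- gsub(".*\\((.*)\\)$", "\\1", x)
  # Entries without any "(...)" pass through gsub unchanged, so blank them.
  replace(units, !grepl("\\(.*\\)", x), "")
}

result <- get_units(tests)
print(result)
```

The replace() step is needed because gsub() returns a non-matching string such as "pH" as-is rather than as an empty string.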
Bart,

I'm hardly one of the list's regex gurus, but this can get you started...

    tests <- c("pH", "Assay (%)", "Impurity A(%)", "content (mg/ml)")
    x <- regexpr("\\((.*)\\)", tests)
    substr(tests, x + 1, x + attr(x, "match.length") - 2)
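To see why the substr() arithmetic above works, it may help to look at the intermediate values. The sketch below only splits the same calls into named steps; the variable names start, len and units are chosen here for illustration.

```r
tests <- c("pH", "Assay (%)", "Impurity A(%)", "content (mg/ml)")

x <- regexpr("\\((.*)\\)", tests)
start <- as.integer(x)           # position of "(", or -1 if there is no match
len <- attr(x, "match.length")   # length of "(...)", or -1 if there is no match

# Skip the opening paren and stop just before the closing one.
# For "pH" the stop position is below the start position, so substr() gives "".
units <- substr(tests, start + 1, start + len - 2)
print(units)
```

Because regexpr() reports -1 for both position and length when nothing matches, the entry without units falls out as an empty string without any special handling.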
One way:

    gsub(".*\\(([^()]*)\\).*", "\\1",tests)

Idea: Pick out the units designation between the "()" and replace the whole expression with it. The "\\1" refers to the "([^()]*)" parenthesized expression in the middle that picks out the units.

Cheers,
Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
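A quick check of this pattern on the example vector, as a sketch; the variable names pattern, units and has_units are only illustrative. Note that a string without parentheses does not match at all, so gsub() hands it back unchanged.

```r
tests <- c("pH", "Assay (%)", "Impurity A(%)", "content (mg/ml)")

pattern <- ".*\\(([^()]*)\\).*"

# Where the pattern matches, the whole string is replaced by capture group 1.
units <- gsub(pattern, "\\1", tests)
print(units)

# grepl() shows which entries actually matched, i.e. contain "(...)".
has_units <- grepl(pattern, tests)
print(has_units)
```

So the entries with units come out as wanted, while "pH" would still need to be mapped to "" separately.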
Note: My original proposal, not quite right, can be made quite right via:

    gsub(".*\\((.*)\\).*||[^()]+", "\\1",tests)

The "||" or clause at the end handles the case where there are no parentheses in the string.

-- Bert
Gabor Grothendieck
2010-Oct-13 20:14 UTC
[R] Regular expression to find value between brackets
strapply in the gsubfn package can match by content, which is what you want. We use a regular expression consisting of a literal left paren, "\\(", followed by an opening capturing paren, "(", followed by the longest string not containing a right paren, "[^)]*", followed by the closing capturing paren, ")", and a literal right paren, "\\)". strapply passes the matches to the function in the third arg, which here just concatenates them. The result is simplified into a character vector (rather than a list).

    library(gsubfn)
    strapply(tests, "\\(([^)]*)\\)", c, simplify = c)

e.g.

    > strapply(tests, "\\(([^)]*)\\)", c, simplify = c)
    [1] "%"     "%"     "mg/ml"

See http://gsubfn.googlecode.com for the gsubfn home page and more info.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
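If installing gsubfn is not an option, roughly the same capture-group idea can be sketched in base R with regexec() and regmatches(). This is a swapped-in alternative, not the strapply approach itself, and the names m, parts and units are only illustrative. Unlike the strapply output shown above, it keeps an empty string for entries that have no units.

```r
tests <- c("pH", "Assay (%)", "Impurity A(%)", "content (mg/ml)")

# regexec() records the full match and each capture group for every element.
m <- regexec("\\(([^)]*)\\)", tests)
parts <- regmatches(tests, m)

# The second entry of each result is the capture group;
# an element with no match yields character(0), which is mapped to "".
units <- vapply(parts, function(p) if (length(p) >= 2) p[2] else "", character(1))
print(units)
```

Keeping the empty strings means the result stays aligned with the input vector, which is convenient if the units are to be stored next to the test names.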