Computer Friends,

with the following example lines:

[107] "98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1"
[108] "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1"

I want to be able to isolate the number of months of survival for each row.

Is there a regular expression that can find the first instance of a ";" and delete everything in front of it, then find the second instance of a ";" and delete everything behind it? In Python there is a function line.find(); I would be grateful to hear the R equivalent, or any better alternative for getting the number of months of survival stored as a variable.

Many thanks!

[[alternative HTML version deleted]]
On Wed, Feb 29, 2012 at 2:24 PM, Fred G <bayespokerguy at gmail.com> wrote:
> [...]
> i want to be able to isolate the number of months of survival for each row.
> [...]

This extracts the survival months, i.e. the digits immediately followed by a semicolon:

# sample data
Lines <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
           "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

library(gsubfn)
strapply(Lines, "(\\d+);", as.numeric, simplify = TRUE)

# We can also get all numeric fields in case that is of interest:
strapply(Lines, "\\d+", as.numeric, simplify = rbind)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
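For readers who do not have gsubfn installed, the same "digits followed by a semicolon" idea can be sketched in base R. This is an alternative technique, not the code from the reply above; the names `hits` and `months` are made up for illustration, and the sketch assumes every line contains such a field.

```r
# Sample data, copied from the thread
Lines <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
           "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# Find the first run of digits that is immediately followed by ";".
# Note: regmatches() silently drops lines with no match, so the result
# is only aligned with Lines when every line matches (assumed here).
hits <- regmatches(Lines, regexpr("[0-9]+;", Lines))

# Remove the trailing ";" and convert the remaining digits to numbers
months <- as.numeric(sub(";", "", hits, fixed = TRUE))
months
```

If some lines might lack the field, checking `length(months) == length(Lines)` before using the result would be a sensible guard.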
On Feb 29, 2012, at 2:24 PM, Fred G wrote:
> Computer Friends,
>
> with the following example lines:

Modified to be correct R code. Please emulate my example in the future.

inp <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
         "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

> i want to be able to isolate the number of months of survival for each row.
>
> is there a regular expression that can find the first instance of a ";",
> delete everything in front of it-- and find the second instance of an ";"
> and delete everything behind it? in python there is a function line.find(),
> would be grateful to hear the R equiv; or, any other better alternatives to
> get the number of months of survival stored as a variable.

You can use either regex methods (noting that the "?" is necessary to defeat the default greedy nature of regex matching):

> sub(";.+$", "", sub("^.+?;", "", inp))
[1] " Surv(months): 6"  " Surv(months): 21"

... or you can read these as lines and pass the results to read.table with sep = ";":

> read.table(text = inp, sep = ";", stringsAsFactors = FALSE)[, 2]
[1] " Surv(months): 6"  " Surv(months): 21"

> [[alternative HTML version deleted]]

Please learn to post in plain text.

--
David Winsemius, MD
West Hartford, CT
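Both approaches above return the whole field (" Surv(months): 6"), while the original question asks for the number itself stored as a variable. One possible last step, sketched here as an assumption about what is wanted, is to drop everything up to the final ":" and convert the rest. The names `field`, `tab`, `surv_months` and `surv_months2` are illustrative only.

```r
inp <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
         "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# Route 1: the two-step sub() from the reply keeps the text between
# the first and second ";"
field <- sub(";.+$", "", sub("^.+?;", "", inp))

# Strip everything up to and including the last ":", trim the blank,
# and convert to numeric
surv_months <- as.numeric(trimws(sub("^.*:", "", field)))

# Route 2: the read.table() variant, finished the same way
tab <- read.table(text = inp, sep = ";", stringsAsFactors = FALSE)
surv_months2 <- as.numeric(trimws(sub("^.*:", "", tab[, 2])))

surv_months
surv_months2
```

Both routes should give the same numeric vector for these two sample lines; a line whose field is not of the form "label: number" would come out as NA with a warning.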
gsub('.+; (.+);.+', '\\1', x)

or if you just want the value out:

gsub('.+; Surv\\(months\\): ([0-9]+);.+', '\\1', x)

You can also look at strsplit:

> strsplit(x, ';')[[1]]
[1] "99-625: Cell type: S"        " Surv(months): 21"           " STATUS(0=alive, 1=dead): 1"
> lapply(strsplit(x, ';'), '[', 2)[[1]]
[1] " Surv(months): 21"

But I would follow David's second suggestion and just read them in with sep=';' instead.

Justin

On Wed, Feb 29, 2012 at 11:24 AM, Fred G <bayespokerguy@gmail.com> wrote:
> [...]
> i want to be able to isolate the number of months of survival for each row.
> [...]
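The question also asked for an R counterpart to Python's line.find(). None of the replies covers that directly, so the following is only a sketch of one way to do it in base R: gregexpr() with fixed = TRUE reports the character positions of a literal ";", and substr() cuts between them. The variable names are hypothetical, and `x` is assumed to hold the two sample lines.

```r
x <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
       "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# Positions of every literal ";" in each line (fixed = TRUE: no regex)
semi <- gregexpr(";", x, fixed = TRUE)

# Position of the first and second ";" per line; the second is NA
# when a line has fewer than two semicolons
first  <- vapply(semi, function(p) p[1], integer(1))
second <- vapply(semi, function(p) p[2], integer(1))

# Text strictly between the first and second ";"
field <- substr(x, first + 1, second - 1)

# Keep what follows the last ":" and convert to numeric
surv_months <- as.numeric(trimws(sub("^.*:", "", field)))
surv_months
```

This mirrors the "find the first ';', find the second ';', keep what is between" wording of the question. The regex and read.table answers in the thread are shorter; the positional version is mainly useful when the positions themselves are needed.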