Hi,

I have a data.frame as follows:

var1   var2
1      ab_c_(ok)
2      okf789(db)_c
3      jojfiod(90).gt
4      "ij"_(78)__op
5      (iojfodjfo)_ab

What I want is to create a new variable called "var3". The value of var3 is the content inside the parentheses, so var3 would be:

var3
ok
db
90
78
iojfodjfo

How can I do this?

Thanks,

karena

--
View this message in context: http://r.789695.n4.nabble.com/question-about-string-handling-tp2289178p2289178.html
Sent from the R help mailing list archive at Nabble.com.

On Wed, Jul 14, 2010 at 2:21 PM, karena <dr.jzhou at gmail.com> wrote:
>
> Hi,
>
> I have a data.frame as following:
> var1   var2
> 1      ab_c_(ok)
> 2      okf789(db)_c
> 3      jojfiod(90).gt
> 4      "ij"_(78)__op
> 5      (iojfodjfo)_ab
>
> what I want is to create a new variable called "var3". the value of var3 is
> the content in the Parentheses. so var3 would be:
> var3
> ok
> db
> 90
> 78
> iojfodjfo
>

Here are several alternatives. The gsub solution matches everything up to and including the ( as well as the ) and everything after it, and replaces each with nothing. The strsplit solution splits each string into three fields (everything before the (, everything within the (), and everything after the )) and then picks off the second. The strapply solution matches everything from ( to ) and returns everything between them. The code below works whether DF$var2 is factor or character, but if you know it's character you can drop the as.character in #2 and #3.

# 1
gsub(".*[(]|[)].*", "", DF$var2)

# 2
sapply(strsplit(as.character(DF$var2), "[()]"), "[", 2)

# 3
library(gsubfn)
strapply(as.character(DF$var2), "[(](.*)[)]", simplify = TRUE)

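To try alternatives #1 and #2 without any setup, here is a minimal self-contained sketch in base R. The data frame DF is rebuilt by hand from the table in the question, and storing var2 as character via stringsAsFactors = FALSE is an assumption made for illustration; the thread does not say how the original data was read in.

```r
# Rebuild the example data from the question (layout assumed: two columns).
DF <- data.frame(
  var1 = 1:5,
  var2 = c('ab_c_(ok)', 'okf789(db)_c', 'jojfiod(90).gt',
           '"ij"_(78)__op', '(iojfodjfo)_ab'),
  stringsAsFactors = FALSE
)

# 1: delete everything up to and including "(", and ")" plus everything after it.
v1 <- gsub(".*[(]|[)].*", "", DF$var2)

# 2: split on either parenthesis and keep the second piece of each split.
v2 <- sapply(strsplit(as.character(DF$var2), "[()]"), "[", 2)

# Both approaches agree on this data, so either can populate var3.
DF$var3 <- v1
print(DF$var3)  # ok db 90 78 iojfodjfo
```

Both results are plain character vectors, so either one can be assigned straight to DF$var3.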
Try this:

text <- 'var1 var2
1 ab_c_(ok)
2 okf789(db)_c
3 jojfiod(90).gt
4 "ij"_(78)__op
5 (iojfodjfo)_ab'
df <- read.table(textConnection(text), header = TRUE, sep = " ", quote = "")
df$var3 <- gsub("(.*\\()(.*)(\\).*)", "\\2", df$var2)

-----
A R learner.

Another option could be:

df$var3 <- gsub(".*\\((.*)\\).*", "\\1", df$var2)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

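One caveat that is not raised in the thread: the gsub patterns here rely on the greedy .*, which is fine for the example data because each string has exactly one pair of parentheses. If a string might contain several pairs and only the first is wanted, a possible variation is to use negated character classes with sub(). This is only a sketch; the vector x below is made up for illustration and is not from the original data.

```r
# Hypothetical input: the second string has two pairs of parentheses.
x <- c("ab_c_(ok)", "a(first)_b(second)")

# [^(]* cannot run past the first "(", and [^)]* stops at the first ")",
# so the captured group is the content of the first pair only.
first <- sub("^[^(]*\\(([^)]*)\\).*$", "\\1", x)

print(first)  # ok first
```

Strings without any parentheses are returned unchanged by sub(), so the result always has the same length as the input.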
hey, guys, all these methods work perfectly. thank you!!