I have a data.frame like the following:

var1      var2
9G/G09    abd89C/T90
10A/T9    32C/C
90G/G     A/A
.         .
.         .
.         .
10T/C     00G/G90

What I want is to get the letters immediately to the left and right of '/'. For example, for "9G/G09" I only want "G" and "G", and for "abd89C/T90" I only want "C" and "T". How can I get these?

Thank you,

karena
--
View this message in context: http://r.789695.n4.nabble.com/string-handling-tp2242119p2242119.html
Sent from the R help mailing list archive at Nabble.com.

Try this:

> x <- "1234C/Tasdf"
> y <- strsplit(sub("^.*(.)/(.).*", "\\1 \\2", x), ' ')[[1]]
> y
[1] "C" "T"

--
Jim Holtman
Cincinnati, OH

What is the problem that you are trying to solve?
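
The reply above handles a single string. As a rough sketch of how the same sub() plus strsplit() idea might be applied to a whole column, the following uses a small hypothetical vector x copied from the first rows of the question; the column names "left" and "right" are made up for illustration.

```r
# Hypothetical example data mirroring the first rows of the question
x <- c("9G/G09", "10A/T9", "90G/G")

# Keep only the two characters adjacent to '/', separated by a space
pairs <- sub("^.*(.)/(.).*$", "\\1 \\2", x)

# Split each "L R" string and stack the pieces into a two-column matrix
m <- do.call(rbind, strsplit(pairs, " ", fixed = TRUE))
colnames(m) <- c("left", "right")
m
```

Note that sub() returns a string unchanged when the pattern does not match, so entries without a '/' would need to be checked separately before splitting.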

Hope it helps.

text <- "var1 var2
9G/G09 abd89C/T90
10A/T9 32C/C
90G/G A/A"

x <- read.table(textConnection(text), header = TRUE)

# Character just before the '/' and just after it, for each column
x$var1.1 <- sub(".*(.)/.*", "\\1", x$var1)
x$var1.2 <- sub(".*/(.).*", "\\1", x$var1)
x$var2.1 <- sub(".*(.)/.*", "\\1", x$var2)
x$var2.2 <- sub(".*/(.).*", "\\1", x$var2)

-----
A R learner.
--
View this message in context: http://r.789695.n4.nabble.com/string-handling-tp2242119p2242357.html
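
With more than a couple of columns, the four assignments above could be written as a loop. This is only a sketch under the assumption that every column of interest contains exactly one '/'; it uses read.table(text = ...) and stringsAsFactors = FALSE, which are my choices here and not part of the reply above.

```r
# Same sample data as in the reply above; stringsAsFactors = FALSE keeps
# the columns as character vectors regardless of the R version
text <- "var1 var2
9G/G09 abd89C/T90
10A/T9 32C/C
90G/G A/A"
x <- read.table(text = text, header = TRUE, stringsAsFactors = FALSE)

# For each source column, add <name>.1 (left of '/') and <name>.2 (right of '/')
for (col in c("var1", "var2")) {
  x[[paste0(col, ".1")]] <- sub(".*(.)/.*", "\\1", x[[col]])
  x[[paste0(col, ".2")]] <- sub(".*/(.).*", "\\1", x[[col]])
}
x
```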

On Thu, Jun 3, 2010 at 4:06 PM, Wu Gong <wg2f at mtmail.mtsu.edu> wrote:
> x <- read.table(textConnection(text), header = TRUE)

Or with the stringr package:

library(stringr)
str_match(x$var1, "(.)/(.)")

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
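
str_match() returns a character matrix whose first column is the full match and whose remaining columns are the capture groups. For readers without stringr installed, a base R analogue with the same column layout can be sketched with regexec() and regmatches(); the vector x below is hypothetical sample data taken from the question, and this is not the stringr implementation itself.

```r
# Hypothetical sample values from the question
x <- c("9G/G09", "10A/T9", "90G/G")

# regexec() records the full match and each capture group;
# regmatches() extracts them as one character vector per input string
hits <- regmatches(x, regexec("(.)/(.)", x))

# Stack into a matrix: column 1 = full match, columns 2 and 3 = the two letters
m <- do.call(rbind, hits)
m[, 2:3]
```

Dropping the first column, as in the last line, leaves just the letters on either side of the '/'.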

This solution using strapply in gsubfn is along the same lines as the stringr solution. First we read in the data using as.is = TRUE so that we get character rather than factor columns. On the other hand, if your data is already in columns with class factor then just replace strapply(x, ...) with strapply(as.character(x), ...) below. Then lapply over the columns of DF using strapply on each one. See home page at http://gsubfn.googlecode.com for more.

> Lines <- "var1 var2
+ 9G/G09 abd89C/T90
+ 10A/T9 32C/C
+ 90G/G A/A"
>
> library(gsubfn)
> DF <- read.table(textConnection(Lines), header = TRUE, as.is = TRUE)
> lapply(DF, function(x) strapply(x, "(.)/(.)", c, simplify = rbind))
$var1
     [,1] [,2]
[1,] "G"  "G"
[2,] "A"  "T"
[3,] "G"  "G"

$var2
     [,1] [,2]
[1,] "C"  "T"
[2,] "C"  "C"
[3,] "A"  "A"

Also, a slight simplification is possible using gsubfn's capability of representing a one-line function as a formula. We just preface lapply with fn$, and then formulas appearing in the arguments (subject to certain rules) are interpreted as functions. Here, the formula in the second argument to lapply is interpreted as the anonymous function we used above:

> fn$lapply(DF, x ~ strapply(x, "(.)/(.)", c, simplify = rbind))
$var1
     [,1] [,2]
[1,] "G"  "G"
[2,] "A"  "T"
[3,] "G"  "G"

$var2
     [,1] [,2]
[1,] "C"  "T"
[2,] "C"  "C"
[3,] "A"  "A"
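
On the factor-versus-character point raised in this reply: if a data frame has already been read with factor columns, one possible approach is to convert all columns to character once, up front, so that later regular-expression calls need no as.character() wrapper. The sketch below forces factors with stringsAsFactors = TRUE purely to imitate that situation; it uses base R only and is an assumption about the setup, not part of the gsubfn solution.

```r
Lines <- "var1 var2
9G/G09 abd89C/T90
10A/T9 32C/C
90G/G A/A"

# Force factor columns to mimic data that was read without as.is = TRUE
DF <- read.table(text = Lines, header = TRUE, stringsAsFactors = TRUE)

# Convert every column to character; assigning into DF[] keeps the
# data.frame structure and column names intact
DF[] <- lapply(DF, as.character)
sapply(DF, class)
```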