Hi all,

I have read in a column whose values look like "chr1:000889594-000889638", and I need to break them into three columns: "chr1:", "000889594" and "000889638". How should I do this in R? Thanks a lot for your suggestions!

Bill
Wacek Kusnierczyk
2009-Feb-05 23:38 UTC
[R] how to separate char and num within a variable
Bill Hyman wrote:
> Hi all,
>
> I read in a column which looks like "chr1:000889594-000889638", and need to break them into three columns like "chr1:", "000889594" and "000889638". How shall I do in R. Thanks a lot for your suggestions!
>

if strings is your vector of strings, this should do (assuming the format is stable across all entries):

strsplit(strings, split=':|-')

vQ
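strsplit() discards the delimiters, so the first piece comes back as "chr1" rather than "chr1:" as the question asked. A minimal sketch that keeps the colon uses sub() with regular-expression capture groups instead. It assumes every entry has the form "<name>:<digits>-<digits>"; the pattern, the second example string and the column names are illustrative assumptions, not part of the thread.

```r
# Example input; assumes every entry has the form "<name>:<digits>-<digits>"
strings <- c("chr1:000889594-000889638", "chr2:000000100-000000250")

# Capture groups: (1) everything up to and including the colon,
# (2) the digits before the dash, (3) the digits after the dash
pattern <- "^([^:]+:)([0-9]+)-([0-9]+)$"

# Each sub() call replaces the whole string with one captured group
parts <- data.frame(
  chrom = sub(pattern, "\\1", strings),
  start = sub(pattern, "\\2", strings),
  end   = sub(pattern, "\\3", strings),
  stringsAsFactors = FALSE
)
print(parts)
```

Note that sub() returns a non-matching entry unchanged, so it is worth checking grepl(pattern, strings) first if the format might vary.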
on 02/05/2009 05:20 PM Bill Hyman wrote:
> Hi all,
>
> I read in a column which looks like "chr1:000889594-000889638", and
> need to break them into three columns like "chr1:", "000889594" and
> "000889638". How shall I do in R. Thanks a lot for your suggestions!

See ?strsplit

> Vec <- "chr1:000889594-000889638"

> Vec
[1] "chr1:000889594-000889638"

# Use a regular expression, defining the 'split' character
# as either ":" or "-", where the vertical bar means 'or':

> strsplit(Vec, split = ":|-")
[[1]]
[1] "chr1"      "000889594" "000889638"

Note that the split characters are not retained in the result.

Let's presume that you have a column in a data frame of the original data and wish to split it into 3 columns:

> DF <- data.frame(Col = rep(Vec, 10))

> DF
                        Col
1  chr1:000889594-000889638
2  chr1:000889594-000889638
3  chr1:000889594-000889638
4  chr1:000889594-000889638
5  chr1:000889594-000889638
6  chr1:000889594-000889638
7  chr1:000889594-000889638
8  chr1:000889594-000889638
9  chr1:000889594-000889638
10 chr1:000889594-000889638

Note that by default 'Col' will be a factor (in R versions before 4.0.0, where stringsAsFactors = TRUE was the default) and strsplit() expects a character vector, thus we do the coercion and use do.call() to create a character matrix, via rbind(), from the result:

> do.call(rbind, strsplit(as.character(DF$Col), split = ":|-"))
      [,1]   [,2]        [,3]
 [1,] "chr1" "000889594" "000889638"
 [2,] "chr1" "000889594" "000889638"
 [3,] "chr1" "000889594" "000889638"
 [4,] "chr1" "000889594" "000889638"
 [5,] "chr1" "000889594" "000889638"
 [6,] "chr1" "000889594" "000889638"
 [7,] "chr1" "000889594" "000889638"
 [8,] "chr1" "000889594" "000889638"
 [9,] "chr1" "000889594" "000889638"
[10,] "chr1" "000889594" "000889638"

See ?regex, ?do.call and ?rbind for more information.

HTH,

Marc Schwartz
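The character matrix from do.call(rbind, ...) can be turned back into named data-frame columns, with the zero-padded positions converted to numbers. This is a sketch only: the column names chrom, start and end are hypothetical, and it assumes the positions fit in R's integer range (use as.numeric() instead if they might not, or keep them as character if the leading zeros matter).

```r
# Rebuild a small version of the example column (character, not factor)
Vec <- "chr1:000889594-000889638"
DF <- data.frame(Col = rep(Vec, 3), stringsAsFactors = FALSE)

# Split each entry on ":" or "-" and bind the pieces into a character matrix
m <- do.call(rbind, strsplit(as.character(DF$Col), split = ":|-"))

# Hypothetical column names; as.integer() drops the leading zeros
DF$chrom <- m[, 1]
DF$start <- as.integer(m[, 2])
DF$end   <- as.integer(m[, 3])
print(DF)
```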
Thank you!

----- Original Message ----
From: "markleeds at verizon.net" <markleeds at verizon.net>
To: Bill Hyman <billhyman1 at yahoo.com>
Sent: Thursday, February 5, 2009 3:35:54 PM
Subject: RE: [R] how to separate char and num within a variable

Hi: you can do the following, but there is possibly a better way, so I'm sending this offline in order to encourage better replies from the guRus. Good luck.

temp <- "chr1:000889594-000889638"
temp2 <- strsplit(temp, ":|-")
print(temp2)

"chr1:000889594-000889638",

On Thu, Feb 5, 2009 at 6:20 PM, Bill Hyman wrote:
> Hi all,
>
> I read in a column which looks like "chr1:000889594-000889638", and need to break them into three columns like "chr1:", "000889594" and "000889638". How shall I do in R. Thanks a lot for your suggestions!
>
> Bill
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
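strsplit() returns a list with one character vector per input string, so the result of the temp2 call above still has to be indexed to get at the pieces. A short sketch of two ways to do that; the second example string is made up for illustration.

```r
temp <- "chr1:000889594-000889638"
temp2 <- strsplit(temp, ":|-")

# [[1]] extracts the pieces of the first (here, only) input string
pieces <- temp2[[1]]

# For several strings, pull the same field out of every list element:
# vapply() calls `[`(x, 2) on each vector, i.e. takes the start position
several <- strsplit(c(temp, "chr2:000000100-000000250"), ":|-")
starts <- vapply(several, `[`, character(1), 2)

print(pieces)
print(starts)
```

vapply() is used rather than sapply() because it checks that every element yields exactly one character value.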