aldi
2015-Sep-10 17:46 UTC
[R] how to split row elements [1] and [2] of a string variable A via strsplit and sapply
Hi,

I have a data.frame x1 in which a string variable A needs to be split on the separator ":" into its first and second elements. Sometimes A has three elements, but I do not need the third.

Since R does not have a SCAN function as in SAS (C=scan(A,1,":"); D=scan(A,2,":");), I am using a combination of strsplit and sapply. If I do not use the index [i], R captures the full vector. Instead, I need to capture the first and second elements row by row and create two new variables, C and D, from them.

As it stands, C is captured correctly in the loop over i, but D is missing because the variable AA does not contain the second element. Any suggestions?

Thank you in advance,
Aldi

A               B
1:29439275      0.46773514
5:85928892      0.81283052
10:128341232    0.09332543
1:106024283:ID  0.36307805
3:62707519      0.42657952
2:80464120      0.89125094

x1<-read.table(file='./test.txt',head=T,sep='\t')
x1$A <- as.character(x1$A)

for(i in 1:length(x1$A)){
  x1$AA[i] <- as.numeric(unlist(strsplit(x1$A[i],':')))
  x1$C[i] <- sapply(x1$AA[i],function(x)x[1])
  x1$D[i] <- sapply(x1$AA[i],function(x)x[2])
}

x1

> x1
               A          B AA  C  D
1     1:29439275 0.46773514  1  1 NA
2     5:85928892 0.81283052  5  5 NA
3   10:128341232 0.09332543 10 10 NA
4 1:106024283:ID 0.36307805  1  1 NA
5     3:62707519 0.42657952  3  3 NA
6     2:80464120 0.89125094  2  2 NA
jim holtman
2015-Sep-10 18:05 UTC
[R] how to split row elements [1] and [2] of a string variable A via strsplit and sapply
try this:

> x <- read.table(text = "A B
+ 1:29439275 0.46773514
+ 5:85928892 0.81283052
+ 10:128341232 0.09332543
+ 1:106024283:ID 0.36307805
+ 3:62707519 0.42657952
+ 2:80464120 0.89125094", header = TRUE, as.is = TRUE)
>
> temp <- strsplit(x$A, ":")
> x$C <- sapply(temp, '[[', 1)
> x$D <- sapply(temp, '[[', 2)
>
> x
               A          B  C         D
1     1:29439275 0.46773514  1  29439275
2     5:85928892 0.81283052  5  85928892
3   10:128341232 0.09332543 10 128341232
4 1:106024283:ID 0.36307805  1 106024283
5     3:62707519 0.42657952  3  62707519
6     2:80464120 0.89125094  2  80464120

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
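For readers who want something closer to SAS scan(), here is a minimal sketch that wraps the strsplit approach above in a helper. The name scan_field is hypothetical (it is not part of base R or of this thread), and the data frame is rebuilt inline from the posted values so the block runs on its own. The loop in the original post most likely lost D because each assignment to x1$AA[i] can hold only a single value, so only the first field was kept; splitting the whole column at once avoids that.

```r
# Hypothetical helper (not from the thread): return the n-th field of each
# string in `x`, split on the fixed separator `sep`, or NA if it is missing.
scan_field <- function(x, n, sep = ":") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(parts,
         function(p) if (length(p) >= n) p[n] else NA_character_,
         character(1))
}

# Data copied from the original post.
x <- data.frame(
  A = c("1:29439275", "5:85928892", "10:128341232",
        "1:106024283:ID", "3:62707519", "2:80464120"),
  B = c(0.46773514, 0.81283052, 0.09332543,
        0.36307805, 0.42657952, 0.89125094),
  stringsAsFactors = FALSE
)

# First and second fields as numeric columns; the third field is ignored.
x$C <- as.numeric(scan_field(x$A, 1))
x$D <- as.numeric(scan_field(x$A, 2))
print(x)
```

Unlike sapply(temp, '[[', 2), which stops with an error if some row has only one field, this sketch returns NA for that row. Whether that is preferable depends on how malformed rows should be treated.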
Bert Gunter
2015-Sep-10 18:35 UTC
[R] how to split row elements [1] and [2] of a string variable A via strsplit and sapply
... Alternatively, you can avoid the looping (i.e. sapply) altogether by:

do.call(rbind,strsplit(x[[1]],":"))[,-3]
     [,1] [,2]
[1,] "1"  "29439275"
[2,] "5"  "85928892"
[3,] "10" "128341232"
[4,] "1"  "106024283"
[5,] "3"  "62707519"
[6,] "2"  "80464120"

These can then be added to the existing frame, converted to numeric, etc.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll
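A note on the rbind approach: when rows have different numbers of fields, rbind recycles the shorter pieces up to the longest length and, as far as I know, warns that the number of columns is not a multiple of the vector length. Dropping column 3 still gives the right answer for this data. The following is only a sketch of one way to avoid relying on recycling, assuming that just the first two fields are wanted; the names pieces, first_two and result are illustrative, not from the thread.

```r
# Values of column A copied from the original post.
A <- c("1:29439275", "5:85928892", "10:128341232",
       "1:106024283:ID", "3:62707519", "2:80464120")

pieces <- strsplit(A, ":", fixed = TRUE)

# Keep exactly the first two fields of every row. Indexing past the end of a
# short vector yields NA, so rows with fewer than two fields are padded.
# vapply returns a 2 x n matrix here, hence the transpose.
first_two <- t(vapply(pieces, function(p) p[1:2], character(2)))

# Add the pieces to a data frame, converted to numeric.
result <- data.frame(A = A,
                     C = as.numeric(first_two[, 1]),
                     D = as.numeric(first_two[, 2]),
                     stringsAsFactors = FALSE)
print(result)
```

Because every piece is cut to length two before binding, the result always has exactly two columns, regardless of how many fields the longest row has.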