Christine SINOQUET
2010-Feb-06 18:15 UTC
[R] optimized R-selection and R-replacement inside a matrix need, strings coerced to factors
Hello,

I encounter two problems.

First, I need to modify some huge arrays (2,000 individuals x 50,000 variables). To format the data, I think I should take advantage of optimized selection and replacement inside a matrix in R, and avoid a naive use of loops. Thank you in advance for any information about the following problem:

file A:
  2,000 individuals in rows
  50,000 columns corresponding to 50,000 variables; each value belongs to {0, 1, 2}

file B:
  50,000 variables in rows
  1st column: character (A, C, G, T) corresponding to code 0
  2nd column: character corresponding to code 1

Convention:
  if a value in A[,j] is 0, one wants to replace it with the character in B[j,1] twice
  if a value in A[,j] is 1, one wants to replace it with the character in B[j,1] and the character in B[j,2]
  if a value in A[,j] is 2, one wants to replace it with the character in B[j,2] twice

  C <- matrix(0,2000,0) # initialization to void matrix
  for(j in 1:2000){
    c <- A[,j]
    zeros <- which(c==0); ones <- which(c==1); twos <- which(c==2); rm(c)
    c1 <- matrix("Z",2000)
    c2 <- matrix("Z",2000)
    c1[zeros] <- B$V1[j]; c2[zeros] <- B$V1[j]
    c1[ones] <- B$V1[j]; c2[ones] <- B$V2[j]
    c1[twos] <- B$V2[j]; c2[twos] <- B$V2[j]
    C <- cbind(C, cbind(c1,c2))
  }

I do think a more elaborate solution might exist.

_______________________

Second, when testing this naive implementation restricted to 6 individuals and variable number 6 (in B), I encounter the problem of character strings being coerced to numbers.
coding.txt:

  allele0 allele1
  A C
  G T
  A G
  G C
  G T
  A T

  c <- data.frame(x=1:6,y=c(0,1,2,0,1,2))
  A <- c$y
  zeros <- which(A==0); ones <- which(A==1); twos <- which(A==2); rm(A)
  c1 <- matrix("Z",6)
  c2 <- matrix("Z",6)
  B <- read.table(file="coding.txt",h=T)
  c1[zeros] <- B$allele0[6]; c2[zeros] <- B$allele0[6]
  c1[ones] <- B$allele0[6]; c2[ones] <- B$allele1[6]
  c1[twos] <- B$allele1[6]; c2[twos] <- B$allele1[6]

Results obtained for c1 and c2:

  > c1
       [,1]
  [1,] "1"
  [2,] "1"
  [3,] "3"
  [4,] "1"
  [5,] "1"
  [6,] "3"
  > c2
       [,1]
  [1,] "1"
  [2,] "3"
  [3,] "3"
  [4,] "1"
  [5,] "3"
  [6,] "3"

Thanks in advance for your help.
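The "1" and "3" in the output above are consistent with the columns of B being factors: in R versions before 4.0.0, read.table() converts character columns to factors by default, and assigning a factor element into a character matrix stores its integer level code rather than its label. A minimal sketch of two possible workarounds follows. The temporary file only stands in for coding.txt, and the names path, Bf, a0, a1 and y are introduced here for illustration.

```r
# Recreate coding.txt in a temporary file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("allele0 allele1", "A C", "G T", "A G", "G C", "G T", "A T"), path)

# Option 1: read every column as character, so no factor is ever created.
B <- read.table(path, header = TRUE, colClasses = "character")

# Option 2: if the data frame already holds factors, convert the labels
# explicitly before using them in an assignment.
Bf <- read.table(path, header = TRUE, stringsAsFactors = TRUE)
a0 <- as.character(Bf$allele0)
a1 <- as.character(Bf$allele1)

# Same six test individuals as in the post (codes 0, 1, 2, 0, 1, 2),
# decoded with variable number 6 of B.
y  <- c(0, 1, 2, 0, 1, 2)
c1 <- ifelse(y == 2, B$allele1[6], B$allele0[6])  # first allele
c2 <- ifelse(y == 0, B$allele0[6], B$allele1[6])  # second allele
print(cbind(c1, c2))
```

With either option the assigned values are character strings, so the allele letters are kept instead of the level codes.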
Christine SINOQUET
2010-Feb-07 14:13 UTC
[R] optimized R-selection and R-replacement inside a matrix need, strings coerced to factors
Hello,

I need to modify some huge arrays (2,000 individuals x 50,000 variables). To format the data, I think I should take advantage of optimized selection and replacement inside a matrix in R, and avoid a naive use of loops. Thank you in advance for any information about the following problem:

file A:
  2,000 individuals in rows
  50,000 columns corresponding to 50,000 variables; each value belongs to {0, 1, 2}

file B:
  50,000 variables in rows
  1st column: character (A, C, G, T) corresponding to code 0
  2nd column: character corresponding to code 1

Convention:
  if a value in A[,j] is 0, one wants to replace it with the character in B[j,1] twice
  if a value in A[,j] is 1, one wants to replace it with the character in B[j,1] and the character in B[j,2]
  if a value in A[,j] is 2, one wants to replace it with the character in B[j,2] twice

  C <- matrix(0,2000,0) # initialization to void matrix
  for(j in 1:2000){
    c <- A[,j]
    zeros <- which(c==0); ones <- which(c==1); twos <- which(c==2); rm(c)
    c1 <- matrix("Z",2000)
    c2 <- matrix("Z",2000)
    c1[zeros] <- B$V1[j]; c2[zeros] <- B$V1[j]
    c1[ones] <- B$V1[j]; c2[ones] <- B$V2[j]
    c1[twos] <- B$V2[j]; c2[twos] <- B$V2[j]
    C <- cbind(C, cbind(c1,c2))
  }

I do think a more elaborate solution might exist.

Thanks in advance for your help.
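One possible loop-free formulation of the convention above is sketched here. It is only an illustration on toy data, not a tested solution for the full 2,000 x 50,000 case. The small A and B below are made up, and the names n, p, a0, a1, j, first and second are introduced for the sketch. The idea is to index the allele vectors by the column number of every cell of A, so that each cell picks up the alleles of its own variable.

```r
# Toy stand-ins for the real files (assumed shapes: A is n x p, B has p rows).
n <- 5
p <- 4
A <- matrix(c(0, 1, 2, 0, 1,
              2, 2, 0, 1, 0,
              1, 0, 0, 2, 2,
              0, 1, 1, 2, 0), nrow = n, ncol = p)
B <- data.frame(V1 = c("A", "G", "A", "G"),
                V2 = c("C", "T", "G", "C"),
                stringsAsFactors = FALSE)

# Allele letters per variable as plain character vectors
# (as.character() also protects against factor columns).
a0 <- as.character(B$V1)
a1 <- as.character(B$V2)

# Column index of every cell of A, so a0[j] and a1[j] line up with A.
j <- col(A)

# First allele: B[j, 2] only when the code is 2, otherwise B[j, 1].
first  <- ifelse(A == 2, a1[j], a0[j])
# Second allele: B[j, 1] only when the code is 0, otherwise B[j, 2].
second <- ifelse(A == 0, a0[j], a1[j])

# Allocate C once and interleave: columns 2j-1 and 2j hold variable j.
C <- matrix("", nrow = n, ncol = 2 * p)
C[, seq(1, 2 * p, by = 2)] <- first
C[, seq(2, 2 * p, by = 2)] <- second
print(C)
```

Allocating C once also avoids the repeated copying caused by growing it with cbind() inside the loop. The intermediate results are full-size matrices, so on the real data it may be necessary to apply the same steps to blocks of columns rather than to all of A at once.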