arun
2013-Nov-12 03:01 UTC
[R] Apply function to every 20 rows between pairs of columns in a matrix
HI,
It's not very clear.

set.seed(25)
dat1 <- as.data.frame(matrix(sample(c("A","T","G","C"),46482*56,replace=TRUE),ncol=56,nrow=46482),stringsAsFactors=FALSE)
lst1 <- split(dat1,as.character(gl(nrow(dat1),20,nrow(dat1))))
res <- lapply(lst1,function(x) sapply(x[,1:8],function(y) sapply(x[,9:56], function(z) sum(y==z)/20)))
length(res)
#[1] 2325   ### check here
dim(res[[1]])
#[1] 48  8

A.K.

> Hi all,
> I have a set of genetic SNP data that looks like
>
> Founder1 Founder2 Founder3 Founder4 Founder5 Founder6 Founder7 Founder8 Sample1 Sample2 Sample3 Sample...
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
> A A A T T T T T A T A T
>
> The size of the matrix is 56 columns by 46482 rows. I need to first bin the matrix by every 20 rows, then compare each of the first 8 columns (founders) to each of columns 9-56, and divide the total number of matching letters/alleles by the total number of rows (20). Ultimately I need 48 matrices of 8 columns by 2342 rows, which are essentially similarity matrices.
>
> I have tried to extract each pair separately with something like
>
> length(cbind(odd[,9],odd[,1])[cbind(odd[,9],odd[,1])[,1]=="T" & cbind(odd[,9],odd[,1])[,2]=="T",])/nrow(cbind(odd[,9],odd[,1]))
>
> but this is nowhere near efficient, and I do not know of a faster way of applying the function to every 20 rows and across multiple pairs.
>
> In the example given above, if the rows were all identical as shown across 20 rows, then the first row of the matrix for Sample1 would be 1 1 1 0 0 0 0 0
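The split/sapply approach in the reply can be sketched on a smaller toy matrix so it runs quickly; the bin size of 20, the founder columns (1:8), and the sample columns (9:56) are taken from the post, while the reduced row count of 40 (two bins) is my own assumption for illustration.

```r
# Toy version of the reply's approach: 40 rows instead of 46482,
# so there are exactly two 20-row bins (assumption for illustration).
set.seed(25)
dat1 <- as.data.frame(matrix(sample(c("A","T","G","C"), 40*56, replace = TRUE),
                             ncol = 56, nrow = 40),
                      stringsAsFactors = FALSE)

# gl(n, k, length) labels rows 1..20 as "1" and rows 21..40 as "2";
# split() then groups the data frame's rows into consecutive 20-row bins.
lst1 <- split(dat1, as.character(gl(nrow(dat1), 20, nrow(dat1))))

# Within each bin, compare every founder column (1:8) against every
# sample column (9:56) and record the fraction of the 20 rows that match.
res <- lapply(lst1, function(x)
  sapply(x[, 1:8], function(y)
    sapply(x[, 9:56], function(z) sum(y == z) / 20)))

length(res)    # one element per 20-row bin: 2 here
dim(res[[1]])  # each element is a 48 x 8 similarity matrix
```

Each entry of `res[[i]]` lies between 0 and 1: row j, column k is the fraction of the bin's 20 rows where sample j's allele equals founder k's allele, matching the worked example in the question (identical columns give 1, fully mismatched columns give 0).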