arun
2013-Nov-12  03:01 UTC
[R] Apply function to every 20 rows between pairs of columns in a matrix
Hi,
It's not very clear.
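set.seed(25)
dat1 <- as.data.frame(matrix(sample(c("A","T","G","C"),46482*56,replace=TRUE),ncol=56,nrow=46482),stringsAsFactors=FALSE)
# split the rows into consecutive bins of 20
lst1 <- split(dat1,as.character(gl(nrow(dat1),20,nrow(dat1))))
# per bin: fraction of rows where each founder (cols 1:8) matches each sample (cols 9:56)
res <- lapply(lst1,function(x) sapply(x[,1:8],function(y) sapply(x[,9:56],function(z) sum(y==z)/20)))
length(res)
#[1] 2325 ### check here
# 46482 = 2324*20 + 2, so there are 2324 full bins plus one bin of 2 rows (still divided by 20 above)
dim(res[[1]])
#[1] 48  8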
A.K.
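The `res` list above holds one 48 x 8 matrix per bin, whereas the question asks for one matrix per sample with a row per bin. The following is only a sketch of one way to rearrange such a list, run on a small toy data set (65 rows, 3 founders, 2 samples; these sizes and the names `toy`, `bin_size`, `arr` and `by_sample` are made up for illustration). It also assumes a numeric bin index, so the bins stay in row order, and uses `mean()` so a short final bin is divided by its own row count rather than by 20.

```r
# Toy data: hypothetical sizes, not the real 46482 x 56 matrix
set.seed(1)
toy <- as.data.frame(matrix(sample(c("A", "T", "G", "C"), 65 * 5, replace = TRUE), ncol = 5),
                     stringsAsFactors = FALSE)
names(toy) <- c("Founder1", "Founder2", "Founder3", "Sample1", "Sample2")

bin_size <- 20
# Numeric bin index: rows 1-20 -> 1, rows 21-40 -> 2, ...
bin <- (seq_len(nrow(toy)) - 1) %/% bin_size + 1
lst <- split(toy, bin)

# One samples-by-founders matrix per bin
res <- lapply(lst, function(x)
  sapply(x[, 1:3, drop = FALSE], function(f)
    sapply(x[, 4:5, drop = FALSE], function(s) mean(f == s))))

# Stack the per-bin matrices into a samples x founders x bins array,
# then slice out one bins-by-founders matrix per sample
arr <- simplify2array(res)
by_sample <- lapply(seq_len(dim(arr)[1]), function(i) t(arr[i, , ]))
names(by_sample) <- dimnames(arr)[[1]]

dim(by_sample$Sample1)
```

On the real data the same steps would use columns 1:8 and 9:56, giving 48 matrices with one row per bin and 8 columns.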
Hi all, I have a set of genetic SNP data that looks like 
Founder1 Founder2 Founder3 Founder4 Founder5 Founder6 Founder7 Founder8 Sample1 Sample2 Sample3 Sample...
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
A A A T T T T T A T A T 
The size of the matrix is 56 columns by 46482 rows. I need to
first bin the matrix by every 20 rows, then compare each of the first 8
columns (founders) to each of columns 9-56, and divide the number of
matching letters/alleles by the number of rows in the bin (20). Ultimately I
need 48 matrices of 8 columns by 2342 rows each, which are essentially
similarity matrices. I have tried to extract each pair separately by something like
"length(cbind(odd[,9],odd[,1])[cbind(odd[,9],cbind(odd[,9],odd[,1])[,1])[,1]=="T"
 &
cbind(odd[,9],odd[,1])[,2]=="T",])/nrow(cbind(odd[,9],odd[,1]))"
but this is nowhere near efficient, and I do not know of a
faster way of applying the function to every 20 rows and across multiple
pairs.
In the example given above, if the rows were all identical as
shown across 20 rows, then the first row of the matrix for Sample1 would
be
1 1 1 0 0 0 0 0
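The worked example can be checked on a toy block of 20 identical rows copied from the table above. This is just a sketch; the names `founders`, `block`, `sample1` and `sim` are illustrative and not part of the original data. It computes, for Sample1, the fraction of rows matching each founder, giving one value per founder.

```r
# Founder alleles from the example table, repeated over 20 identical rows
founders <- c("A", "A", "A", "T", "T", "T", "T", "T")
block <- as.data.frame(matrix(rep(founders, each = 20), nrow = 20),
                       stringsAsFactors = FALSE)
names(block) <- paste0("Founder", 1:8)

# Sample1 is "A" in every row of the example
sample1 <- rep("A", 20)

# Fraction of the 20 rows in which each founder matches Sample1
sim <- sapply(block, function(f) sum(f == sample1) / 20)
sim
# Founder1 to Founder3 give 1, Founder4 to Founder8 give 0
```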
arun
2013-Nov-12  03:43 UTC
[R] Apply function to every 20 rows between pairs of columns in a matrix
Hi,
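set.seed(25)
dat1 <- as.data.frame(matrix(sample(c("A","T","G","C"),46482*56,replace=TRUE),ncol=56,nrow=46482),stringsAsFactors=FALSE)
# split the rows into consecutive bins of 20
lst1 <- split(dat1,as.character(gl(nrow(dat1),20,nrow(dat1))))
# per bin: fraction of rows where each founder (cols 1:8) matches each sample (cols 9:56)
res <- lapply(lst1,function(x) sapply(x[,1:8],function(y) sapply(x[,9:56],function(z) sum(y==z)/20)))
length(res)
#[1] 2325 ### check here
# 46482 = 2324*20 + 2, so there are 2324 full bins plus one bin of 2 rows (still divided by 20 above)
dim(res[[1]])
#[1] 48  8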
A.K.
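Since speed was the concern in the question, here is a sketch of a different technique from the nested `sapply()` calls above: compare one founder column against all sample columns at once, then sum the matches per bin with `rowsum()`. It runs on a small toy matrix (65 rows, 3 founders, 2 samples; the sizes and the names `m`, `bin`, `bin_n` and `sim_by_founder` are assumptions for illustration) and has not been timed against the real data.

```r
# Toy character matrix: hypothetical sizes, not the real 46482 x 56 data
set.seed(25)
m <- matrix(sample(c("A", "T", "G", "C"), 65 * 5, replace = TRUE), ncol = 5)
colnames(m) <- c("Founder1", "Founder2", "Founder3", "Sample1", "Sample2")

# Bin index for consecutive blocks of 20 rows, and the number of rows in each bin
bin <- (seq_len(nrow(m)) - 1) %/% 20 + 1
bin_n <- as.vector(table(bin))

# For each founder: a bins x samples matrix of matching fractions
sim_by_founder <- lapply(1:3, function(f) {
  eq <- m[, 4:5, drop = FALSE] == m[, f]   # founder column is recycled down each sample column
  rowsum(eq + 0L, bin) / bin_n             # matches per bin divided by rows per bin
})
names(sim_by_founder) <- colnames(m)[1:3]

sim_by_founder$Founder1
```

Each list element here is indexed by founder; dividing by `bin_n` instead of a fixed 20 means a short final bin is scaled by its own size.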