Stathis Metsovitis
2012-Apr-03 00:04 UTC
[R] pairwise linear regression between two large datasets
Hi all,

I am trying to analyse the residuals of pairwise linear regressions between two large matrices, A with dimensions {k x m} and B with dimensions {k x n}. I need to regress every column B[,j] of B on every column A[,i] of A and produce a matrix C with dimensions {m x n}, so that C[i,j] contains the z-score of the k-th (last) residual of that regression.

I have tried the following code, but I can't get it to work. Moreover, a for loop would be disastrous, since A and B are very large matrices. I'd be grateful for any suggestions!

C <- apply( A[ ,1:dim(A)[2] ], 2, linRegZscore, y=A[ ,1:dim(A)[2]] )

where linRegZscore is the following function:

linRegUtility1 <- function(x,y){
  regRes <- lm(y~x)$residuals
  ( regRes[dim(regRes)[1]]-apply(regRes, 2, mean) )/( apply(regRes,2, sd) )
}
Actually, I figured it out: regRes[dim(regRes)[1]] should be regRes[dim(regRes)[1], ]
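To illustrate why the trailing comma matters, here is a minimal sketch using a made-up 3 x 2 matrix in place of the real residual matrix (the values and column names are assumptions for illustration only). Without the comma, R treats the index as a single linear position in the matrix; with it, the index selects the whole last row, giving one value per regression.

```r
# Toy stand-in for the residual matrix returned by lm() with a matrix response.
# The numbers are hypothetical; only the indexing behaviour matters here.
regRes <- matrix(c(0.5, -1.0, 0.5,
                   2.0, -3.0, 1.0),
                 nrow = 3, dimnames = list(NULL, c("c", "d")))

# No comma: linear indexing, returns only the element at position 3.
lastAsVector <- regRes[dim(regRes)[1]]

# With comma: row indexing, returns the last residual of every column.
lastRow <- regRes[dim(regRes)[1], ]

length(lastAsVector)  # 1
length(lastRow)       # 2
```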
If "any idea of a for loop is disastrous," why are you using apply, which is basically a for loop? I think you have framed the question in such a way that loops are inevitable.

You are already using a matrix as the LHS of the formula, which is the main speedup I could think of. However, you can avoid some subscripting and use colMeans for a bit of speedup.

# fake data... you didn't provide any
A <- data.frame( a=1:10, b=(0:9)*2 )
B <- data.frame( c=A$a+rnorm(10), d=A$b+rnorm(10), e=A$a+A$b+rnorm(10) )
A <- as.matrix(A)
B <- as.matrix(B)

linRegUtility2 <- function(x,y){
  regRes <- lm(y~x)[["residuals"]]
  ( regRes[ nrow( regRes ), ] - colMeans( regRes ) )/ apply( regRes, 2, sd )
}

C <- apply( A, 2, linRegUtility2, y=B )

However, this is only getting about 9% improvement on my computer.
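Nobody in the thread proposed the following, but because every fit here is a simple regression with an intercept, the z-scores can in principle be computed from column sums and one cross-product, without calling lm at all. This is a rough sketch of that closed-form route. The function name pairwiseLastResidualZ and the seed are assumptions for illustration, and the sketch assumes no column of A is constant and no fit is exact. It returns C in the {m x n} layout asked for in the original post, whereas apply( A, 2, f, y=B ) yields the transposed {n x m} layout.

```r
# Hypothetical helper (not from the thread): z-score of the last residual for
# every regression of B[, j] on A[, i], each fitted with an intercept.
pairwiseLastResidualZ <- function(A, B) {
  k <- nrow(A); m <- ncol(A); n <- ncol(B)
  Ac <- sweep(A, 2, colMeans(A))   # centred predictors, k x m
  Bc <- sweep(B, 2, colMeans(B))   # centred responses,  k x n
  Sxx <- colSums(Ac^2)             # length m
  Syy <- colSums(Bc^2)             # length n
  Sxy <- crossprod(Ac, Bc)         # m x n matrix of cross-products
  slope <- Sxy / Sxx               # row i is divided by Sxx[i]
  # Last residual: (y_k - mean(y)) - slope * (x_k - mean(x))
  lastRes <- matrix(Bc[k, ], m, n, byrow = TRUE) - slope * Ac[k, ]
  # Residual sum of squares: Syy - Sxy^2 / Sxx
  sse <- matrix(Syy, m, n, byrow = TRUE) - Sxy^2 / Sxx
  # With an intercept the residuals have mean zero, so z = residual / sd
  lastRes / sqrt(sse / (k - 1))
}

set.seed(1)  # arbitrary seed, only to make the fake data repeatable
A <- cbind(a = 1:10, b = (0:9) * 2)
B <- cbind(c = A[, "a"] + rnorm(10),
           d = A[, "b"] + rnorm(10),
           e = A[, "a"] + A[, "b"] + rnorm(10))
C <- pairwiseLastResidualZ(A, B)   # m x n, here 2 x 3
```

The only large intermediate objects are the m x n matrices, the same size as the result, so memory use should scale with the output rather than with the number of fitted models. Whether this is actually faster on a given data set would need to be timed.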