thr3ads.net - R help - [R] pairwise linear regression between two large datasets [Apr 2012]

Stathis Metsovitis

2012-Apr-03 00:04 UTC

[R] pairwise linear regression between two large datasets

Hi all,
I am trying to analyse the residuals of pairwise linear regressions
between two large matrices, A with dimensions {k x m} and B with
dimensions {k x n}. I need to regress every column B[,j] of B on every
column A[,i] of A and produce a matrix C with dimensions {m x n}, so that
C[i,j] contains the z-score of the k-th (last) residual of the
corresponding regression.

I have tried the following code, but I can't seem to get it to work.
Moreover, a plain for loop would be disastrous, since A and B are very
large matrices. I'd be grateful for any suggestions!
C <- apply( A[ ,1:dim(A)[2] ], 2, linRegUtility1, y=B[ ,1:dim(B)[2]] )

where linRegUtility1 is the following function
linRegUtility1 <- function(x,y){
  regRes <- lm(y~x)$residuals
  ( regRes[dim(regRes)[1]]-apply(regRes, 2, mean) )/( apply(regRes,2, sd) )
  }

stathisan

2012-Apr-03 02:48 UTC

Actually, I figured it out:

 regRes[dim(regRes)[1]] should be regRes[dim(regRes)[1], ]
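
The difference the trailing comma makes can be seen on a small matrix. This is only an illustrative sketch; the matrix m below is made up and is not part of the original code. A single index treats the matrix as one long column-major vector, while a row index followed by an empty column index returns the whole row.

```r
# Made-up 3 x 2 matrix, filled column by column: columns are 1,2,3 and 4,5,6
m <- matrix(1:6, nrow = 3)

# Single index: the 3rd element of the underlying vector (one number)
single <- m[nrow(m)]

# Row index plus empty column index: the entire last row (one value per column)
last_row <- m[nrow(m), ]

print(single)    # 3
print(last_row)  # 3 6
```

With the residual matrix from lm(), the first form picks out only the last residual of the first column, which is why the per-column z-scores came out wrong.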

jdnewmil

2012-Apr-03 04:07 UTC

If "any idea of a for loop is disastrous," why are you using apply, 
which is basically a for loop?

I think you have framed the question in such a way that loops are 
inevitable.  You are already passing a matrix as the LHS of the formula, 
so each lm() call fits every column of y at once; that is the main 
speedup I could think of. However, you can avoid some subscripting 
and use colMeans instead of apply(..., mean) for a bit of speedup.

# fake data... you didn't provide any
A <- data.frame( a=1:10, b=(0:9)*2 )
B <- data.frame( c=A$a+rnorm(10), d=A$b+rnorm(10), e=A$a+A$b+rnorm(10) )
A <- as.matrix(A)
B <- as.matrix(B)

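linRegUtility2 <- function(x,y){
    # y is a matrix, so lm() fits every column of y on x in one call
    regRes <- lm(y~x)[["residuals"]]
    # z-score of the last residual in each column
    ( regRes[ nrow( regRes ), ] - colMeans( regRes ) )/ apply( regRes, 2, sd )
    }
# one column of C per column of A, so C is {n x m}; use t(C) for {m x n}
C <- apply( A, 2, linRegUtility2, y=B )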

However, this is only getting about 9% improvement on my computer.
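
If every fit really is a one-predictor regression with an intercept, as lm(y~x) is here, the loop can in principle be replaced by matrix algebra. The sketch below is not from the thread: the function name pairwise_last_resid_z is hypothetical, and it relies on the closed-form slope Sxy/Sxx and on the fact that residuals from a fit with an intercept have mean zero. It is a swapped-in technique, not a change to how lm() is called.

```r
# Hypothetical helper (not from the thread): z-score of the last residual for
# every simple regression B[, j] ~ A[, i] with an intercept, without lm().
# Assumes A is {k x m}, B is {k x n}, both numeric matrices without NAs.
pairwise_last_resid_z <- function(A, B) {
  k <- nrow(A)
  m <- ncol(A)
  n <- ncol(B)
  Ac <- sweep(A, 2, colMeans(A))   # centre each column of A
  Bc <- sweep(B, 2, colMeans(B))   # centre each column of B
  Sxx <- colSums(Ac^2)             # length m: sum of squares of each predictor
  Syy <- colSums(Bc^2)             # length n: sum of squares of each response
  Sxy <- crossprod(Ac, Bc)         # {m x n}: cross products
  beta <- Sxy / Sxx                # slopes; Sxx recycles so [i, j] is divided by Sxx[i]
  # last residual: (y_k - mean(y)) - beta * (x_k - mean(x))
  last <- matrix(Bc[k, ], m, n, byrow = TRUE) - beta * Ac[k, ]
  # residual sum of squares: Syy - Sxy^2 / Sxx
  rss <- matrix(Syy, m, n, byrow = TRUE) - Sxy^2 / Sxx
  # residual mean is zero, so the z-score is last / sd, with sd using k - 1
  last / sqrt(rss / (k - 1))
}

# Small made-up check against the lm()-based approach
set.seed(1)
A_demo <- matrix(rnorm(40), 10, 4)
B_demo <- matrix(rnorm(30), 10, 3)
C_fast <- pairwise_last_resid_z(A_demo, B_demo)
C_ref <- t(apply(A_demo, 2, function(x) {
  r <- residuals(lm(B_demo ~ x))
  (r[nrow(r), ] - colMeans(r)) / apply(r, 2, sd)
}))
print(all.equal(unname(C_ref), unname(C_fast)))
```

The result is already in the {m x n} layout asked for in the question. A constant column in A makes Sxx zero and yields NaN for that row, so such columns would need to be dropped first.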
