I would like to compare every column in my matrix with every other column and
get the r-squared.

I tried using the following formula and looping through every column:

> summary(lm(matrix[,x]~matrix[,y]))$r.squared

If I have 10,000 columns, the loops (10,000 * 10,000) take forever even if
there is no formula inside.

Then, I attempted to vectorize my code:

> cor(matrix)^2

With 10,000 columns, this works great. With 30,000, R tells me it cannot
allocate vector of that length even if the memory limit is set to 4 GBs.

Is there anything else I can do to resolve this issue?

Thanks.

--
View this message in context: http://www.nabble.com/processing-a-large-matrix-tf3216447.html#a8932591
Sent from the R help mailing list archive at Nabble.com.

On Mon, 12 Feb 2007, andy1983 wrote:

>
> I would like to compare every column in my matrix with every other column and
> get the r-squared.
>
> I tried using the following formula and looping through every column:
>> summary(lm(matrix[,x]~matrix[,y]))$r.squared
> If I have 10,000 columns, the loops (10,000 * 10,000) take forever even if
> there is no formula inside.
>
> Then, I attempted to vectorize my code:
>> cor(matrix)^2
> With 10,000 columns, this works great. With 30,000, R tells me it cannot
> allocate vector of that length even if the memory limit is set to 4 GBs.

30000^2 doubles * 8 Bytes/double > 6.5 GBs.

And that's just to store the result; you will need some space to work in,
too.

>
> Is there anything else I can do to resolve this issue?
>
> Thanks.
> --
> View this message in context: http://www.nabble.com/processing-a-large-matrix-tf3216447.html#a8932591
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                        (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901
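
The arithmetic in the reply above can be checked directly in R. This is only a minimal sketch: the helper name `cor_matrix_bytes` is made up for illustration, and it counts just the storage for the p x p result, not the working space `cor` needs on top of that.

```r
# Bytes needed to hold a p x p matrix of doubles (8 bytes per double).
# as.numeric() avoids integer overflow for large p.
cor_matrix_bytes <- function(p) as.numeric(p)^2 * 8

cor_matrix_bytes(10000) / 2^30   # roughly 0.75 GiB: fits under a 4 GB limit
cor_matrix_bytes(30000) / 2^30   # roughly 6.7 GiB: the result alone exceeds it
```

Going from 10,000 to 30,000 columns multiplies the storage by nine, which is why the first case works and the second does not.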

One approach is to split up the work of doing the correlations: if you
give the 'cor' function 2 matrices, it gives you the correlations
between all pairs of columns (each column of the first against each
column of the second). Since you said it works fine with 10,000 columns
but not 30,000, you could split into 3 pieces and do something like
(untested):

    out <- rbind(
      cbind( cor(mymatrix[,1:10000])^2,
             cor(mymatrix[,1:10000], mymatrix[,10001:20000])^2,
             cor(mymatrix[,1:10000], mymatrix[,20001:30000])^2 ),
      cbind( matrix(NA,10000,10000),
             cor(mymatrix[,10001:20000])^2,
             cor(mymatrix[,10001:20000], mymatrix[,20001:30000])^2 ),
      cbind( matrix(NA,10000,10000),
             matrix(NA,10000,10000),
             cor(mymatrix[,20001:30000])^2 )
    )

    out[ lower.tri(out) ] <- t(out)[ lower.tri(out) ]

For breaking into 3 pieces, this is probably easier/quicker than trying
to find an alternative. If you need to break it into even more pieces
(doing blocks of 1,000 when there are 30,000 columns), then there are
probably better alternatives (you could do a loop over blocks, which
would be faster than the loop over individual columns).

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
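
The "loop over blocks" idea mentioned above might look something like the following. This is a sketch under assumptions rather than a tested recipe for 30,000 columns: the helper `block_r2`, its arguments, and the simulated matrix `x` are all hypothetical names introduced here. Instead of storing the full result, it passes each block of squared correlations to a user-supplied function, so only one block is held in memory at a time.

```r
# Hypothetical helper: visit every pair of column blocks in the upper
# triangle and pass the squared correlations to a user-supplied function.
block_r2 <- function(x, block_size, fun) {
  p <- ncol(x)
  starts <- seq(1, p, by = block_size)
  for (a in starts) {
    ia <- a:min(a + block_size - 1, p)          # columns in the row block
    for (b in starts[starts >= a]) {
      ib <- b:min(b + block_size - 1, p)        # columns in the column block
      r2 <- cor(x[, ia, drop = FALSE], x[, ib, drop = FALSE])^2
      fun(r2, ia, ib)                           # r2 is length(ia) x length(ib)
    }
  }
  invisible(NULL)
}

# Example on small simulated data: count column pairs (i < j) with r^2 > 0.5.
set.seed(1)
x <- matrix(rnorm(50 * 12), nrow = 50)
x[, 2] <- x[, 1] + rnorm(50, sd = 0.1)          # one strongly related pair
n_high <- 0
block_r2(x, block_size = 5, fun = function(r2, ia, ib) {
  keep <- outer(ia, ib, "<")                    # skip i == j and duplicate pairs
  n_high <<- n_high + sum(r2[keep] > 0.5)
})
n_high
```

With blocks of 1,000 columns, each block of results is 1,000 x 1,000 doubles (about 8 MB), so the block size can be tuned to whatever memory is available.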

Given the response by Charles Berry, you should probably really think
about what you want to do with the results (I'm hoping that you do not
plan to look at every R^2 value personally). For instance, if you want
to find which other variable gives the highest R^2 value for each
variable, then this approach may work better:

    myR2fun <- function(i){
      cat("\r",i)       # optional
      flush.console()   # optional
      tmp <- cor( mymat[,i], mymat[,-i] )^2
      j <- which.max(tmp)          # position among the columns other than i
      if (j >= i) j + 1 else j     # map back to a column index of mymat
    }

    out <- sapply( 1:30000, myR2fun )

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
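
The per-column search above can also be done a block of columns at a time, which cuts the number of calls to `cor` from one per column to one per block. The following is a rough sketch, not part of the original replies: `best_partner` and its `block_size` argument are hypothetical names, and it assumes no column is constant (a constant column makes `cor` return NA for that whole row).

```r
# Hypothetical helper: for each column of x, return the index of the other
# column with the highest squared correlation, working in blocks of columns.
best_partner <- function(x, block_size = 1000) {
  p <- ncol(x)
  best <- integer(p)
  for (a in seq(1, p, by = block_size)) {
    ia <- a:min(a + block_size - 1, p)
    r2 <- cor(x[, ia, drop = FALSE], x)^2       # length(ia) x p block
    r2[cbind(seq_along(ia), ia)] <- NA          # ignore each column's match with itself
    best[ia] <- apply(r2, 1, which.max)         # which.max discards NA values
  }
  best
}

# Example on small simulated data.
set.seed(1)
x <- matrix(rnorm(50 * 12), nrow = 50)
x[, 2] <- x[, 1] + rnorm(50, sd = 0.1)          # columns 1 and 2 are closely related
bp <- best_partner(x, block_size = 5)
bp[1:2]
```

Because each block is correlated against the full matrix, the returned indices refer directly to columns of `x`, and a block of 1,000 columns against 30,000 needs about 240 MB for the result.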