deaRs,
I want to build a covariance matrix out of the data from a binary
file, that I can read in chunk by chunk, with each chunk containing a
single observation vector X. I wonder how to do that most efficiently,
avoiding the calculation of the full symmetric matrices XX'. The
trivial non-optimal approach boils down to something like:
Q <- matrix(rnorm(100000),ncol=200)
M <- matrix(0,ncol=200,nrow=200)
for (i in 1:nrow(Q))
  M <- M + tcrossprod(Q[i,])
I would appreciate pointers to help me fill this lacuna in my R skills :)
Cheers,
Tsjerk
--
Tsjerk A. Wassenaar, Ph.D.
post-doctoral researcher
Molecular Dynamics Group
* Groningen Institute for Biomolecular Research and Biotechnology
* Zernike Institute for Advanced Materials
University of Groningen
The Netherlands
rex.dwyer at syngenta.com
2011-Mar-14 16:43 UTC
[R] *Building* a covariance matrix efficiently
Tjerk,
This is just a pseudocode outline of what you need to do:
M = matrix(0, number of variables, number of variables)  # running sum of outer products
V = rep(0, number of variables)                          # running sum of observations
N = 0                                                    # number of observations read
while (more observations to read) {
  X <- next observation
  V <- V + X
  M <- M + outer(X,X)
  N <- N+1
}
Compute the covariance matrix from the elements of V, M, and N.
You just need to refer to the formula defining covariance.
Outlook seems to think all my variables should be upper case.
HTH
Rex
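The outline above can be sketched as runnable R. This is a minimal illustration, not a tested solution for the real file: the matrix Q is simulated and stands in for the binary file, and the small dimension p is an assumption chosen to keep the example quick. The last step uses the usual identity for the sample covariance, C = (sum of XX' - VV'/N) / (N - 1).

```r
# Streaming covariance from running sums (sketch; Q stands in for the file).
set.seed(1)
p <- 5                                  # number of variables (assumed, small for the demo)
Q <- matrix(rnorm(100 * p), ncol = p)   # simulated observations, one per row

M <- matrix(0, p, p)   # running sum of outer products X X'
V <- numeric(p)        # running sum of the observations
N <- 0                 # number of observations seen so far

for (i in seq_len(nrow(Q))) {
  X <- Q[i, ]          # in practice: the next vector read from the binary file
  V <- V + X
  M <- M + tcrossprod(X)
  N <- N + 1
}

# Sample covariance from the accumulated sums
C <- (M - tcrossprod(V) / N) / (N - 1)

# Compare with the built-in cov() on the full data
print(all.equal(C, cov(Q)))
```

Only M, V and N are kept in memory, so the full data set never has to be stored. One caveat: this one-pass formula can lose precision when the means are large compared with the spread of the data, so it is worth checking on real data.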
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
rex.dwyer at syngenta.com
2011-Mar-14 18:57 UTC
[R] *Building* a covariance matrix efficiently
Tsjerk,
It seems to me that memory and not time is your big efficiency problem, and
I've shown you how to avoid storing your entire input.
If you want to avoid doing each multiplication twice, you can replace the
"outer" with a function that computes each product only once and
accumulates sums of those products.
iii = matrix(c(rep(1:200,200),rep(1:200,each=200)), ncol=2)
iii = iii[iii[,1] <= iii[,2], ]   # keep only the index pairs with i <= j
and at each step
V2 = V2 + X[iii[,1]] * X[iii[,2]]
instead of M.
I would imagine that the internal cov does this anyway, as you are not the first
person to notice this symmetry, so I'm not sure of the point of this
exercise.
PS: I actually know how to spell and pronounce Tsjerk, but Tsj is not a very
familiar pattern for my fingers.
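The index-pair idea above can be sketched end to end as follows. This is an illustration under assumptions: Q is simulated data standing in for the file, p is kept small, and the step that unpacks V2 back into a full symmetric matrix is my addition rather than part of the original suggestion.

```r
# Accumulate only the products X[i] * X[j] with i <= j (sketch).
set.seed(1)
p <- 4                                  # number of variables (assumed, small for the demo)
Q <- matrix(rnorm(50 * p), ncol = p)    # simulated observations, one per row

# All index pairs, then keep the upper triangle including the diagonal
iii <- matrix(c(rep(1:p, p), rep(1:p, each = p)), ncol = 2)
iii <- iii[iii[, 1] <= iii[, 2], ]

V  <- numeric(p)           # running sum of the observations
V2 <- numeric(nrow(iii))   # running sums of the p * (p + 1) / 2 distinct products
N  <- 0

for (k in seq_len(nrow(Q))) {
  X  <- Q[k, ]
  V  <- V + X
  V2 <- V2 + X[iii[, 1]] * X[iii[, 2]]
  N  <- N + 1
}

# Unpack the triangle into a full symmetric matrix of summed products
M <- matrix(0, p, p)
M[iii] <- V2              # upper triangle and diagonal
M[iii[, 2:1]] <- V2       # mirrored lower triangle

C <- (M - tcrossprod(V) / N) / (N - 1)
print(all.equal(C, cov(Q)))
```

This roughly halves the number of multiplications and additions per observation, but the indexed lookups are ordinary R vector operations, so it may not be faster than outer() or tcrossprod() in practice; timing both on realistic sizes is the safest way to decide.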
From: Tsjerk Wassenaar [mailto:tsjerkw@gmail.com]
Sent: Monday, March 14, 2011 1:41 PM
To: Dwyer Rex USRE
Subject: Re: RE: [R] *Building* a covariance matrix efficiently
Hi Rex,
Thanks for the reply. But it doesn't solve the issues of redundant
calculations due to symmetry, both in the outer product and in the summation.
Cheers,
Tsjerk (correct spelling, really)