deaRs,

I want to build a covariance matrix out of the data from a binary file, that I can read in chunk by chunk, with each chunk containing a single observation vector X. I wonder how to do that most efficiently, avoiding the calculation of the full symmetric matrices XX'. The trivial non-optimal approach boils down to something like:

    Q <- matrix(rnorm(100000),ncol=200)
    M <- matrix(0,ncol=200,nrow=200)
    for (i in 1:nrow(Q)) M <- M + tcrossprod(Q[i,])

I would appreciate pointers to help me fill this lacuna in my R skills :)

Cheers,

Tsjerk

--
Tsjerk A. Wassenaar, Ph.D.
post-doctoral researcher
Molecular Dynamics Group
* Groningen Institute for Biomolecular Research and Biotechnology
* Zernike Institute for Advanced Materials
University of Groningen
The Netherlands
rex.dwyer at syngenta.com
2011-Mar-14 16:43 UTC
[R] *Building* a covariance matrix efficiently
Tjerk,

This is just a pseudo code outline of what you need to do:

    M = matrix(0, number of variables, number of variables)
    V = rep(0, number of variables)
    N = 0
    While (more observations to read) {
        X <- next observation
        V <- V + X
        M <- M + outer(X,X)
        N <- N+1
    }
    Compute covariance matrix from elements of V,M, and N

You just need to refer to the formula defining covariance. Outlook seems to think all my variables should be upper case.

HTH
Rex
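The outline above can be turned into runnable R roughly as follows. This is a minimal sketch, not a definitive implementation: the simulated matrix `Q` stands in for the binary file, and the name `covmat` is introduced here for illustration. The last step uses the usual sample covariance identity, (sum of X X' minus V V'/N) divided by (N - 1).

```r
# Simulated stand-in for the binary file: 500 observations of 200 variables.
# (Assumption: in real use, X would come from the file reader instead.)
set.seed(1)
Q <- matrix(rnorm(100000), ncol = 200)

p <- ncol(Q)
M <- matrix(0, p, p)   # running sum of outer products X X'
V <- numeric(p)        # running sum of the observations
N <- 0                 # number of observations read so far

for (i in seq_len(nrow(Q))) {
  X <- Q[i, ]          # the "next observation"
  V <- V + X
  M <- M + tcrossprod(X)
  N <- N + 1
}

# Sample covariance from the accumulated sums:
# (sum X X' - V V' / N) / (N - 1)
covmat <- (M - tcrossprod(V) / N) / (N - 1)
```

On data that fits in memory the result can be compared against `cov(Q)`. Note that this one-pass formula subtracts two potentially large quantities, so it can lose precision when the means are large relative to the spread of the data.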
rex.dwyer at syngenta.com
2011-Mar-14 18:57 UTC
[R] *Building* a covariance matrix efficiently
Tsjerk,

It seems to me that memory and not time is your big efficiency problem, and I've shown you how to avoid storing your entire input. If you want to avoid doing each multiplication twice, you can replace the "outer" with a function that computes each product only once and accumulates sums of those products:

    iii = matrix(c(rep(1:200,200),rep(1:200,each=200)), ncol=2)
    iii = iii[iii[,1]<=iii[,2], ]

and at each step

    V2 = V2 + X[iii[,1]] * X[iii[,2]]    # V2 starts as rep(0, nrow(iii))

instead of M. I would imagine that the internal cov does this anyway, as you are not the first person to notice this symmetry, so I'm not sure of the point of this exercise.

PS: I actually know how to spell and pronounce Tsjerk, but Tsj is not a very familiar pattern for my fingers.

From: Tsjerk Wassenaar [mailto:tsjerkw@gmail.com]
Sent: Monday, March 14, 2011 1:41 PM
To: Dwyer Rex USRE
Subject: Re: RE: [R] *Building* a covariance matrix efficiently

Hi Rex,

Thanks for the reply. But it doesn't solve the issues of redundant calculations due to symmetry, both in the outer product and in the summation.

Cheers,

Tsjerk (correct spelling, really)
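The triangle-only accumulation described above could look something like this in full. It is a sketch under assumptions: `Q` again simulates the file, the index pairs are built with `which(upper.tri(...), arr.ind = TRUE)` rather than the `rep()` construction in the message, and the names `idx` and `covmat` are made up for illustration.

```r
# Simulated stand-in for the binary file (assumption, as before).
set.seed(1)
Q <- matrix(rnorm(100000), ncol = 200)
p <- ncol(Q)

# All index pairs (i, j) with i <= j, one pair per row.
idx <- which(upper.tri(matrix(0, p, p), diag = TRUE), arr.ind = TRUE)

V  <- numeric(p)           # running sum of the observations
V2 <- numeric(nrow(idx))   # running sum of X[i] * X[j] for i <= j
N  <- 0

for (k in seq_len(nrow(Q))) {
  X  <- Q[k, ]
  V  <- V + X
  V2 <- V2 + X[idx[, 1]] * X[idx[, 2]]   # each product computed once
  N  <- N + 1
}

# Unpack the accumulated triangle into a full symmetric matrix.
M <- matrix(0, p, p)
M[idx] <- V2
M[idx[, 2:1]] <- V2        # mirror into the lower triangle

covmat <- (M - tcrossprod(V) / N) / (N - 1)
```

This stores p(p+1)/2 sums instead of p^2 and does each multiplication once. Whether it is actually faster than `tcrossprod(X)` is worth measuring, since the indexed vector product is not necessarily quicker than the compiled matrix routine.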