Is there an R function for computing a variance-covariance matrix that guarantees that it will have no negative eigenvalues? In my case, there is a *lot* of missing data, especially for a subset of variables. I think my tactic will be to compute cor(x, use="pairwise.complete.obs") and then pre- and post-multiply by a diagonal matrix of standard deviations that were computed based on all non-missing observations. Or maybe cov() would do exactly that with use="pairwise.complete.obs", but that isn't really clear from the docs. Next I would test to see if what I have is positive definite. If the correlation matrix is positive definite, then the covariance matrix will be.

Maybe I'll be lucky, but I need a positive-definite matrix, and this method is not guaranteed to produce one. Any ideas?

Mike
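The tactic described above can be sketched roughly as follows. This is only an illustration on simulated data: the matrix `x`, its dimensions, and the amount of missingness are assumptions, not the poster's data. The last lines also compare against cov() with use="pairwise.complete.obs"; as far as I can tell its diagonal matches the all-non-missing variances, while its off-diagonal entries generally differ from the cor()-based construction.

```r
# Sketch only: simulated data stands in for the real data set.
set.seed(1)
n <- 200; p <- 5
x <- matrix(rnorm(n * p), n, p)
x[, 2] <- x[, 1] + 0.5 * x[, 2]      # induce some correlation
x[sample(length(x), 300)] <- NA      # set 30% of the cells to missing

# Pairwise-complete correlation matrix
r <- cor(x, use = "pairwise.complete.obs")

# Standard deviations from all non-missing observations of each variable
s <- apply(x, 2, sd, na.rm = TRUE)

# Pre- and post-multiply by the diagonal matrix of standard deviations
v <- diag(s) %*% r %*% diag(s)

# Positive-definiteness check: every eigenvalue must be strictly positive
ev <- eigen(v, symmetric = TRUE, only.values = TRUE)$values
is_pd <- all(ev > 0)

# For comparison: pairwise-complete covariance straight from cov()
v_cov <- cov(x, use = "pairwise.complete.obs")
same_diag <- isTRUE(all.equal(diag(v_cov), diag(v)))
```

Nothing here forces `is_pd` to be TRUE; with heavy missingness the check can fail, which is exactly the concern raised in the question.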
Peter Langfelder
2011-Jan-31 17:51 UTC
[R] computing var-covar matrix with much missing data
On Mon, Jan 31, 2011 at 9:30 AM, Mike Miller <mbmiller+l at gmail.com> wrote:
> Is there an R function for computing a variance-covariance matrix that
> guarantees that it will have no negative eigenvalues? In my case, there is
> a *lot* of missing data, especially for a subset of variables. I think my
> tactic will be to compute cor(x, use="pairwise.complete.obs") and then pre-
> and post-multiply by a diagonal matrix of standard deviations that were
> computed based on all non-missing observations. Or maybe cov() would do
> exactly that with use="pairwise.complete.obs", but that isn't really clear
> from the docs. Next I would test to see if what I have is positive
> definite. If the correlation matrix is positive definite, then the
> covariance matrix will be.
>
> Maybe I'll be lucky, but I need a positive-definite matrix, and this method
> is not guaranteed to produce one. Any ideas?

You may get lucky and your matrix (cov or cor) may be positive definite. If not, you may want to think about imputing the missing data, which may be better than trying to massage a covariance matrix into being positive definite. You could also try a hybrid approach of deleting observations with lots of missing data and imputing only the ones that are left over.

Peter
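The hybrid approach suggested above might look something like the sketch below. The data are simulated, the 50% threshold (`max_missing_frac`) is a made-up value, and column-mean imputation is used only as a simple stand-in for whatever imputation method is actually appropriate; mean imputation shrinks the variances, so it is probably not what you would want in practice.

```r
# Sketch only: simulated data with missing values.
set.seed(2)
n <- 100; p <- 4
x <- matrix(rnorm(n * p), n, p)
x[sample(length(x), 120)] <- NA

# Step 1: drop observations (rows) that are missing too many values.
max_missing_frac <- 0.5              # hypothetical threshold
keep <- rowMeans(is.na(x)) <= max_missing_frac
x_kept <- x[keep, , drop = FALSE]

# Step 2: impute what is still missing. Column means are a placeholder
# for a real imputation method.
col_means <- colMeans(x_kept, na.rm = TRUE)
x_imp <- x_kept
miss <- which(is.na(x_imp), arr.ind = TRUE)
x_imp[miss] <- col_means[miss[, "col"]]

# A covariance matrix computed from complete data is positive
# semidefinite, so no eigenvalue is negative beyond rounding error.
v <- cov(x_imp)
ev <- eigen(v, symmetric = TRUE, only.values = TRUE)$values
```

The point of imputing first is that cov() then sees a complete data matrix, so the result cannot be indefinite the way a pairwise-complete estimate can.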
One option is the nearPD function in the Matrix package. Other options include robust estimation of the covariance matrix. You should Google this. It's been discussed before.

Kevin Wright

On Mon, Jan 31, 2011 at 11:30 AM, Mike Miller <mbmiller+l@gmail.com> wrote:
> Is there an R function for computing a variance-covariance matrix that
> guarantees that it will have no negative eigenvalues? In my case, there is
> a *lot* of missing data, especially for a subset of variables. I think my
> tactic will be to compute cor(x, use="pairwise.complete.obs") and then pre-
> and post-multiply by a diagonal matrix of standard deviations that were
> computed based on all non-missing observations. Or maybe cov() would do
> exactly that with use="pairwise.complete.obs", but that isn't really clear
> from the docs. Next I would test to see if what I have is positive
> definite. If the correlation matrix is positive definite, then the
> covariance matrix will be.
>
> Maybe I'll be lucky, but I need a positive-definite matrix, and this method
> is not guaranteed to produce one. Any ideas?
>
> Mike
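For reference, a minimal sketch of the nearPD route. The 3 x 3 matrix `m` is a made-up example of a correlation-like matrix that is not positive semidefinite (the kind of inconsistency pairwise deletion can produce); it is not taken from the poster's data.

```r
library(Matrix)

# Hypothetical indefinite "correlation" matrix: variables 1-2 and 1-3 are
# strongly positively related, yet 2-3 is strongly negatively related.
m <- matrix(c( 1.0,  0.9,  0.9,
               0.9,  1.0, -0.9,
               0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)

ev_before <- eigen(m, symmetric = TRUE, only.values = TRUE)$values

# nearPD() searches for the nearest positive-definite matrix;
# corr = TRUE asks for a result that is again a correlation matrix.
fit <- nearPD(m, corr = TRUE)
m_pd <- as.matrix(fit$mat)

ev_after <- eigen(m_pd, symmetric = TRUE, only.values = TRUE)$values
```

Here `ev_before` contains a negative eigenvalue and `ev_after` should not. For a covariance matrix rather than a correlation matrix, the same call with the default corr = FALSE would be the natural starting point. Note that the repaired matrix is an approximation to the input, so it is worth checking how far `m_pd` has moved from `m` (for example via `fit$normF`).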