thr3ads.net - R help - [R] Problem when creating matrix of values based on covariance matrix [Aug 2012]

If this information is useful, please help other people find it:
Share via:

Boel Brynedal

2012-Aug-11 14:17 UTC

[R] Problem when creating matrix of values based on covariance matrix

Hi,

I want to simulate a data set with similar covariance structure as my
observed data, and have calculated a covariance matrix (dimensions
8368*8368). So far I've tried two approaches to simulating data:
rmvnorm from the mvtnorm package, and by using the Cholesky
decomposition
(http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
The problem is that the resulting covariance structure in my simulated
data is very different from the original supplied covariance vector.
Lets just look at some of the values:
> cov8[1:4,1:4] # covariance of simulated data            X1          X2         X3         X4
X1 34515296.00    99956.69   369538.1  1749086.6
X2    99956.69 34515296.00  2145289.9  -624961.1
X3   369538.08  2145289.93 34515296.0  -163716.5
X4  1749086.62  -624961.09  -163716.5 34515296.0> CEUcovar[1:4,1:4]             [,1]         [,2]          [,3]         [,4]
[1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
[2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
[3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
[4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488

So the distribution of the observed covariance is very narrow compared
to the simulated data.

None of the eigenvalues of the observed covariance matrix are
negative, and it appears to be a positive definite matrix. Here is
what I did to create the simulated data:

Chol <- chol(CEUcovar)
Z <- matrix(rnorm(20351 * 8368), 8368)
X <- t(Chol) %*% Z
sample8 <- data.frame(as.matrix(t(X)))> dim(sample8)[1] 20351  8368
cov8=cov(sample8,method='spearman')

[earlier I've also tried sample8 <- rmvnorm(1000,
mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with as
'bad' results, much larger covariance values in the simulated data ]

Any ideas of WHY the simulated data have such a different covariance?
Any experience with similar issues? Would be happy to supply the
covariance matrix if anyone wants to give it a try.
Any suggestions? Anything apparent that I left our or neglected?

Any advice would be highly appreciated.
Best,
Bo

Bert Gunter

2012-Aug-11 14:27 UTC

head link

[R] Problem when creating matrix of values based on covariance matrix

Sampling error?   Do you realize how large a sample size you would
need to precisely estimate an 8000 x 8000 covariance matrix? Probably
exceeds the number of stars in our galaxy...

Numerical issues may also play a role, but I am too ignorant on this
aspect to offer advice.

Finally, this is really not an R question, so you would probably do
better to post on a stats site like stats.stackexchange.com rather
than here.

-- Bert

On Sat, Aug 11, 2012 at 7:17 AM, Boel Brynedal <brynedal at gmail.com>
wrote:> Hi,
>
> I want to simulate a data set with similar covariance structure as my
> observed data, and have calculated a covariance matrix (dimensions
> 8368*8368). So far I've tried two approaches to simulating data:
> rmvnorm from the mvtnorm package, and by using the Cholesky
> decomposition
(http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
> The problem is that the resulting covariance structure in my simulated
> data is very different from the original supplied covariance vector.
> Lets just look at some of the values:
>
>> cov8[1:4,1:4] # covariance of simulated data
>             X1          X2         X3         X4
> X1 34515296.00    99956.69   369538.1  1749086.6
> X2    99956.69 34515296.00  2145289.9  -624961.1
> X3   369538.08  2145289.93 34515296.0  -163716.5
> X4  1749086.62  -624961.09  -163716.5 34515296.0
>> CEUcovar[1:4,1:4]
>              [,1]         [,2]          [,3]         [,4]
> [1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
> [2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
> [3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
> [4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
>
> So the distribution of the observed covariance is very narrow compared
> to the simulated data.
>
> None of the eigenvalues of the observed covariance matrix are
> negative, and it appears to be a positive definite matrix. Here is
> what I did to create the simulated data:
>
> Chol <- chol(CEUcovar)
> Z <- matrix(rnorm(20351 * 8368), 8368)
> X <- t(Chol) %*% Z
> sample8 <- data.frame(as.matrix(t(X)))
>> dim(sample8)
> [1] 20351  8368
> cov8=cov(sample8,method='spearman')
>
> [earlier I've also tried sample8 <- rmvnorm(1000,
> mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with
as
> 'bad' results, much larger covariance values in the simulated data
]
>
> Any ideas of WHY the simulated data have such a different covariance?
> Any experience with similar issues? Would be happy to supply the
> covariance matrix if anyone wants to give it a try.
> Any suggestions? Anything apparent that I left our or neglected?
>
> Any advice would be highly appreciated.
> Best,
> Bo
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Michael Dewey

2012-Aug-12 13:54 UTC

head link

[R] Problem when creating matrix of values based on covariance matrix

At 15:17 11/08/2012, Boel Brynedal wrote:>Hi,
>
>I want to simulate a data set with similar covariance structure as my
>observed data, and have calculated a covariance matrix (dimensions
>8368*8368). So far I've tried two approaches to simulating data:
>rmvnorm from the mvtnorm package, and by using the Cholesky
>decomposition 
>(http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/).
>The problem is that the resulting covariance structure in my simulated
>data is very different from the original supplied covariance vector.
It is, of course, not guaranteed to be the same as you are only 
sampling from the distribution. In your example below you draw a 
sample of size 1000 from a 8368 variable distribution so I suspect it 
is almost sure to be different although I am surprised how different. 
What happens if you increase the sample size?
>Lets just look at some of the values:
>
> > cov8[1:4,1:4] # covariance of simulated data
>             X1          X2         X3         X4
>X1 34515296.00    99956.69   369538.1  1749086.6
>X2    99956.69 34515296.00  2145289.9  -624961.1
>X3   369538.08  2145289.93 34515296.0  -163716.5
>X4  1749086.62  -624961.09  -163716.5 34515296.0
> > CEUcovar[1:4,1:4]
>              [,1]         [,2]          [,3]         [,4]
>[1,] 0.1873402987  0.001837229  0.0009009272  0.010324521
>[2,] 0.0018372286  0.188665853  0.0124216535 -0.001755035
>[3,] 0.0009009272  0.012421654  0.1867835412 -0.000142395
>[4,] 0.0103245214 -0.001755035 -0.0001423950  0.192883488
>
>So the distribution of the observed covariance is very narrow compared
>to the simulated data.
>
>None of the eigenvalues of the observed covariance matrix are
>negative, and it appears to be a positive definite matrix. Here is
>what I did to create the simulated data:
>
>Chol <- chol(CEUcovar)
>Z <- matrix(rnorm(20351 * 8368), 8368)
>X <- t(Chol) %*% Z
>sample8 <- data.frame(as.matrix(t(X)))
> > dim(sample8)
>[1] 20351  8368
>cov8=cov(sample8,method='spearman')
>
>[earlier I've also tried sample8 <- rmvnorm(1000,
>mean=rep(0,ncol(CEUcovar)), sigma=CEUcovar, method="eigen") with
as
>'bad' results, much larger covariance values in the simulated data ]
>
>Any ideas of WHY the simulated data have such a different covariance?
>Any experience with similar issues? Would be happy to supply the
>covariance matrix if anyone wants to give it a try.
>Any suggestions? Anything apparent that I left our or neglected?
>
>Any advice would be highly appreciated.
>Best,
>Bo
Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

peter dalgaard

2012-Aug-12 14:16 UTC

head link

[R] Problem when creating matrix of values based on covariance matrix

On Aug 11, 2012, at 16:17 , Boel Brynedal wrote:
> cov8=cov(sample8,method='spearman')
There's your problem. I'm surprised that nobody seems to have picked up
on this, but Spearman covariances are of the ranks, not of the data. Try
method="pearson".

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Reasonably Related Threads

Search for more apparently analagous threads

R help - Aug 2012 - Problem when creating matrix of values based on covariance matrix

[R] Problem when creating matrix of values based on covariance matrix

[R] Problem when creating matrix of values based on covariance matrix

[R] Problem when creating matrix of values based on covariance matrix

[R] Problem when creating matrix of values based on covariance matrix

Reasonably Related Threads