Andras Farkas
2012-Oct-14 14:52 UTC
[R] multivariate lognormal distribution simulation in compositions
Dear All,
thanks to Berend, my question posted yesturday was solved succesfully
here: http://r.789695.n4.nabble.com/hep-on-arithmetic-covariance-conversion-to-log-covariance-td4646068.html .
I posted the question with the assumption of using the results with
rlnorm.rplus() from compositions. Unfortunatelly, I am not getting reasonable
enough outcome. Am I applying the results wrongfully? The example: I have known
means and cov matrix of a multivariate distribution as follows:
sigma
<-matrix(c(0.0037,-0.0001,-0.0370,-0.0136,0.0026,-0.0001,0.0001,0.0008,-0.0015,-0.0011,-0.0370,0.0008,1.0466,0.7208,-0.0455,-0.0136,-0.0015,0.7208,1.1717,0.0346,0.0026,-0.0011,-0.0455,0.0346,0.0348),byrow=TRUE,ncol=5)
mu <-c(0.094,0.006,1.337,1.046,0.263)
sampling form a lognormal distribution would be reasonable in this case as
far as I can tell so with the help of Berend, the cov matrix was converted to
log-return from an arithmetic return as follows:
logreturn <- function(am,asigma) {
M <- 1/(1+am)
S <- log( diag(M) %*% asigma %*% diag(M) + 1)
mu <- log(1+am) - diag(S)/2
list(mean=mu, vcov=S)
}
z <- logreturn(mu, sigma)
logmean <-z$mean
cov <-z$vcov
following I used :
i <-matrix(rlnorm.rplus(5000,logmean,cov),ncol=5), which will give me
results much different from my starting point of mu.
To try to better explain my goal here: I also happened to know the medians and
SDs of this multivariate structure, and they are as follow :
median <-c(0.071,0.003,0.78,0.472,0.203)
SD <-c(0.06,0.011,1.023,1.082,0.187)
next I did a quick experiment that I hope will better explain what I am looking
for, using the example of the second parameter from the multivariate structure,
but runing it separatelly. The known mean, median and SD of this second
parameter is:
mean <-0.006
median <-0.003
sd <-<-0.011
next I generate assuming a normal distribution:
summary(rnorm(5000,mean,sd))
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.035160 -0.001519 0.005863 0.005856 0.013370 0.046380
sd(rnorm(5000,mean,sd))0.0111094
R did exactly what I asked it to do, but it did not give me back what I need to
a reasonable degree, ie my mean and SD are reasonably close to the starting
point, but the median is not. So instead I did:
meanlog <-log(mean)-0.5*log(1+sd^2/(mean^2))
sdlog <-sqrt(log(1+sd^2/(mean^2)))
summary(rlnorm(5000,meanlog,sdlog))
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.052e-05 1.242e-03 2.853e-03 5.907e-03 6.470e-03 1.461e-01
sd(rlnorm(5000,meanlog,sdlog))
0.009526925
Here using the rlnorm function, R gave me results that are much more reasonable
and closer to my starting point, ie: mean, SD, and the median are all look
reasonably close to the original values. Also important, there were no negative
values generated when I used the rlnorm function, which is essential to what I
am trying to do (it also tells me probably that making the assumption of the
data being from a normal distribution is wrong).
Obviously, I could do the simulations by "breaking up" the
multivariate structure and simulate all parameters separatelly, but then the
covariance matrix would be totally ignorred. I would apreciate all ideas on how
to simulate this multivariate example using the covariance matrix that would
produce near identical distribution to the original data set. Perhaps lognormal
is not the best assumption to be made here?
Thanks for the help,
Andras
[[alternative HTML version deleted]]
