Andras Farkas
2012-Oct-14 14:52 UTC
[R] multivariate lognormal distribution simulation in compositions
Dear All, thanks to Berend, my question posted yesturday was solved succesfully here: http://r.789695.n4.nabble.com/hep-on-arithmetic-covariance-conversion-to-log-covariance-td4646068.html . I posted the question with the assumption of using the results with rlnorm.rplus() from compositions. Unfortunatelly, I am not getting reasonable enough outcome. Am I applying the results wrongfully? The example: I have known means and cov matrix of a multivariate distribution as follows: sigma <-matrix(c(0.0037,-0.0001,-0.0370,-0.0136,0.0026,-0.0001,0.0001,0.0008,-0.0015,-0.0011,-0.0370,0.0008,1.0466,0.7208,-0.0455,-0.0136,-0.0015,0.7208,1.1717,0.0346,0.0026,-0.0011,-0.0455,0.0346,0.0348),byrow=TRUE,ncol=5) mu <-c(0.094,0.006,1.337,1.046,0.263) sampling form a lognormal distribution would be reasonable in this case as far as I can tell so with the help of Berend, the cov matrix was converted to log-return from an arithmetic return as follows: logreturn <- function(am,asigma) { M <- 1/(1+am) S <- log( diag(M) %*% asigma %*% diag(M) + 1) mu <- log(1+am) - diag(S)/2 list(mean=mu, vcov=S) } z <- logreturn(mu, sigma) logmean <-z$mean cov <-z$vcov following I used : i <-matrix(rlnorm.rplus(5000,logmean,cov),ncol=5), which will give me results much different from my starting point of mu. To try to better explain my goal here: I also happened to know the medians and SDs of this multivariate structure, and they are as follow : median <-c(0.071,0.003,0.78,0.472,0.203) SD <-c(0.06,0.011,1.023,1.082,0.187) next I did a quick experiment that I hope will better explain what I am looking for, using the example of the second parameter from the multivariate structure, but runing it separatelly. The known mean, median and SD of this second parameter is: mean <-0.006 median <-0.003 sd <-<-0.011 next I generate assuming a normal distribution: summary(rnorm(5000,mean,sd)) Min. 1st Qu. Median Mean 3rd Qu. Max. -0.035160 -0.001519 0.005863 0.005856 0.013370 0.046380 sd(rnorm(5000,mean,sd))0.0111094 R did exactly what I asked it to do, but it did not give me back what I need to a reasonable degree, ie my mean and SD are reasonably close to the starting point, but the median is not. So instead I did: meanlog <-log(mean)-0.5*log(1+sd^2/(mean^2)) sdlog <-sqrt(log(1+sd^2/(mean^2))) summary(rlnorm(5000,meanlog,sdlog)) Min. 1st Qu. Median Mean 3rd Qu. Max. 3.052e-05 1.242e-03 2.853e-03 5.907e-03 6.470e-03 1.461e-01 sd(rlnorm(5000,meanlog,sdlog)) 0.009526925 Here using the rlnorm function, R gave me results that are much more reasonable and closer to my starting point, ie: mean, SD, and the median are all look reasonably close to the original values. Also important, there were no negative values generated when I used the rlnorm function, which is essential to what I am trying to do (it also tells me probably that making the assumption of the data being from a normal distribution is wrong). Obviously, I could do the simulations by "breaking up" the multivariate structure and simulate all parameters separatelly, but then the covariance matrix would be totally ignorred. I would apreciate all ideas on how to simulate this multivariate example using the covariance matrix that would produce near identical distribution to the original data set. Perhaps lognormal is not the best assumption to be made here? Thanks for the help, Andras [[alternative HTML version deleted]]