Andras Farkas
2012-Oct-12 18:39 UTC
[R] better example for multivariate data simulation question-please help if you can
Dear All,

A few weeks ago I posted a question on the R-help listserv that some of you answered with a great solution; I would like to thank you for that again. I thought I would reach out to you with the issue I am trying to solve now. I posted the question a few days ago, but it was probably not clear enough, so I thought I would try again.

At times I have a multivariate example on my hands with known means, SDs and medians for the variables, and the covariance matrix of those variables. Occasionally, these parameters have a strong enough relationship between them that a covariance matrix can be established. Please see the attached document as an example.

Usually when I (a medical person) simulate (and this is not to say that it is the best approach), we use a lognormal distribution to avoid generating negative values, because physiologic variables are almost never negative (we also really do not know better, unfortunately). For the most part I use other software that is capable of reproducing reasonable means, medians and SDs if I enter the covariance matrix, but it is not a free resource (so I cannot share the solutions with others), nor does it have the Sweave option for standard reports that R has and that can be distributed for free. Unfortunately, in R I am having a hard time figuring the solution out. I have tried the multivariate normal distribution function mvrnorm from the MASS package and Mvnorm from the mvtnorm package, but I get negative simulated values, which I cannot afford. Also, at times the simulated means, medians and SDs are quite different from what I started with (which may be due to the assumption I make about the distribution of the data).
I was wondering if anyone would be willing to share some thoughts on how one should simulate in R a multivariate distribution with a given covariance matrix (using the attached data as an example) so that the simulated means, medians and SDs are reasonably close to the original values. While a better idea of the actual distribution of the data would probably be invaluable for reproducing the data accurately (and for choosing a probability distribution to simulate with), in the medical literature we often only have information similar to what I have attached (and we assume it is lognormally distributed, as I mentioned above).

I would greatly appreciate your help,

Sincerely,

Andras
R. Michael Weylandt
2012-Oct-13 12:28 UTC
[R] better example for multivariate data simulation question-please help if you can
[Lightly edited for legibility.]

On Fri, Oct 12, 2012 at 7:39 PM, Andras Farkas <motyocska at yahoo.com> wrote:
> Dear All,
>
> [A] few weeks ago I have posted a question on the R help listserv that some of you have responded to with a great solution, would like to thank you for that again. I thought I would reach out to you with the issue I am trying to solve now. I have posted the question a few days ago, but probably it was not clear enough, so I thought i try it again.
>
> At times I have a multivariate example on my hand with known information of means, SDs and medians for the variables, and the covariance matrix of those variables. Occasionally, these parameters have a strong enough relationship between them that a covariance matrix can be established. Please see attached document as an example.
>
> Usually when I (a medicine people) simulate (and it is not to say that this is the best approach), we use a lognormal distribution to avoid from negative values being generated because physiologic variables almost are never negative (we also really do not know better, unfortunatelly). For the most part I use another software that is capable of reproducing reasonable means and medians and SD if I enter the covariance matrix, but that is not a free resource (so I can not share the solutions with others), nor does it have the Sweave option for standard reports like R does that can be distributed for free. Unfortunately in R I am having a hard time figuring the solution out. I have tried to use the multivariate normal distribution function mvrnorm from the MASS package, or the Mvnorm from mvtnorm package, but will get negative values simulated, which I can not afford, also, at times the simulated means, medians and SDs are quiet different from what I started with (which may be due to the assumption I make with regards to the distribution of the data).
>
> I was wondering if anyone would be willing to provide some thoughts on how you think one should try to attempt to simulate in R a multivariate distribution with covariance matrix (using the attached data as an example) that would result in reasonable means, medians and SD as compared to the original values? While to have a better idea about the actual distribution of the data would probably be invaluable to accurately reproduce the data (and to choose a probability distribution to simulate with), often times in the medical literature we only have information available similar to what I have attached, (and we make the assumption of it being log normally distributed as I have mentioned it above). I would greatly appreciate your help,
>
> Sincerely,
>
> Andras

Hi Andras,

It seems that your attachment did not make it through the mail server: you probably need to include it inline as plain text if it's a reasonable size.

Anyways, I believe your problem is that mvrnorm() et al generate multivariate _normals_, not multivariate lognormals.

Perhaps have a look at these functions:

http://rss.acs.unt.edu/Rdoc/library/compositions/html/rlnorm.html

You might also think about truncated normals.

Cheers,
Michael
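
One way to act on the multivariate lognormal suggestion, offered only as a minimal sketch: draw multivariate normals on the log scale with mvrnorm() and exponentiate, choosing the log-scale parameters by moment matching so that the simulated values reproduce the target means and covariance on the original scale. This is not the approach of the functions linked above, and since the attachment never reached the list, `target_mean`, `target_sd` and `target_cor` below are made-up placeholder values. The sketch assumes the data really are jointly lognormal and that the matched log-scale matrix comes out positive definite, which is not guaranteed for every input.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical targets on the original (positive) scale; replace with real values.
target_mean <- c(10, 5, 2)
target_sd   <- c(3, 2, 0.8)
target_cor  <- matrix(c(1.0, 0.5, 0.3,
                        0.5, 1.0, 0.2,
                        0.3, 0.2, 1.0), nrow = 3)
target_cov  <- target_cor * outer(target_sd, target_sd)

# Moment matching for X = exp(Y), Y ~ N(mu_log, sigma_log):
#   E[X_i]        = exp(mu_i + sigma_ii / 2)
#   Cov(X_i, X_j) = E[X_i] * E[X_j] * (exp(sigma_ij) - 1)
# Solving for the log-scale parameters gives:
sigma_log <- log(1 + target_cov / outer(target_mean, target_mean))
mu_log    <- log(target_mean) - diag(sigma_log) / 2

set.seed(1)
y <- mvrnorm(n = 1e5, mu = mu_log, Sigma = sigma_log)  # normal on the log scale
x <- exp(y)                                            # strictly positive draws

# Compare simulated summaries with the targets.
colMeans(x)          # should be close to target_mean
apply(x, 2, sd)      # should be close to target_sd
cov(x)               # should be close to target_cov
apply(x, 2, median)  # simulated medians
exp(mu_log)          # medians implied by the lognormal assumption
```

Note that under a lognormal model the median is fixed once the mean and SD are chosen (it equals exp(mu_log)), so all three summaries cannot be matched independently. If the reported medians are far from exp(mu_log), that is a hint the lognormal assumption does not fit that variable well.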