Richard and Barbara Males
2010-Apr-29 15:17 UTC
[R] generating correlated random variables from different distributions
I need to generate a set of correlated random variables for a Monte Carlo simulation. The solutions I have found (http://www.stat.uiuc.edu/stat428/cndata.html, http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers), using Cholesky Decomposition, seem to work only if the variables come from the same distribution with the same parameters. My situation is that each variable may be described by a different distribution (or different parameters of the same distribution). This approach does not seem to work, see code and results below. Am I missing something here? My math/statistics is not very good, will I need to generate correlated uniform random variables on (0,1) and then use the inverse distributions to get the desired results I am looking for? That is acceptable, but I would prefer to just generate the individual distributions and then correlate them. Any advice much appreciated. Thanks in advance R. Males Cincinnati, Ohio, USA Sample Code: # Testing Correlated Random Variables # reference http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers # reference http://www.stat.uiuc.edu/stat428/cndata.html # create the correlation matrix corMat=matrix(c(1,0.6,0.3,0.6,1,0.5,0.3,0.5,1),3,3) cholMat=chol(corMat) # create the matrix of random variables set.seed(1000) nValues=10000 # generate some random values matNormalAllSame=cbind(rnorm(nValues),rnorm(nValues),rnorm(nValues)) matNormalDifferent=cbind(rnorm(nValues,1,1.5),rnorm(nValues,2,0.5),rnorm(nValues,6,1.8)) matUniformAllSame=cbind(runif(nValues),runif(nValues),runif(nValues)) matUniformDifferent=cbind(runif(nValues,1,1.5),runif(nValues,2,3.5),runif(nValues,6,10.8)) # bind to a matrix print("correlation Matrix") print(corMat) print("Cholesky Decomposition") print (cholMat) # test same normal resultMatNormalAllSame=matNormalAllSame%*%cholMat print("correlation matNormalAllSame") print(cor(resultMatNormalAllSame)) # test different normal resultMatNormalDifferent=matNormalDifferent%*%cholMat print("correlation matNormalDifferent") print(cor(resultMatNormalDifferent)) # test same uniform resultMatUniformAllSame=matUniformAllSame%*%cholMat print("correlation matUniformAllSame") print(cor(resultMatUniformAllSame)) # test different uniform resultMatUniformDifferent=matUniformDifferent%*%cholMat print("correlation matUniformDifferent") print(cor(resultMatUniformDifferent)) and results [1] "correlation Matrix" [,1] [,2] [,3] [1,] 1.0 0.6 0.3 [2,] 0.6 1.0 0.5 [3,] 0.3 0.5 1.0 [1] "Cholesky Decomposition" [,1] [,2] [,3] [1,] 1 0.6 0.3000000 [2,] 0 0.8 0.4000000 [3,] 0 0.0 0.8660254 [1] "correlation matNormalAllSame" <== ok [,1] [,2] [,3] [1,] 1.0000000 0.6036468 0.3013823 [2,] 0.6036468 1.0000000 0.5005440 [3,] 0.3013823 0.5005440 1.0000000 [1] "correlation matNormalDifferent" <== no good [,1] [,2] [,3] [1,] 1.0000000 0.9141472 0.2676162 [2,] 0.9141472 1.0000000 0.2959178 [3,] 0.2676162 0.2959178 1.0000000 [1] "correlation matUniformAllSame" <== ok [,1] [,2] [,3] [1,] 1.0000000 0.5971519 0.2959195 [2,] 0.5971519 1.0000000 0.5011267 [3,] 0.2959195 0.5011267 1.0000000 [1] "correlation matUniformDifferent" <== no good [,1] [,2] [,3] [1,] 1.0000000 0.2312000 0.0351460 [2,] 0.2312000 1.0000000 0.1526293 [3,] 0.0351460 0.1526293 1.0000000>
Greg Snow
2010-Apr-29 16:31 UTC
[R] generating correlated random variables from different distributions
The method you are using (multiply by cholesky) works for normal distributions, but not necessarily for others (if you want different means/sd, then add/multiply after transforming). For other distributions this process can sometimes give the correlation you want, but may change the variable(s) to no longer have the desired distribution. The short answer to your question is "It Depends", the full long answer could fill a full semester course. If you tell us more of your goal we may be able to give a more useful answer. The copula package is one possibility. If you know the conditional distribution of each variable given the others then you can use gibbs sampling. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of Richard and Barbara Males > Sent: Thursday, April 29, 2010 9:18 AM > To: r-help at r-project.org > Subject: [R] generating correlated random variables from different > distributions > > I need to generate a set of correlated random variables for a Monte > Carlo simulation. The solutions I have found > (http://www.stat.uiuc.edu/stat428/cndata.html, > http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers), using > Cholesky Decomposition, seem to work only if the variables come from > the same distribution with the same parameters. My situation is that > each variable may be described by a different distribution (or > different parameters of the same distribution). This approach does > not seem to work, see code and results below. Am I missing something > here? My math/statistics is not very good, will I need to generate > correlated uniform random variables on (0,1) and then use the inverse > distributions to get the desired results I am looking for? That is > acceptable, but I would prefer to just generate the individual > distributions and then correlate them. Any advice much appreciated. > Thanks in advance > > R. Males > Cincinnati, Ohio, USA > > Sample Code: > # Testing Correlated Random Variables > > # reference > http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers > # reference http://www.stat.uiuc.edu/stat428/cndata.html > # create the correlation matrix > corMat=matrix(c(1,0.6,0.3,0.6,1,0.5,0.3,0.5,1),3,3) > cholMat=chol(corMat) > # create the matrix of random variables > set.seed(1000) > nValues=10000 > > # generate some random values > > matNormalAllSame=cbind(rnorm(nValues),rnorm(nValues),rnorm(nValues)) > matNormalDifferent=cbind(rnorm(nValues,1,1.5),rnorm(nValues,2,0.5),rnor > m(nValues,6,1.8)) > matUniformAllSame=cbind(runif(nValues),runif(nValues),runif(nValues)) > matUniformDifferent=cbind(runif(nValues,1,1.5),runif(nValues,2,3.5),run > if(nValues,6,10.8)) > > # bind to a matrix > print("correlation Matrix") > print(corMat) > print("Cholesky Decomposition") > print (cholMat) > > # test same normal > > resultMatNormalAllSame=matNormalAllSame%*%cholMat > print("correlation matNormalAllSame") > print(cor(resultMatNormalAllSame)) > > # test different normal > > resultMatNormalDifferent=matNormalDifferent%*%cholMat > print("correlation matNormalDifferent") > print(cor(resultMatNormalDifferent)) > > # test same uniform > resultMatUniformAllSame=matUniformAllSame%*%cholMat > print("correlation matUniformAllSame") > print(cor(resultMatUniformAllSame)) > > # test different uniform > resultMatUniformDifferent=matUniformDifferent%*%cholMat > print("correlation matUniformDifferent") > print(cor(resultMatUniformDifferent)) > > and results > > [1] "correlation Matrix" > [,1] [,2] [,3] > [1,] 1.0 0.6 0.3 > [2,] 0.6 1.0 0.5 > [3,] 0.3 0.5 1.0 > [1] "Cholesky Decomposition" > [,1] [,2] [,3] > [1,] 1 0.6 0.3000000 > [2,] 0 0.8 0.4000000 > [3,] 0 0.0 0.8660254 > [1] "correlation matNormalAllSame" <== ok > [,1] [,2] [,3] > [1,] 1.0000000 0.6036468 0.3013823 > [2,] 0.6036468 1.0000000 0.5005440 > [3,] 0.3013823 0.5005440 1.0000000 > [1] "correlation matNormalDifferent" <== no good > [,1] [,2] [,3] > [1,] 1.0000000 0.9141472 0.2676162 > [2,] 0.9141472 1.0000000 0.2959178 > [3,] 0.2676162 0.2959178 1.0000000 > [1] "correlation matUniformAllSame" <== ok > [,1] [,2] [,3] > [1,] 1.0000000 0.5971519 0.2959195 > [2,] 0.5971519 1.0000000 0.5011267 > [3,] 0.2959195 0.5011267 1.0000000 > [1] "correlation matUniformDifferent" <== no good > [,1] [,2] [,3] > [1,] 1.0000000 0.2312000 0.0351460 > [2,] 0.2312000 1.0000000 0.1526293 > [3,] 0.0351460 0.1526293 1.0000000 > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.
Richard and Barbara Males
2010-May-02 19:36 UTC
[R] generating correlated random variables from different distributions
Thank you for your reply. The application is a Monte Carlo simulation in environmental planning. Different possible remediation measures have different costs, and produce different results. For example, a $20,000 plan may add 10 acres of wetlands and 12 acres of bird habitat. The desire is to describe the uncertainty in the cost and the outputs (acres of wetlands, acres of bird habitat) by distributions. The cost may be described by a normal distribution, mean $20k, $5k SD, and the 12 acres of birds may be described by a uniform distribution (10 to 14). [These are just examples, not representative of a real problem]. We may know (or think) that wetlands and bird habitat are positively correlated (0.6), and that there is a stronger correlation of both with cost (0.85). So the effort is to generate, through MCS, values at each iteration of cost, acres of wetland, and acres of bird habitat, such that the resultant values give the same correlation, and the values of cost, bird habitat and wetland habitat return the input distributions. The overall desire is compare different remediation measures, taking into account uncertainty in costs and results. One possible approach (although I have not tried it yet, but will do so in the near future) is to generate, for each iteration, three independent (0,1) random variables, correlate them via the Cholesky approach, and use them as input to the inverse normal, inverse uniform, etc. to get the three variables for each iteration. The primary distributions of interest are normal, uniform, triangular, gamma, and arbitrary cdf, so this approach seems plausible in that inverse distributions are readily available. Thanks in advance. Dick Males Cincinnati, OH, USA On Thu, Apr 29, 2010 at 12:31 PM, Greg Snow <Greg.Snow at imail.org> wrote:> The method you are using (multiply by cholesky) works for normal distributions, but not necessarily for others (if you want different means/sd, then add/multiply after transforming). > > For other distributions this process can sometimes give the correlation you want, but may change the variable(s) to no longer have the desired distribution. > > The short answer to your question is "It Depends", the full long answer could fill a full semester course. ?If you tell us more of your goal we may be able to give a more useful answer. ?The copula package is one possibility. ?If you know the conditional distribution of each variable given the others then you can use gibbs sampling. > > -- > Gregory (Greg) L. Snow Ph.D. > Statistical Data Center > Intermountain Healthcare > greg.snow at imail.org > 801.408.8111 > > >> -----Original Message----- >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- >> project.org] On Behalf Of Richard and Barbara Males >> Sent: Thursday, April 29, 2010 9:18 AM >> To: r-help at r-project.org >> Subject: [R] generating correlated random variables from different >> distributions >> >> I need to generate a set of correlated random variables for a Monte >> Carlo simulation. ? The solutions I have found >> (http://www.stat.uiuc.edu/stat428/cndata.html, >> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers), using >> Cholesky Decomposition, seem to work only if the variables come from >> the same distribution with the same parameters. ?My situation is that >> each variable may be described by a different distribution (or >> different parameters of the same distribution). ?This approach does >> not seem to work, see code and results below. ?Am I missing something >> here? ?My math/statistics is not very good, will I need to generate >> correlated uniform random variables on (0,1) and then use the inverse >> distributions to get the desired results I am looking for? ?That is >> acceptable, but I would prefer to just generate the individual >> distributions and then correlate them. ?Any advice much appreciated. >> Thanks in advance >> >> R. Males >> Cincinnati, Ohio, USA >> >> Sample Code: >> # Testing Correlated Random Variables >> >> # reference >> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers >> # reference http://www.stat.uiuc.edu/stat428/cndata.html >> # create the correlation matrix >> corMat=matrix(c(1,0.6,0.3,0.6,1,0.5,0.3,0.5,1),3,3) >> cholMat=chol(corMat) >> # create the matrix of random variables >> set.seed(1000) >> nValues=10000 >> >> # generate some random values >> >> matNormalAllSame=cbind(rnorm(nValues),rnorm(nValues),rnorm(nValues)) >> matNormalDifferent=cbind(rnorm(nValues,1,1.5),rnorm(nValues,2,0.5),rnor >> m(nValues,6,1.8)) >> matUniformAllSame=cbind(runif(nValues),runif(nValues),runif(nValues)) >> matUniformDifferent=cbind(runif(nValues,1,1.5),runif(nValues,2,3.5),run >> if(nValues,6,10.8)) >> >> # bind to a matrix >> print("correlation Matrix") >> print(corMat) >> print("Cholesky Decomposition") >> print (cholMat) >> >> # test same normal >> >> resultMatNormalAllSame=matNormalAllSame%*%cholMat >> print("correlation matNormalAllSame") >> print(cor(resultMatNormalAllSame)) >> >> # test different normal >> >> resultMatNormalDifferent=matNormalDifferent%*%cholMat >> print("correlation matNormalDifferent") >> print(cor(resultMatNormalDifferent)) >> >> # test same uniform >> resultMatUniformAllSame=matUniformAllSame%*%cholMat >> print("correlation matUniformAllSame") >> print(cor(resultMatUniformAllSame)) >> >> # test different uniform >> resultMatUniformDifferent=matUniformDifferent%*%cholMat >> print("correlation matUniformDifferent") >> print(cor(resultMatUniformDifferent)) >> >> and results >> >> [1] "correlation Matrix" >> ? ? ?[,1] [,2] [,3] >> [1,] ?1.0 ?0.6 ?0.3 >> [2,] ?0.6 ?1.0 ?0.5 >> [3,] ?0.3 ?0.5 ?1.0 >> [1] "Cholesky Decomposition" >> ? ? ?[,1] [,2] ? ? ?[,3] >> [1,] ? ?1 ?0.6 0.3000000 >> [2,] ? ?0 ?0.8 0.4000000 >> [3,] ? ?0 ?0.0 0.8660254 >> [1] "correlation matNormalAllSame" <== ok >> ? ? ? ? ? [,1] ? ? ?[,2] ? ? ?[,3] >> [1,] 1.0000000 0.6036468 0.3013823 >> [2,] 0.6036468 1.0000000 0.5005440 >> [3,] 0.3013823 0.5005440 1.0000000 >> [1] "correlation matNormalDifferent" <== no good >> ? ? ? ? ? [,1] ? ? ?[,2] ? ? ?[,3] >> [1,] 1.0000000 0.9141472 0.2676162 >> [2,] 0.9141472 1.0000000 0.2959178 >> [3,] 0.2676162 0.2959178 1.0000000 >> [1] "correlation matUniformAllSame" <== ok >> ? ? ? ? ? [,1] ? ? ?[,2] ? ? ?[,3] >> [1,] 1.0000000 0.5971519 0.2959195 >> [2,] 0.5971519 1.0000000 0.5011267 >> [3,] 0.2959195 0.5011267 1.0000000 >> [1] "correlation matUniformDifferent" <== no good >> ? ? ? ? ? [,1] ? ? ?[,2] ? ? ?[,3] >> [1,] 1.0000000 0.2312000 0.0351460 >> [2,] 0.2312000 1.0000000 0.1526293 >> [3,] 0.0351460 0.1526293 1.0000000 >> > >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting- >> guide.html >> and provide commented, minimal, self-contained, reproducible code. >
Greg Snow
2010-May-04 20:20 UTC
[R] generating correlated random variables from different distributions
Transforming variables will generally change the correlation, so your method will give you correlated variables, but not exactly at the correlations you specify (though with some trial and error you may be able to get close). If you are happy with those results, then your problem is solved, if you need more control over the relationship then something like the Gibbs sample may be what you need. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111> -----Original Message----- > From: Richard and Barbara Males [mailto:rbmales at gmail.com] > Sent: Sunday, May 02, 2010 1:37 PM > To: Greg Snow > Cc: r-help at r-project.org > Subject: Re: [R] generating correlated random variables from different > distributions > > Thank you for your reply. The application is a Monte Carlo simulation > in environmental planning. Different possible remediation measures > have different costs, and produce different results. For example, a > $20,000 plan may add 10 acres of wetlands and 12 acres of bird > habitat. The desire is to describe the uncertainty in the cost and > the outputs (acres of wetlands, acres of bird habitat) by > distributions. The cost may be described by a normal distribution, > mean $20k, $5k SD, and the 12 acres of birds may be described by a > uniform distribution (10 to 14). [These are just examples, not > representative of a real problem]. We may know (or think) that > wetlands and bird habitat are positively correlated (0.6), and that > there is a stronger correlation of both with cost (0.85). So the > effort is to generate, through MCS, values at each iteration of cost, > acres of wetland, and acres of bird habitat, such that the resultant > values give the same correlation, and the values of cost, bird habitat > and wetland habitat return the input distributions. The overall > desire is compare different remediation measures, taking into account > uncertainty in costs and results. > > One possible approach (although I have not tried it yet, but will do > so in the near future) is to generate, for each iteration, three > independent (0,1) random variables, correlate them via the Cholesky > approach, and use them as input to the inverse normal, inverse > uniform, etc. to get the three variables for each iteration. The > primary distributions of interest are normal, uniform, triangular, > gamma, and arbitrary cdf, so this approach seems plausible in that > inverse distributions are readily available. > > Thanks in advance. > > Dick Males > Cincinnati, OH, USA > > On Thu, Apr 29, 2010 at 12:31 PM, Greg Snow <Greg.Snow at imail.org> > wrote: > > The method you are using (multiply by cholesky) works for normal > distributions, but not necessarily for others (if you want different > means/sd, then add/multiply after transforming). > > > > For other distributions this process can sometimes give the > correlation you want, but may change the variable(s) to no longer have > the desired distribution. > > > > The short answer to your question is "It Depends", the full long > answer could fill a full semester course. ?If you tell us more of your > goal we may be able to give a more useful answer. ?The copula package > is one possibility. ?If you know the conditional distribution of each > variable given the others then you can use gibbs sampling. > > > > -- > > Gregory (Greg) L. Snow Ph.D. > > Statistical Data Center > > Intermountain Healthcare > > greg.snow at imail.org > > 801.408.8111 > > > > > >> -----Original Message----- > >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > >> project.org] On Behalf Of Richard and Barbara Males > >> Sent: Thursday, April 29, 2010 9:18 AM > >> To: r-help at r-project.org > >> Subject: [R] generating correlated random variables from different > >> distributions > >> > >> I need to generate a set of correlated random variables for a Monte > >> Carlo simulation. ? The solutions I have found > >> (http://www.stat.uiuc.edu/stat428/cndata.html, > >> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers), > using > >> Cholesky Decomposition, seem to work only if the variables come from > >> the same distribution with the same parameters. ?My situation is > that > >> each variable may be described by a different distribution (or > >> different parameters of the same distribution). ?This approach does > >> not seem to work, see code and results below. ?Am I missing > something > >> here? ?My math/statistics is not very good, will I need to generate > >> correlated uniform random variables on (0,1) and then use the > inverse > >> distributions to get the desired results I am looking for? ?That is > >> acceptable, but I would prefer to just generate the individual > >> distributions and then correlate them. ?Any advice much appreciated. > >> Thanks in advance > >> > >> R. Males > >> Cincinnati, Ohio, USA > >> > >> Sample Code: > >> # Testing Correlated Random Variables > >> > >> # reference > >> http://www.sitmo.com/doc/Generating_Correlated_Random_Numbers > >> # reference http://www.stat.uiuc.edu/stat428/cndata.html > >> # create the correlation matrix > >> corMat=matrix(c(1,0.6,0.3,0.6,1,0.5,0.3,0.5,1),3,3) > >> cholMat=chol(corMat) > >> # create the matrix of random variables > >> set.seed(1000) > >> nValues=10000 > >> > >> # generate some random values > >> > >> matNormalAllSame=cbind(rnorm(nValues),rnorm(nValues),rnorm(nValues)) > >> > matNormalDifferent=cbind(rnorm(nValues,1,1.5),rnorm(nValues,2,0.5),rnor > >> m(nValues,6,1.8)) > >> > matUniformAllSame=cbind(runif(nValues),runif(nValues),runif(nValues)) > >> > matUniformDifferent=cbind(runif(nValues,1,1.5),runif(nValues,2,3.5),run > >> if(nValues,6,10.8)) > >> > >> # bind to a matrix > >> print("correlation Matrix") > >> print(corMat) > >> print("Cholesky Decomposition") > >> print (cholMat) > >> > >> # test same normal > >> > >> resultMatNormalAllSame=matNormalAllSame%*%cholMat > >> print("correlation matNormalAllSame") > >> print(cor(resultMatNormalAllSame)) > >> > >> # test different normal > >> > >> resultMatNormalDifferent=matNormalDifferent%*%cholMat > >> print("correlation matNormalDifferent") > >> print(cor(resultMatNormalDifferent)) > >> > >> # test same uniform > >> resultMatUniformAllSame=matUniformAllSame%*%cholMat > >> print("correlation matUniformAllSame") > >> print(cor(resultMatUniformAllSame)) > >> > >> # test different uniform > >> resultMatUniformDifferent=matUniformDifferent%*%cholMat > >> print("correlation matUniformDifferent") > >> print(cor(resultMatUniformDifferent)) > >> > >> and results > >> > >> [1] "correlation Matrix" > >> ? ? ?[,1] [,2] [,3] > >> [1,] ?1.0 ?0.6 ?0.3 > >> [2,] ?0.6 ?1.0 ?0.5 > >> [3,] ?0.3 ?0.5 ?1.0 > >> [1] "Cholesky Decomposition" > >> ? ? ?[,1] [,2] ? ? ?[,3] > >> [1,] ? ?1 ?0.6 0.3000000 > >> [2,] ? ?0 ?0.8 0.4000000 > >> [3,] ? ?0 ?0.0 0.8660254 > >> [1] "correlation matNormalAllSame" <== ok > >> ? ? ? ? ? [,1] ? ? ?[,2] ? ? ?[,3] > >> [1,] 1.0000000 0.6036468 0.3013823 > >> [2,] 0.6036468 1.0000000 0.5005440 > >> [3,] 0.3013823 0.5005440 1.0000000 > >> [1] "correlation matNormalDifferent" <== no good > >> ? ? ? ? ? [,1] ? ? ?[,2] ? ? ?[,3] > >> [1,] 1.0000000 0.9141472 0.2676162 > >> [2,] 0.9141472 1.0000000 0.2959178 > >> [3,] 0.2676162 0.2959178 1.0000000 > >> [1] "correlation matUniformAllSame" <== ok > >> ? ? ? ? ? [,1] ? ? ?[,2] ? ? ?[,3] > >> [1,] 1.0000000 0.5971519 0.2959195 > >> [2,] 0.5971519 1.0000000 0.5011267 > >> [3,] 0.2959195 0.5011267 1.0000000 > >> [1] "correlation matUniformDifferent" <== no good > >> ? ? ? ? ? [,1] ? ? ?[,2] ? ? ?[,3] > >> [1,] 1.0000000 0.2312000 0.0351460 > >> [2,] 0.2312000 1.0000000 0.1526293 > >> [3,] 0.0351460 0.1526293 1.0000000 > >> > > >> > >> ______________________________________________ > >> R-help at r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide http://www.R-project.org/posting- > >> guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >