I wish to generate a vector of uniformly distributed data with a defined correlation to another vector.

The only function I have been able to find that does something similar is corgen from the library ecodist.

The following code generates data with the desired correlation to the vector x, but the resulting vector y is normally, not uniformly, distributed:

    library(ecodist)
    x <- runif(10^5)
    y <- corgen(x=x, r=.5)$y

Does anyone know a similar function that generates uniformly distributed data, or a way of transforming y to the desired distribution while keeping the correlation between x and y?

Kind regards, Soren
David Winsemius
2011-Feb-20 05:16 UTC
[R] Generating uniformly distributed correlated data.
On Feb 19, 2011, at 9:17 PM, Søren Faurby wrote:

> Does anyone know a similar function generating uniformly distributed
> data or a way of transforming y to the desired distribution while
> keeping the correlation between x and y

Package "copula" should support that. These citations to the archives were identified with an RSiteSearch search on the terms "uniform multivariate correlation":

http://finzi.psych.upenn.edu/Rhelp10/2010-November/258834.html
http://finzi.psych.upenn.edu/R/Rhelp02/archive/57042.html

(Not technically the Archives.)

--
David Winsemius, MD
West Hartford, CT
Peter Langfelder
2011-Feb-20 05:18 UTC
[R] Generating uniformly distributed correlated data.
On Sat, Feb 19, 2011 at 6:17 PM, Søren Faurby <soren.faurby at biology.au.dk> wrote:

> Does anyone know a similar function generating uniformly distributed data or a
> way of transforming y to the desired distribution while keeping the
> correlation between x and y

Hi Soren,

I'm not aware of such functions, but you can try the following code:

    # generate some x
    n = 100
    x = runif(n)
    r = 0.5
    y = r * scale(x) + sqrt(1-r^2) * scale(runif(n))
    cor(x,y)

The result is not exactly 0.5 because cor(x, runif(n)) is not exactly zero, but on average you get 0.5 (try running it 1000 times and calculating the mean and standard error of the obtained values). Note that the correlation will be on average r no matter what the distribution of x is, but the distribution of y will be uniform only if the distribution of x is uniform.

Peter
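To follow up on the suggestion to run this many times: here is a minimal sketch, using base R only, that repeats the construction 1000 times and reports the mean and standard error of the correlations. The function name one_cor and the seed are my own choices, not from the thread. It also looks at the range of one large y, because a weighted sum of two independent uniforms generally has a trapezoidal rather than flat density, so it seems worth checking the shape of y before relying on it being uniform.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# One run of the scale()-based construction; returns the sample correlation.
one_cor <- function(n = 100, r = 0.5) {
  x <- runif(n)
  y <- r * scale(x) + sqrt(1 - r^2) * scale(runif(n))
  as.numeric(cor(x, y))  # scale() returns a matrix, so flatten the result
}

cors <- replicate(1000, one_cor())
mean_cor <- mean(cors)                     # average correlation over the runs
se_cor <- sd(cors) / sqrt(length(cors))    # standard error of that average
print(c(mean = mean_cor, se = se_cor))

# Shape check on one large sample: a standardized uniform variable cannot
# exceed sqrt(3) in absolute value, so a larger maximum means y is not uniform.
x_big <- runif(1e5)
y_big <- as.numeric(0.5 * scale(x_big) + sqrt(1 - 0.5^2) * scale(runif(1e5)))
print(c(max_abs_y = max(abs(y_big)), uniform_bound = sqrt(3)))
```

hist(y_big) gives a visual check of the same thing.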
Jorge Ivan Velez
2011-Feb-20 05:21 UTC
[R] Generating uniformly distributed correlated data.
Hi Soren,

Take a look at http://tolstoy.newcastle.edu.au/R/help/05/07/7741.html

HTH,
Jorge
Enrico Schumann
2011-Feb-20 11:36 UTC
[R] Generating uniformly distributed correlated data.
maybe this helps

http://comisef.wikidot.com/tutorial:correlateduniformvariates

regards
enrico
Erich Neuwirth
2011-Feb-21 12:03 UTC
[R] Generating uniformly distributed correlated data.
    hw <- function(r) {
      (3 - sqrt(1 + 8*r)) / 4
    }
    x <- runif(1000)
    y <- (x + runif(1000, -hw(0.5), hw(0.5))) %% 1

x and y will have correlation 0.5 and will be uniformly distributed on the unit interval. Replacing 0.5 by any number r between 0 and 1 will create correlated uniformly distributed random numbers with correlation r.

plot(x,y) will show the construction of the joint distribution of these random numbers. The rest is simple algebra.
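The construction above can be wrapped so that it takes an existing uniform vector x and a target r, which is what the original question asked for. This is only a sketch: the name cor_unif, the input checks, and the seed are assumptions of mine, and it relies on x already being uniform on the unit interval.

```r
# Half-width of the band around the diagonal that yields correlation r.
hw <- function(r) (3 - sqrt(1 + 8 * r)) / 4

# Hypothetical wrapper (name not from the thread): for a uniform x on [0, 1],
# return a uniform y on [0, 1] with correlation approximately r, 0 <= r <= 1.
cor_unif <- function(x, r) {
  stopifnot(all(x >= 0 & x <= 1), r >= 0, r <= 1)
  (x + runif(length(x), -hw(r), hw(r))) %% 1  # wrap around the unit interval
}

set.seed(42)  # arbitrary seed, only for reproducibility
x <- runif(1e5)
y <- cor_unif(x, 0.5)
print(cor(x, y))           # sample correlation, to compare with the target 0.5
print(c(mean(y), var(y)))  # to compare with 1/2 and 1/12 for a uniform
```

The sample correlation will differ from r by sampling noise, as with any simulated data.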
Mike Marchywka
2011-Feb-21 15:00 UTC
[R] Generating uniformly distributed correlated data.
----------------------------------------
> Date: Mon, 21 Feb 2011 15:53:26 +0100
> From: erich.neuwirth at univie.ac.at
> To: marchywka at hotmail.com
> CC: soren.faurby at biology.au.dk; r-help at r-project.org
> Subject: Re: [R] Generating uniformly distributed correlated data.
>
> We want to generate a distribution on the unit square with the following
> properties
> * It is concentrated on a "reasonable" subset of the square,
>   and the restricted distribution is uniform on this subset.
> * Both marginal distributions are uniform on the unit interval.
> * All horizontal and all vertical cross sections are sets of line
>   segments with the same total length
>
> If we find a geometric figure with these properties, we have solved the
> problem.
>
> So we define the distribution to be uniform on the following area:
> (it is distorted but should give the idea)
>
> x***/-----------------/***x
> |**/-----------------/****|
> |*/-----------------/*****|
> |/-----------------/******|
> |-----------------/******/|
> |----------------/******/-|
> |---------------/******/--|
> |--------------/******/---|
> |-------------/******/----|
> |------------/******/-----|
> |-----------/******/------|
> |----------/******/-------|
> |---------/******/--------|
> |--------/******/---------|
> |-------/******/----------|
> |------/******/-----------|
> |-----/******/------------|
> |----/******/-------------|
> |---/******/--------------|
> |--/******/---------------|
> |-/******/----------------|
> |/******/-----------------|
> |******/-----------------/|
> |*****/-----------------/*|
> |****/-----------------/**|
> x***/-----------------/***x
>
> There is the same number of stars in each horizontal row and each
> vertical column.
>
> So we define
> g(x1,x2) = 1   if abs(x1-x2) <= a or
>                   abs(x1-x2+1) <= a or
>                   abs(x1-x2-1) <= a
>            0   elsewhere
>
> The total area of the shape is 2*a.
> The admissible range for a is <0,1/2>
> therefore
> f(x1,x2) = g(x1,x2)/(2*a)
> is a density function.
>
> This is where simple algebra comes in.
> This distribution has
> expected value 1/2 and variance 1/12 for both margins
> (uniform distribution), and it has
> covariance = (1 - 3*a + 2*a^2)/12
> and correlation = 1 - 3*a + 2*a^2
>
> The inverse function of 1 - 3*a + 2*a^2 is
> (3-sqrt(1+8*r))/4
>
> Therefore we can compute that our distribution with
> a = (3-sqrt(1+8*r))/4
> will produce a given r.
>
> How do we create random numbers from this distribution?
> By using conditional densities.
> x1 is sampled from the uniform distribution, and for a given x1
> we produce x2 by a uniform distribution along the vertical cross
> cut of the geometrical shape (which is either 1 or 2 intervals).
> And which is most easily implemented by using the modulo operator %%.
>
> This mechanism is NOT a convolution. Applying modulo after the addition
> makes it a nonconvolution. Adding independent random variables
> without doing anything further is a convolution; by applying a trimming
> operation, the convolution property gets lost.

The thing inside the mod allows convolution; as I mentioned, the effect of the mod is to move back the pieces that fall outside the desired range, and they happen to restore the uniform distribution. I thought my explanation was simple and easy after the fact, but I am not sure it would have motivated the original design too well.
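The correlation formula in the quoted derivation can be checked by simulation. This is a rough sketch in base R; the grid of band half-widths a, the sample size, and the seed are arbitrary choices of mine. It compares the sample correlation with 1 - 3*a + 2*a^2 for several values of a, and confirms that (3-sqrt(1+8*r))/4 maps each correlation back to its a.

```r
set.seed(7)  # arbitrary seed, only for reproducibility
n <- 2e5
a_grid <- c(0.05, 0.1, 0.2, 0.3, 0.4, 0.5)  # half-widths in the range (0, 1/2]

x1 <- runif(n)
empirical <- sapply(a_grid, function(a) {
  x2 <- (x1 + runif(n, -a, a)) %% 1  # uniform on the wrapped band around x1
  cor(x1, x2)
})
theoretical <- 1 - 3 * a_grid + 2 * a_grid^2

print(round(cbind(a = a_grid, empirical, theoretical), 3))

# Round trip: the inverse formula should recover each half-width a.
a_from_r <- (3 - sqrt(1 + 8 * theoretical)) / 4
print(all.equal(a_from_r, a_grid))
```

With a = 1/2 the band covers the whole square, so x1 and x2 are independent and the formula gives correlation 0.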
Kjetil Halvorsen
2011-Feb-21 15:19 UTC
[R] Generating uniformly distributed correlated data.
One simple idea is to generate correlated normals (a multivariate normal vector), and then apply the cumulative distribution function F_i of component i to get F_i(X_i), which is uniform.

Kjetil

(This will not preserve the value of the correlation coefficient, so you must experiment.)
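A minimal sketch of the correlated-normals idea, in base R and for the bivariate case only. The function name cor_unif_normal and the seed are assumptions of mine. Rather than experimenting, it uses the standard relation for a bivariate normal pair with correlation rho: the uniforms obtained through pnorm have Pearson correlation (6/pi)*asin(rho/2), so a target r corresponds to rho = 2*sin(pi*r/6).

```r
# Hypothetical helper (name not from the thread): for a uniform x on (0, 1),
# return a uniform y whose Pearson correlation with x is approximately r.
cor_unif_normal <- function(x, r) {
  stopifnot(r >= -1, r <= 1)
  rho <- 2 * sin(pi * r / 6)   # normal-scale correlation that gives r
  z1 <- qnorm(x)               # map x to a standard normal
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(length(x))  # correlated normal
  pnorm(z2)                    # map back to the unit interval
}

set.seed(2011)  # arbitrary seed, only for reproducibility
x <- runif(1e5)
y <- cor_unif_normal(x, 0.5)
print(cor(x, y))           # sample correlation, to compare with the target 0.5
print(c(mean(y), var(y)))  # to compare with 1/2 and 1/12 for a uniform
```

The joint distribution here differs from the band construction discussed earlier in the thread, even when the correlation is the same, so which one is appropriate depends on what is being simulated.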
Larry Hotchkiss
2011-Feb-21 17:13 UTC
[R] Generating uniformly distributed correlated data.
Hi list:

Here is one approach that will generate two uniformly distributed variables with a correlation of 0.5 --

    > N <- 1000
    > x <- sample(1:9,N,replace=TRUE)
    > uni <- runif(N)
    > y <- (uni<0.5)*x+(uni>0.5)*sort(x)
    > (y.freq <- table(y) )
    y
      1   2   3   4   5   6   7   8   9
    123 114 113 105 126 102 110  95 112
    > chisq.test(y.freq)

            Chi-squared test for given probabilities

    data:  y.freq
    X-squared = 6.812, df = 8, p-value = 0.557

    > cor(x,y)
    [1] 0.5418051

It seems obvious that the correlation can be adjusted by changing 0.5 to r and 1-r, respectively, in the assignment --

    y <- (uni<0.5)*x+(uni>0.5)*sort(x)

It is worth reflecting on whether this algorithm reflects the real-world process you wish to simulate.

Larry Hotchkiss
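One way to read the adjustment for a general r is to use a single threshold, so that x[i] is kept with probability r and replaced with probability 1-r. The sketch below makes that assumption; the name mix_cor, the continuous x, and the seed are my own choices rather than anything from the post.

```r
# Hypothetical helper: keep x[i] with probability r, otherwise take the
# i-th element of sort(x), which is essentially unrelated to x[i].
mix_cor <- function(x, r) {
  stopifnot(r >= 0, r <= 1)
  keep <- runif(length(x)) < r
  ifelse(keep, x, sort(x))
}

set.seed(123)  # arbitrary seed, only for reproducibility
x <- runif(1e5)
y <- mix_cor(x, 0.3)
print(cor(x, y))     # sample correlation, to compare with the target 0.3
print(mean(x == y))  # share of positions where y simply copies x
```

Note that y equals x exactly in a fraction r of the positions, which is the structural feature to weigh against the process being simulated.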