Shane Phillips
2011-Apr-24 23:00 UTC
[R] Random Normal Variable Correlated to an Existing Binomial Variable
Hi, R-Helpers!

I have a dataframe that contains a binomial variable. I need to add another random variable drawn from a normal distribution with a specific mean and standard deviation. This variable also needs to be correlated with the existing binomial variable with a specific correlation (say .75). Any ideas?

Thanks!

Shane
Petr Savicky
2011-Apr-25 09:58 UTC
[R] Random Normal Variable Correlated to an Existing Binomial Variable
On Sun, Apr 24, 2011 at 07:00:26PM -0400, Shane Phillips wrote:
> Hi, R-Helpers!
>
> I have a dataframe that contains a binomial variable. I need to add another random variable drawn from a normal distribution with a specific mean and standard deviation. This variable also needs to be correlated with the existing binomial variable with a specific correlation (say .75). Any ideas?

Hi.

If X and Y are dependent random variables and we want to generate y so that (x, y) is a pair from their joint distribution with x known, then y should be generated from the conditional distribution P(Y | X = x). If the probability P(X = x) is not too small, this may be done by rejection sampling: generate pairs (X, Y) until the condition X = x is satisfied and use the corresponding Y.

It remains to generate pairs (X, Y), where Y is a normal variable and X a binomial one. The parameters of Y are known, the parameters of X should be chosen somehow, and the correlation of X and Y is known. I suggest the following.

Compute the distribution of X as a vector of probabilities p_0, ..., p_n (see ?dbinom). Find a nondecreasing function f() from the reals to {0, ..., n} such that f(Y) has distribution p_0, ..., p_n. The function may be determined by a sequence of cutpoints a_1, ..., a_n defining f(y) as follows:

  y                 f(y)
  (-infty, a_1)     0
  [a_1, a_2)        1
  ...
  [a_n, infty)      n

For each i, the cutpoint a_i is the (p_0 + ... + p_{i-1})-quantile of Y (see ?qnorm). See ?cut for computing f().

The pair (f(Y), Y) has the required marginal distributions and, in my opinion, the maximal possible correlation. If this correlation is lower than the requested one, then I think there is no solution.

If the correlation of (f(Y), Y) is at least the required one, then use a mixture of the distribution of (f(Y), Y) and of (X, Y), where X has the required marginal distribution but is generated independently of Y. The mixture parameter may be determined as the solution of an equation in one variable.

Hope this helps.

Petr Savicky.
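The construction Petr describes could be sketched roughly as follows. This is only an illustration under stated assumptions, not Petr's own code: the binomial parameters (size, prob), the normal parameters (m, s), and all variable names are hypothetical. Two choices here go slightly beyond the message. First, because both mixture components have the same marginals, the correlation of the mixture is w times the correlation of (f(Y), Y), so the one-variable equation reduces to w = rho / rho_max. Second, instead of rejection sampling, Y given X = x is drawn by inverting the normal CDF on the interval between the two cutpoints that belong to x.

```r
set.seed(1)

# Hypothetical inputs: binomial parameters, normal parameters, target correlation
size <- 10; prob <- 0.2
m <- 0.5; s <- 2
rho <- 0.75

# Stand-in for the existing binomial column
n <- 20000
x <- rbinom(n, size = size, prob = prob)

# Cumulative probabilities 0, p_0, p_0 + p_1, ..., 1 and the matching
# cutpoints on the standard-normal scale (-Inf, a_1, ..., a_n, Inf)
cum <- c(0, pbinom(0:size, size = size, prob = prob))
z <- qnorm(cum)

# Correlation of (f(Y), Y), using E[Z * 1{lo <= Z < hi}] = dnorm(lo) - dnorm(hi)
k <- 0:size
rho_max <- sum(k * (dnorm(z[k + 1]) - dnorm(z[k + 2]))) /
  sqrt(size * prob * (1 - prob))

# Mixture weight on the maximally correlated component
w <- rho / rho_max
stopifnot(w <= 1)  # otherwise the requested correlation is not attainable

# Draw Y given X = x:
#   with probability w, a normal restricted to the interval where f(Y) = x
#   otherwise an independent normal draw
u <- runif(n, min = cum[x + 1], max = cum[x + 2])
y_dep <- qnorm(u, mean = m, sd = s)
y_ind <- rnorm(n, mean = m, sd = s)
y <- ifelse(runif(n) < w, y_dep, y_ind)

# Compare the sample correlation, mean and sd with the targets
c(rho_max = rho_max, w = w, cor = cor(x, y), mean = mean(y), sd = sd(y))
```

The sample correlation, mean, and standard deviation will only match the targets approximately, since they are properties of the joint distribution rather than of any one sample.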
Enrico Schumann
2011-Apr-27 07:06 UTC
[R] Random Normal Variable Correlated to an Existing Binomial Variable
Hi,

do you know the parameters of the binomial variate? Then maybe you could use something like the code below. As Petr pointed out, it is generally not guaranteed that you can create variates with any given linear correlation (i.e., it depends on the parameters of the binomial).

n <- 100    # how many variates

# your binomial variate (example)
size <- 10; prob <- 0.2
vecB <- rbinom(n, size = size, prob = prob)

rho <- 0.75 # desired cor
m <- 0.5    # mean and sd of Gaussian
sig <- 2

# a small correction (maps a rank correlation to the linear
# correlation of the underlying Gaussians)
rho <- 2 * sin(rho * pi / 6)
C <- matrix(rho, nrow = 2, ncol = 2)
diag(C) <- 1; C <- chol(C)

# (1) transform binomial to Gaussian
# (note: pbinom() is 1 where vecB == size, which gives X1 = Inf)
X1 <- qnorm(pbinom(vecB, size = size, prob = prob))

# (2) create another Gaussian
X2 <- rnorm(n)
X <- cbind(X1, X2)

# (3) induce correlation (does not change X1)
X <- X %*% C

# (4) make uniforms
U <- pnorm(X)

# (5) ... and put them into the inverses
vecB1 <- qbinom(U[, 1], size, prob)
vecG <- qnorm(U[, 2], mean = m, sd = sig)

# check
plot(vecB1, vecG)
cor(vecB1, vecG)
all.equal(vecB1, vecB)
sd(vecG)

(Linear correlation is not affected by an increasing linear transformation, so you can enforce exactly your desired mean and standard deviation for the Gaussian by rescaling it at the end.)

Regards,
Enrico
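Enrico's remark about rescaling at the end can be illustrated with a small sketch. The helper rescale_to() and the example data are hypothetical and not part of his code; the point is only that shifting and scaling by a positive factor fixes the sample mean and standard deviation exactly while leaving the sample correlation untouched.

```r
set.seed(42)

# Hypothetical helper: shift and scale v to an exact sample mean and sd
rescale_to <- function(v, target_mean, target_sd) {
  (v - mean(v)) / sd(v) * target_sd + target_mean
}

x <- rbinom(200, size = 10, prob = 0.2)  # stand-in binomial column
g <- 0.8 * x + rnorm(200)                # any variable correlated with x
g2 <- rescale_to(g, target_mean = 0.5, target_sd = 2)

# Mean and sd now hit the targets; the correlation with x is unchanged
c(mean = mean(g2), sd = sd(g2), cor_before = cor(x, g), cor_after = cor(x, g2))
```

Applied to vecG from the code above, this would trade the exact Gaussian quantiles for an exact sample mean and standard deviation.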