Hi R-fellows, I am trying to simulate a multivariate correlated sample via the Gaussian copula method. One variable is a binary variable, that should be autocorrelated. The autocorrelation should be rho = 0.2. Furthermore, the overall probability to get either outcome of the binary variable should be 0.5. Below you can see the R code (I use for simplicity a diagonal matrix in rmvnorm even if it produces no correlated sample): "sampleCop" <- function(n = 1000, rho = 0.2) { require(splus2R) mvrs <- rmvnorm(n + 1, mean = rep(0, 3), cov = diag(3)) pmvrs <- pnorm(mvrs, 0, 1) var1 <- matrix(0, nrow = n + 1, ncol = 1) var1[1] <- qbinom(pmvrs[1, 1], 1, 0.5) if(var1[1] == 0) var1[nrow(mvrs)] <- -1 for(i in 1:(nrow(pmvrs) - 1)) { if(pmvrs[i + 1, 1] <= rho) var1[i + 1] <- var1[i] else var1[i + 1] <- var1[i] * (-1) } sample <- matrix(0, nrow = n, ncol = 4) sample[, 1] <- var1[1:nrow(var1) - 1] sample[, 2] <- var1[2:nrow(var1)] sample[, 3] <- qnorm(pmvrs[1:nrow(var1) - 1, 2], 0, 1, 1, 0) sample[, 4] <- qnorm(pmvrs[1:nrow(var1) - 1, 3], 0, 1, 1, 0) sample } Now, the code is fine, everything compiles. But when I compute the autocorrelation of the binary variable, it is not 0.2, but 0.6. Does anyone know why this happens? Best Regards Simon [[alternative HTML version deleted]]
I have no idea what your code is doing, nor why you want correlated binary variables. Correlation makes little or no sense in the context of binary random variables --- or more generally in the context of discrete random variables. Be that as it may, it is an easy calculation to show that if X and Y are binary random variables both with success probability of 0.5 then cor(X,Y) = 0.2 if and only if Pr(X=1 | Y = 1) = 0.6. So just generate X and Y using that fact: set.seed(42) X <- numeric(1000) Y <- numeric(1000) for(i in 1:1000) { Y[i] <- rbinom(1,1,0.5) X[i] <- if(Y[i]==1) rbinom(1,1,0.6) else rbinom(1,1,0.4) } # Check: cor(X,Y) # Get 0.2012336 Looks about right. Note that the sample proportions are 0.484 and 0.485 for X and Y respectively. These values do not differ significantly from 0.5. cheers, Rolf Turner On 28/09/12 08:26, Simon Zehnder wrote:> Hi R-fellows, > > I am trying to simulate a multivariate correlated sample via the Gaussian copula method. One variable is a binary variable, that should be autocorrelated. The autocorrelation should be rho = 0.2. Furthermore, the overall probability to get either outcome of the binary variable should be 0.5. > Below you can see the R code (I use for simplicity a diagonal matrix in rmvnorm even if it produces no correlated sample): > > "sampleCop" <- function(n = 1000, rho = 0.2) { > > require(splus2R) > mvrs <- rmvnorm(n + 1, mean = rep(0, 3), cov = diag(3)) > pmvrs <- pnorm(mvrs, 0, 1) > var1 <- matrix(0, nrow = n + 1, ncol = 1) > var1[1] <- qbinom(pmvrs[1, 1], 1, 0.5) > if(var1[1] == 0) var1[nrow(mvrs)] <- -1 > for(i in 1:(nrow(pmvrs) - 1)) { > if(pmvrs[i + 1, 1] <= rho) var1[i + 1] <- var1[i] > else var1[i + 1] <- var1[i] * (-1) > } > sample <- matrix(0, nrow = n, ncol = 4) > sample[, 1] <- var1[1:nrow(var1) - 1] > sample[, 2] <- var1[2:nrow(var1)] > sample[, 3] <- qnorm(pmvrs[1:nrow(var1) - 1, 2], 0, 1, 1, 0) > sample[, 4] <- qnorm(pmvrs[1:nrow(var1) - 1, 3], 0, 1, 1, 0) > > sample > > } > > Now, the code is fine, everything compiles. But when I compute the autocorrelation of the binary variable, it is not 0.2, but 0.6. Does anyone know why this happens?
André Gabriel
2012-Sep-28 12:02 UTC
[R] RES: Generating an autocorrelated binary variable
I think the package BinarySimCLF can help. See http://cran.r-project.org/web/packages/binarySimCLF/binarySimCLF.pdf. Andr? Gabriel. -----Mensagem original----- De: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Em nome de Rolf Turner Enviada em: sexta-feira, 28 de setembro de 2012 00:02 Para: Simon Zehnder Cc: r help Assunto: Re: [R] Generating an autocorrelated binary variable I have no idea what your code is doing, nor why you want correlated binary variables. Correlation makes little or no sense in the context of binary random variables --- or more generally in the context of discrete random variables. Be that as it may, it is an easy calculation to show that if X and Y are binary random variables both with success probability of 0.5 then cor(X,Y) 0.2 if and only if Pr(X=1 | Y = 1) = 0.6. So just generate X and Y using that fact: set.seed(42) X <- numeric(1000) Y <- numeric(1000) for(i in 1:1000) { Y[i] <- rbinom(1,1,0.5) X[i] <- if(Y[i]==1) rbinom(1,1,0.6) else rbinom(1,1,0.4) } # Check: cor(X,Y) # Get 0.2012336 Looks about right. Note that the sample proportions are 0.484 and 0.485 for X and Y respectively. These values do not differ significantly from 0.5. cheers, Rolf Turner On 28/09/12 08:26, Simon Zehnder wrote:> Hi R-fellows, > > I am trying to simulate a multivariate correlated sample via the Gaussiancopula method. One variable is a binary variable, that should be autocorrelated. The autocorrelation should be rho = 0.2. Furthermore, the overall probability to get either outcome of the binary variable should be 0.5.> Below you can see the R code (I use for simplicity a diagonal matrix inrmvnorm even if it produces no correlated sample):> > "sampleCop" <- function(n = 1000, rho = 0.2) { > > require(splus2R) > mvrs <- rmvnorm(n + 1, mean = rep(0, 3), cov = diag(3)) > pmvrs <- pnorm(mvrs, 0, 1) > var1 <- matrix(0, nrow = n + 1, ncol = 1) > var1[1] <- qbinom(pmvrs[1, 1], 1, 0.5) > if(var1[1] == 0) var1[nrow(mvrs)] <- -1 > for(i in 1:(nrow(pmvrs) - 1)) { > if(pmvrs[i + 1, 1] <= rho) var1[i + 1] <- var1[i] > else var1[i + 1] <- var1[i] * (-1) > } > sample <- matrix(0, nrow = n, ncol = 4) > sample[, 1] <- var1[1:nrow(var1) - 1] > sample[, 2] <- var1[2:nrow(var1)] > sample[, 3] <- qnorm(pmvrs[1:nrow(var1) - 1, 2], 0, 1, 1, 0) > sample[, 4] <- qnorm(pmvrs[1:nrow(var1) - 1, 3], 0, 1, 1, 0) > > sample > > } > > Now, the code is fine, everything compiles. But when I compute theautocorrelation of the binary variable, it is not 0.2, but 0.6. Does anyone know why this happens? ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.