Bliese, Paul D LTC USAMH
2005-Sep-27 06:57 UTC
[R] Simulate phi-coefficient (correlation between dichotomous vars)
Newsgroup members, I appreciate the help on this topic. David Duffy provided a solution (below) that was quite helpful, and came close to what I needed. It did a great job creating two vectors of dichotomous variables with a known correlation (what I referred to as a phi-coefficient). My situation is a bit more complicated and I'm not sure it is easily solved. The problem is that I must assume one of the vectors is constant and generate one or more vectors that covary with the constant vector. In a continuous example I could use the following code that I got from the S-PLUS newsgroup year ago: sample.cor<-function (x, rho) { y <- (rho * (x - mean(x)))/sqrt(var(x)) + sqrt(1 - rho^2) * rnorm(length(x)) cat("Sample corr = ", cor(x, y), "\n") return(y) } X<-rnorm(100) #a constant vector Y1<-sample.cor(X,.30) # a new vector that correlates with X .30 Y2<-sample.cor(X,.45) # a second vector that correlates with X .45 I can, of course, have X be a vector of zeros and ones, and I can dichotomize Y1 and Y2, but the program always returns a phi-coefficient correlation lower than the continuous correlation. Mathematically, I guess this is expected because the phi-coefficient is partially a function of the percentage of positive responses. This, in turn, explains Pearson's (1900) interest in the whole area of tetrachoric correlations -- a tetrachoric correlation being the Pearson product moment correlation that would have been observed had two dichotomously scored variables been measured on a continuous scale (Pearson, 1900). Appreciate any additional input or possible solutions. Paul -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David Duffy Sent: Monday, September 12, 2005 1:34 AM To: r-help at stat.math.ethz.ch Subject: [R] Simulate phi-coefficient> From: "Bliese, Paul D LTC USAMH" <paul.bliese at us.army.mil> > > Given a sample of zeros and ones, for example: > > VECTOR1<-rep(c(1,0),c(15,10)) > How would I create a new sample (VECTOR2) also containing zeros and > ones, in which the phi-coefficient between the two sample vectors was > drawn from a population with a known phi-coefficient value? > > I know there are ways to do this with normally distributed numbers(for> example the mvrnorm function in MASS), but am stumped when dealingwith> dichotomous variables. > > PaulOne way is to sample from the 2x2 table with the specified means and pearson correlation (phi): for a fourfold table, a b c d with marginal proportions p1 and p2 cov <- phi * sqrt(p1*(1-p1)*p2*(1-p2)) a <- p1*p2 + cov b <- p1*(1-p2) - cov c <- (1-p1)*p2 - cov d <- (1-p1)*(1-p2) + cov expand.grid(0:1,0:1)[sample(1:4, size=25, replace=TRUE, prob=c(a,b,c,d)),] David. | David Duffy (MBBS PhD) ,-_|\ | email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / * | Epidemiology Unit, Queensland Institute of Medical Research \_,-._/ | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v ______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
David Duffy
2005-Sep-27 21:38 UTC
[R] Simulate phi-coefficient (correlation between dichotomous vars)
On Tue, 27 Sep 2005, Bliese, Paul D LTC USAMH wrote:> My situation is a bit more complicated and I'm not sure it is easily > solved. The problem is that I must assume one of the vectors is > constant and generate one or more vectors that covary with the constant > vector. > >> One way is to sample from the 2x2 table with the specified means and >> pearson correlation (phi): >> >> for a fourfold table, a b >> c d >> with marginal proportions p1 and p2 >> cov <- phi * sqrt(p1*(1-p1)*p2*(1-p2)) >> a <- p1*p2 + cov >> b <- p1*(1-p2) - cov >> c <- (1-p1)*p2 - cov >> d <- (1-p1)*(1-p2) + cov >Calculate the conditional probabilities from the above P(X2=1|X1=1)= a/(a+b) = p2 + cov/p1 P(X2=1|X1=0)= c/(c+d) = p2 - cov/(1-p1) condsim <- function(X, phi, p2, p1=NULL) { if (!all(X %in% c(0,1))) stop("expecting 1's and 0's") if (is.null(p1)) p1 <- mean(X) covar <- phi * sqrt(p1*(1-p1)*p2*(1-p2)) if (covar>0 && covar>(min(p1,p2)-p1*p2)) { warning("Specified correlation too large for given marginal proportions") covar <- min(p1,p2)-p1*p2 }else if (covar<0 && covar < -min(p1*p2,(1-p1)*(1-p2))) { warning("Specified correlation too small for given marginal proportions") covar <- -min(p1*p2,(1-p1)*(1-p2)) } Y <- X i1 <- X==1 i0 <- X==0 Y[i1] <- rbinom(sum(i1),1, p2 + covar/p1) Y[i0] <- rbinom(sum(i0),1, p2 - covar/(1-p1)) data.frame(X,Y) } David Duffy.