Bliese, Paul D LTC USAMH
2006-Jun-28 11:31 UTC
[R] Simulate dichotomous correlation matrix
Newsgroup members,

Does anyone have a clever way to simulate a correlation matrix such that
each column contains dichotomous variables (0,1) and where each column
has different prevalence rates.

For instance, I would like to simulate the following correlation matrix:

> CORMAT[1:4,1:4]
            PUREPT    PTCUT2  PHQCUT2T  ALCCUTT2
PUREPT   1.0000000 0.5141552 0.1913139 0.1917923
PTCUT2   0.5141552 1.0000000 0.2913552 0.2204097
PHQCUT2T 0.1913139 0.2913552 1.0000000 0.1803987
ALCCUTT2 0.1917923 0.2204097 0.1803987 1.0000000

Where the prevalence for each variable is:

> prevvals=c(0.26,0.10,0.09,0.10)

I can use the mvrnorm function in MASS to create a matrix containing
random normal variables and dichotomize these variables into 0,1;
however, this is a less than ideal solution as my observed correlation
matrix is downwardly biased and the amount of the bias is related to the
prevalence of each variable.

Thanks,

Paul D. Bliese
Heidelberg, Germany
COMM: +49-6221-172626
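The downward bias described in the question can be reproduced with a short simulation. This is only a sketch: the sample size and the seed are assumptions chosen for illustration, and CORMAT is passed to mvrnorm() as if it were the correlation matrix of the underlying normal variables.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)  # arbitrary seed, for reproducibility only

# Target correlations and prevalences copied from the message
vars <- c("PUREPT", "PTCUT2", "PHQCUT2T", "ALCCUTT2")
CORMAT <- matrix(c(
  1.0000000, 0.5141552, 0.1913139, 0.1917923,
  0.5141552, 1.0000000, 0.2913552, 0.2204097,
  0.1913139, 0.2913552, 1.0000000, 0.1803987,
  0.1917923, 0.2204097, 0.1803987, 1.0000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))
prevvals <- c(0.26, 0.10, 0.09, 0.10)

n <- 200000  # assumed sample size, large enough to make the bias visible
z <- mvrnorm(n, mu = rep(0, 4), Sigma = CORMAT)

# Cut each column so that P(X = 1) equals its prevalence
cuts <- qnorm(1 - prevvals)
x <- 1 * sweep(z, 2, cuts, ">")
colnames(x) <- vars

obs_prev <- colMeans(x)  # should be close to prevvals
obs_cor <- cor(x)        # off-diagonal entries are attenuated

round(obs_prev, 3)
round(obs_cor, 3)
round(CORMAT - obs_cor, 3)  # size of the attenuation, pair by pair
```

The prevalences are recovered, but every off-diagonal entry of obs_cor falls below the corresponding entry of CORMAT, and the gap differs from pair to pair because it depends on where the cut points lie.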
"Bliese, Paul D LTC USAMH" <paul.bliese at us.army.mil> writes:> Newsgroup members, > > Does anyone have a clever way to simulate a correlation matrix such that > each column contains dichotomous variables (0,1) and where each column > has different prevalence rates. > > For instance, I would like to simulate the following correlation matrix: > > > CORMAT[1:4,1:4] > PUREPT PTCUT2 PHQCUT2T ALCCUTT2 > PUREPT 1.0000000 0.5141552 0.1913139 0.1917923 > PTCUT2 0.5141552 1.0000000 0.2913552 0.2204097 > PHQCUT2T 0.1913139 0.2913552 1.0000000 0.1803987 > ALCCUTT2 0.1917923 0.2204097 0.1803987 1.0000000 > > Where the prevalence for each variable is: > > > prevvals=c(0.26,0.10,0.09,0.10) > > I can use the mvrnorm function in MASS to create a matrix containing > random normal variables and dichotomize these variables into 0,1; > however, this is a less than ideal solution as my observed correlation > matrix is downwardly biased and the amount of the bias is related to the > prevalence of each variable.This is related to the concept of polychoric correlations: These are correlations that could be passed to mvrnorm and dichotomized by thresholds to give data with an observed distribution. The question is if there is a nice way to go from raw correlations and prevalences to polychoric corr. and thresholds. The threshold bit is easy, just take qnorm(), but the other bit might not. You could try looking into the polycor package and see which pieces of information are used there. Alternatively, you could notice that what you really have is the set of all 2x2 marginals of a 2x2x2x2 table (you can reconstruct sum(X), sum(Y) and sum(XY) from the information given) and you could fit a (log-linear) model for all 16 probabilities using the IPS algorithm. -- O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. 
K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
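One possible way to carry out the first suggestion, going from raw correlations and prevalences to thresholds and latent correlations, is to solve for each pair separately. The sketch below is an assumption about how that could be done and is not taken from the polycor package: the helper names p_both() and latent_rho() are hypothetical, and the bivariate normal probability is obtained by one-dimensional numerical integration in base R.

```r
# Target correlations and prevalences copied from the original message
CORMAT <- matrix(c(
  1.0000000, 0.5141552, 0.1913139, 0.1917923,
  0.5141552, 1.0000000, 0.2913552, 0.2204097,
  0.1913139, 0.2913552, 1.0000000, 0.1803987,
  0.1917923, 0.2204097, 0.1803987, 1.0000000
), nrow = 4, byrow = TRUE)
prevvals <- c(0.26, 0.10, 0.09, 0.10)

# P(Z1 > t1, Z2 > t2) for a standard bivariate normal with correlation rho.
# Conditional on Z1 = z, Z2 is normal with mean rho * z and variance 1 - rho^2.
p_both <- function(rho, t1, t2) {
  integrand <- function(z) {
    dnorm(z) * pnorm((t2 - rho * z) / sqrt(1 - rho^2), lower.tail = FALSE)
  }
  integrate(integrand, lower = t1, upper = Inf, rel.tol = 1e-8)$value
}

# Latent correlation that reproduces a target Pearson correlation r between
# two 0/1 variables with prevalences p1 and p2 (hypothetical helper)
latent_rho <- function(r, p1, p2) {
  t1 <- qnorm(1 - p1)  # thresholds: the "easy bit"
  t2 <- qnorm(1 - p2)
  # P(X1 = 1, X2 = 1) implied by r and the two prevalences
  p11_target <- p1 * p2 + r * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
  uniroot(function(rho) p_both(rho, t1, t2) - p11_target,
          interval = c(-0.99, 0.99), tol = 1e-8)$root
}

# Solve pair by pair and assemble the latent correlation matrix
k <- length(prevvals)
LATENT <- diag(k)
for (i in 1:(k - 1)) {
  for (j in (i + 1):k) {
    LATENT[i, j] <- LATENT[j, i] <-
      latent_rho(CORMAT[i, j], prevvals[i], prevvals[j])
  }
}

round(LATENT, 3)

# Pairwise solutions are not guaranteed to form a valid correlation matrix,
# so check the smallest eigenvalue before using LATENT as Sigma
min(eigen(LATENT, symmetric = TRUE)$values)
```

If the smallest eigenvalue is positive, LATENT can be passed as Sigma to mvrnorm() and the columns cut at qnorm(1 - prevvals); the dichotomized data should then have correlations close to CORMAT. If it is not positive, the pairwise approach fails for these targets and some adjustment would be needed.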