thr3ads.net - R help - [R] Simulate dichotomous correlation matrix [Jun 2006]

Bliese, Paul D LTC USAMH

2006-Jun-28 11:31 UTC

[R] Simulate dichotomous correlation matrix

Newsgroup members,

Does anyone have a clever way to simulate data matching a given
correlation matrix, such that each column is a dichotomous (0,1)
variable and each column has a different prevalence rate?

For instance, I would like to simulate the following correlation matrix:
> CORMAT[1:4,1:4]
            PUREPT    PTCUT2  PHQCUT2T  ALCCUTT2
PUREPT   1.0000000 0.5141552 0.1913139 0.1917923
PTCUT2   0.5141552 1.0000000 0.2913552 0.2204097
PHQCUT2T 0.1913139 0.2913552 1.0000000 0.1803987
ALCCUTT2 0.1917923 0.2204097 0.1803987 1.0000000

Where the prevalence for each variable is:
> prevvals=c(0.26,0.10,0.09,0.10)

I can use the mvrnorm function in MASS to create a matrix containing
random normal variables and dichotomize these variables into 0,1;
however, this is a less than ideal solution as my observed correlation
matrix is downwardly biased and the amount of the bias is related to the
prevalence of each variable.
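
The attenuation described above can be seen in a small simulation. This is
only a sketch: the seed and sample size are arbitrary, and the names cormat
and prev are illustrative stand-ins for the values quoted in the post.

```r
library(MASS)  # for mvrnorm()

# Values copied from the post; the object names are illustrative
vars <- c("PUREPT", "PTCUT2", "PHQCUT2T", "ALCCUTT2")
cormat <- matrix(c(1.0000000, 0.5141552, 0.1913139, 0.1917923,
                   0.5141552, 1.0000000, 0.2913552, 0.2204097,
                   0.1913139, 0.2913552, 1.0000000, 0.1803987,
                   0.1917923, 0.2204097, 0.1803987, 1.0000000),
                 nrow = 4, dimnames = list(vars, vars))
prev <- c(0.26, 0.10, 0.09, 0.10)

set.seed(42)   # arbitrary seed
n <- 50000     # arbitrary sample size
z <- mvrnorm(n, mu = rep(0, 4), Sigma = cormat)

# Dichotomize: column j is 1 when z exceeds the (1 - prev[j]) normal quantile
x <- sweep(z, 2, qnorm(1 - prev), ">") * 1

naive_prev <- colMeans(x)  # should be close to prev
naive_cor  <- cor(x)       # off-diagonals come out smaller than in cormat
print(round(naive_prev, 3))
print(round(naive_cor, 3))
```

The prevalences are reproduced, but every off-diagonal correlation of the
0/1 data is pulled toward zero relative to the matrix that was fed in.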

Thanks,


Paul D. Bliese
Heidelberg, Germany
COMM:  +49-6221-172626

Peter Dalgaard

2006-Jun-28 12:21 UTC

[R] Simulate dichotomous correlation matrix

"Bliese, Paul D LTC USAMH" <paul.bliese at us.army.mil> writes:
> Newsgroup members,
> 
> Does anyone have a clever way to simulate a correlation matrix such that
> each column contains dichotomous variables (0,1) and where each column
> has different prevalence rates.
> 
> For instance, I would like to simulate the following correlation matrix:
> 
> > CORMAT[1:4,1:4]
>           PUREPT    PTCUT2    PHQCUT2T  ALCCUTT2
> PUREPT   1.0000000 0.5141552 0.1913139 0.1917923
> PTCUT2   0.5141552 1.0000000 0.2913552 0.2204097
> PHQCUT2T 0.1913139 0.2913552 1.0000000 0.1803987
> ALCCUTT2 0.1917923 0.2204097 0.1803987 1.0000000
> 
> Where the prevalence for each variable is:
> 
> > prevvals=c(0.26,0.10,0.09,0.10)
> 
> I can use the mvrnorm function in MASS to create a matrix containing
> random normal variables and dichotomize these variables into 0,1;
> however, this is a less than ideal solution as my observed correlation
> matrix is downwardly biased and the amount of the bias is related to the
> prevalence of each variable.

This is related to the concept of polychoric correlations (tetrachoric,
in the dichotomous case): these are the correlations that could be passed
to mvrnorm so that the result, dichotomized at suitable thresholds, gives
data with the observed distribution. The question is whether there is a
nice way to go from raw correlations and prevalences to polychoric
correlations and thresholds. The threshold bit is easy, just take
qnorm() of the prevalences, but the other bit might not be. You could
try looking into the polycor package and see which pieces of information
are used there.
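
One way to sketch the "other bit" in base R, as an illustration only and not
what polycor does internally: for each pair, compute the joint probability
P(X1 = 1, X2 = 1) implied by the raw correlation and the prevalences, then
search for the latent normal correlation that gives the same probability
above the two thresholds. The helper names bvn_upper and latent_rho are
made up for this example, and a matrix assembled pair by pair like this is
not guaranteed to be positive definite, so the simulation step is guarded.

```r
vars <- c("PUREPT", "PTCUT2", "PHQCUT2T", "ALCCUTT2")
cormat <- matrix(c(1.0000000, 0.5141552, 0.1913139, 0.1917923,
                   0.5141552, 1.0000000, 0.2913552, 0.2204097,
                   0.1913139, 0.2913552, 1.0000000, 0.1803987,
                   0.1917923, 0.2204097, 0.1803987, 1.0000000),
                 nrow = 4, dimnames = list(vars, vars))
prev <- c(0.26, 0.10, 0.09, 0.10)

# P(Z1 > t1, Z2 > t2) for a standard bivariate normal with correlation rho,
# obtained by integrating the conditional tail probability of Z2 over Z1
bvn_upper <- function(t1, t2, rho) {
  integrand <- function(z) {
    dnorm(z) * pnorm((t2 - rho * z) / sqrt(1 - rho^2), lower.tail = FALSE)
  }
  integrate(integrand, lower = t1, upper = Inf,
            rel.tol = 1e-8, subdivisions = 500L)$value
}

# Latent correlation reproducing binary correlation r at prevalences p1, p2
latent_rho <- function(r, p1, p2) {
  t1 <- qnorm(1 - p1)  # thresholds: X = 1 when Z > t
  t2 <- qnorm(1 - p2)
  p11 <- p1 * p2 + r * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
  # search range stops short of +/-1 to keep the integrand well behaved
  uniroot(function(rho) bvn_upper(t1, t2, rho) - p11,
          interval = c(-0.99, 0.99), tol = 1e-8)$root
}

latent <- diag(4)
dimnames(latent) <- list(vars, vars)
for (i in 1:3) for (j in (i + 1):4) {
  latent[i, j] <- latent[j, i] <- latent_rho(cormat[i, j], prev[i], prev[j])
}
print(round(latent, 3))

# Simulate only if the assembled matrix is usable as a covariance matrix
ev <- eigen(latent, symmetric = TRUE, only.values = TRUE)$values
if (min(ev) > 0 && requireNamespace("MASS", quietly = TRUE)) {
  set.seed(1)  # arbitrary seed
  z <- MASS::mvrnorm(1e5, mu = rep(0, 4), Sigma = latent)
  x <- sweep(z, 2, qnorm(1 - prev), ">") * 1
  print(round(colMeans(x), 3))  # compare with prev
  print(round(cor(x), 3))       # compare with cormat
}
```

If the eigenvalue check fails, the pairwise values would need to be adjusted
to the nearest valid correlation matrix before simulating, at the cost of no
longer matching the targets exactly.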

Alternatively, you could notice that what you really have is the set
of all 2x2 marginals of a 2x2x2x2 table (you can reconstruct sum(X),
sum(Y) and sum(XY) from the information given) and you could fit a
(log-linear) model for all 16 probabilities using the IPS (iterative
proportional scaling) algorithm.
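
A rough sketch of this second suggestion, under the assumption that the six
2x2 margins implied by the correlations and the (rounded) prevalences are
mutually consistent: rebuild each pair's 2x2 table, then rescale a 2x2x2x2
table of probabilities to each margin in turn until nothing changes. The
loop is hand-written because only the margins are available, not a full
table; pair_margin and the iteration limits are choices made for this
example.

```r
vars <- c("PUREPT", "PTCUT2", "PHQCUT2T", "ALCCUTT2")
cormat <- matrix(c(1.0000000, 0.5141552, 0.1913139, 0.1917923,
                   0.5141552, 1.0000000, 0.2913552, 0.2204097,
                   0.1913139, 0.2913552, 1.0000000, 0.1803987,
                   0.1917923, 0.2204097, 0.1803987, 1.0000000),
                 nrow = 4, dimnames = list(vars, vars))
prev <- c(0.26, 0.10, 0.09, 0.10)

# 2x2 joint distribution of one pair; index 1 means value 0, index 2 means 1
pair_margin <- function(r, p1, p2) {
  p11 <- p1 * p2 + r * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
  matrix(c(1 - p1 - p2 + p11, p1 - p11,   # column for X2 = 0
           p2 - p11,          p11),       # column for X2 = 1
         nrow = 2)
}

pairs <- combn(4, 2)  # the six variable pairs, one per column
targets <- lapply(seq_len(ncol(pairs)), function(k) {
  i <- pairs[1, k]; j <- pairs[2, k]
  pair_margin(cormat[i, j], prev[i], prev[j])
})

# Iterative proportional scaling, starting from the uniform table
p <- array(1 / 16, dim = rep(2, 4))
for (iter in 1:2000) {
  max_dev <- 0
  for (k in seq_len(ncol(pairs))) {
    current <- apply(p, pairs[, k], sum)               # current 2x2 margin
    max_dev <- max(max_dev, abs(current - targets[[k]]))
    p <- sweep(p, pairs[, k], targets[[k]] / current, "*")
  }
  if (max_dev < 1e-12) break
}

# Draw 0/1 data directly from the 16 fitted cell probabilities
set.seed(1)  # arbitrary seed
cells <- sample(length(p), 1e5, replace = TRUE, prob = as.vector(p))
x <- arrayInd(cells, dim(p)) - 1
colnames(x) <- vars
print(round(colMeans(x), 3))  # compare with prev
print(round(cor(x), 3))       # compare with cormat
```

Because the data are drawn from a table whose pairwise margins match the
targets, no latent normal step is involved and the correlations are not
attenuated; only the third- and fourth-order structure is left to the model.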


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
