Hello,

I am running polychoric correlations on a dataset composed of 12 ordinal and binary variables (N = 384), using the polycor package. One of the associations (between 2 dichotomous variables) is very high using the 2-step estimate (0.933 when polychor is run only on the two variables, but 0.801 when it is run on all 12 variables). The same correlation run with the ML estimate returns a singularity message.

First, I would like to know why the estimations between only the two dichotomous variables and with all the variables at once (with the 2-step estimate) return slightly different results.

Secondly, when I checked the distributions of these two dichotomous variables, they appear roughly symmetrically opposed. Therefore, one should indeed expect a strong association between them, but a negative one, shouldn't one? Why does the polychoric correlation return a positive coefficient? What does it mean for the rest of the coefficients, and should I trust them?

I have to say I'm new to R and not very strong in statistics; I hope I haven't posted a stupid question...

cheers,
Dorothee

--
View this message in context: http://www.nabble.com/polychoric-correlation%3A-issue-with-coefficient-sign-tp21425977p21425977.html
Sent from the R help mailing list archive at Nabble.com.
Dorothee wrote: [question quoted above]

Hi Dorothee,

This may be similar to a problem I encountered with the biserial.cor function, where the default specification of which value of the dichotomous variable to use as the reference value gave me a correlation coefficient with an apparently reversed sign. It might be that the values of your categorical variable are not in the order you assume.

Jim
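Jim's point can be checked in base R: the sign of a correlation computed from a dichotomous factor depends entirely on which level is coded first. A minimal sketch (the toy vectors here are made up for illustration, not Dorothee's data):

```r
# Toy dichotomous data (illustrative only)
x <- c(0, 0, 1, 1, 1)
y <- c(0, 1, 0, 1, 1)

f_natural  <- factor(x, levels = c(0, 1))  # 0 coded as 1, 1 coded as 2
f_reversed <- factor(x, levels = c(1, 0))  # reference level swapped

cor(as.numeric(f_natural), y)   # positive
cor(as.numeric(f_reversed), y)  # same magnitude, opposite sign
```

So before distrusting the coefficients, it is worth running levels() on each factor to confirm the ordering matches the intended coding.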
Thank you so much for all your answers! And sorry for being scarce on the details.

My dataset has 12 variables (6 ordinal, coded from 1 to 5, and 6 binary) and 384 cases with no missing values. High values mean a 'positive' attitude toward the object of study.

I probably went too fast in my earlier impression that the variables' distributions were almost symmetrically opposed. I got confused by the high frequency of the combination (0, 1), sorry. Here is the crosstab:

Observed counts
        x2
x1        0     1
   0     23     0
   1    334    27

Expected counts
        x2
x1        0           1
   0    21.38281    1.617188
   1   335.61719   25.382812

The actual counts for (0, 0) and (1, 1) being slightly above the expected counts, I can now understand the positive correlation. But does the high polychoric correlation make sense when the variables are so skewed and the difference between the actual and expected counts of the crosstab is so small?

Regarding the difference in the correlation coefficient between x1 and x2 with polychor and hetcor: I used hetcor (polycor package) with the 2-step and ML estimations on the whole dataset. The data were first declared as 'factor', otherwise hetcor would just compute Pearson correlations.

hc = hetcor(thedata, ML=FALSE, std.err=FALSE)
(correlation x1x2) 0.8013

hc = hetcor(thedata, ML=TRUE, std.err=TRUE)
"Error in solve.default(result$hessian) : Lapack routine dgesv: system is exactly singular"

Using polychor with the 2-step and ML estimates:

polychor(x1, x2, ML=FALSE, std.err=FALSE)
[1] 0.9330044

polychor(x1, x2, ML=TRUE, std.err=TRUE)
"Error in solve.default(result$hessian) : Lapack routine dgesv: system is exactly singular"

Murray, you mentioned that the correlation between my two variables could be affected by other variables, hence the difference between polychor (on only two variables) and hetcor (on all the variables). I ran polychor and hetcor on created variables (correlated and not correlated).
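The expected counts in the crosstab above can be reproduced in base R from the observed table (the counts are taken directly from the post):

```r
# Observed 2x2 table from the post
tab <- matrix(c(23, 0, 334, 27), nrow = 2, byrow = TRUE,
              dimnames = list(x1 = c(0, 1), x2 = c(0, 1)))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected
#           0         1
# 0  21.38281  1.617188
# 1 335.61719 25.382812
```

The (0, 1) cell has an expected count of only 1.6, so its observed count of 0 is not far off; the zero cell, rather than the small observed-expected gaps, is what drives the estimation trouble discussed below.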
Although I thought that the heterogeneous correlations were run only within each pair of variables (and therefore not affected by other variables), a third variable correlated with x1 and x2 does slightly affect the correlation between x1 and x2. Thanks for this suggestion. I need to look more closely into how polychoric correlations are computed.

--
View this message in context: http://www.nabble.com/polychoric-correlation%3A-issue-with-coefficient-sign-tp21425977p21448444.html
Sent from the R help mailing list archive at Nabble.com.
Stas Kolenikov
2009-Jan-14 04:39 UTC
[R] polychoric correlation: issue with coefficient sign
Olsson's original paper (http://www.citeulike.org/user/ctacmo/article/553309) did mention that the greatest biases and numeric problems were encountered when the two variables had opposite skewness. Your example is even more extreme: tetrachoric and polychoric correlations do not like zero counts. A zero cell actually means that your data sit on a straight line, but that line does not pass through the intersection of the thresholds. The nominal estimate of the correlation should be 1, and what you see should be insignificantly different from 1.

No wonder you get LAPACK errors: at some point, you had to invert matrix(c(1, 1, 1, 1), 2, 2) or compute its determinant in the ML computations. My own Stata implementation of polychoric correlation choked on your data and stopped with an error... which I should've handled more gracefully :)). The data with 0.5 added to each cell produced the same correlation estimate but different standard errors.

John Fox offered all the other feasible explanations, like the handling of missing data in the pairwise and full-data-set computations. But with unstable computations you can end up just about anywhere on the range of estimates; the standard errors should tell you that your estimate is quite imprecise.

On 1/12/09, Dorothee <ddurpoix at gmail.com> wrote: [question quoted above]

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
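Stas's 0.5-per-cell adjustment can be tried directly on the observed table; polychor() accepts a contingency table of counts as its first argument. A sketch, assuming the polycor package is installed (the counts are those from Dorothee's crosstab):

```r
library(polycor)  # assumes polycor is installed

# Observed 2x2 table from the earlier post
tab <- matrix(c(23, 0, 334, 27), nrow = 2, byrow = TRUE)

# Two-step estimate from the raw table: the zero cell pushes the
# estimate toward the boundary and makes the ML Hessian singular
polychor(tab)

# Continuity correction: add 0.5 to every cell before estimating
polychor(tab + 0.5)
```

Comparing the two estimates (and, with std.err=TRUE, their standard errors) shows how sensitive the result is to that single empty cell.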