I hope someone here knows the answer to this, since it will save me from delving deep into documentation.

Based on 22 pairs of vectors, I have noticed that tetrachoric correlation coefficients (TCC) in Stata are almost uniformly higher than those in R, sometimes dramatically so (TCC=.61 in Stata, .51 in R; .51 in Stata, .39 in R). Stata's estimate is higher than R's in 20 out of 22 computations, although the Stata estimates always fall within the 95% CI for the TCC calculated by R.

Do Stata and R calculate TCC in dramatically different ways? Is the handling of missing data perhaps different? Any thoughts?

Btw, I am sending this question only to the R-help list.

Thanks,

Janet
Dear Janet,

Are you using the polychor() function in the polycor package to compute tetrachoric correlations? If so, two methods are provided: a relatively quick method (the default) and ML. The methods implemented are described in the references given in ?polycor. Missing data are simply eliminated from the contingency table from which a tetrachoric correlation is computed.

If, however, you're using hetcor() to compute a matrix of tetrachoric correlations, then missing data are handled according to the use argument, which defaults to "complete.obs" and is described in ?hetcor.

If you want to know whether polychor() or Stata is right, then one thing that you might do is try them on data for which you know the answer. If you do this, you should of course make sure that both are trying to compute the same thing (e.g., the ML estimate).

I hope this helps,
John

On Fri, 23 Jun 2006 10:42:12 -0700 Janet Rosenbaum <jrosenba at rand.org> wrote:
> Based on 22 pairs of vectors, I have noticed that tetrachoric
> correlation coefficients in stata are almost uniformly higher than
> those in R, sometimes dramatically so (TCC=.61 in stata, .51 in R;
> .51 in stata, .39 in R).
>
> Do stata and R calculate TCC in dramatically different ways? Is the
> handling of missing data perhaps different? Any thoughts?
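The suggestion to try polychor() on data with a known answer could be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the sample size, the cut points, the true correlation of 0.5, and the 5% missingness are all assumptions chosen for the example, and it presumes the polycor package is installed.

```r
library(polycor)

set.seed(1)
n   <- 5000   # assumed sample size
rho <- 0.5    # assumed "true" latent correlation

# Latent bivariate normal pair with correlation rho (base R only)
z1 <- rnorm(n)
z2 <- rnorm(n)
x  <- z1
y  <- rho * z1 + sqrt(1 - rho^2) * z2

# Dichotomize at arbitrary (assumed) thresholds
bx <- factor(x > 0.3)
by <- factor(y > -0.2)

# Quick two-step estimate (the default) and the ML estimate
tab     <- table(bx, by)
r_quick <- polychor(tab)
r_ml    <- polychor(tab, ML = TRUE)

# With missing values: table() drops NA cells by default, so cases
# with a missing value are left out of the contingency table
bx_na <- bx
bx_na[sample(n, 250)] <- NA
r_na  <- polychor(bx_na, by)

print(c(quick = r_quick, ml = r_ml, with_na = r_na))
```

All three estimates should land near the assumed rho. Feeding the same 2x2 counts to Stata would then show whether the two programs differ on identical input, which separates an estimator difference from a missing-data difference.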
Janet Rosenbaum <jrosenba at rand.org> writes:

> Based on 22 pairs of vectors, I have noticed that tetrachoric
> correlation coefficients in stata are almost uniformly higher than those
> in R, sometimes dramatically so (TCC=.61 in stata, .51 in R; .51 in
> stata, .39 in R).
>
> Do stata and R calculate TCC in dramatically different ways? Is the
> handling of missing data perhaps different? Any thoughts?

A bit more information seems necessary:

- tetrachoric correlations depend on 4 numbers, so you should be able to give a direct example

- you're not telling us how you calculate the TCC in R. This is not obvious (package polycor?).

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
(p.dalgaard at biostat.ku.dk)  Ph: (+45) 35327918  FAX: (+45) 35327907
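To make the "4 numbers" point concrete, here is a minimal base-R sketch of a two-step tetrachoric estimate computed directly from the four cell counts of a 2x2 table. The function name tetra_2step and its argument layout are hypothetical, and this is not claimed to be the algorithm used by either polycor or Stata; it only shows that the estimate is fully determined by the four counts.

```r
# Two-step tetrachoric estimate from four cell counts (hypothetical helper).
# nXY = count of cases with X = X-level and Y = Y-level (0 or 1).
tetra_2step <- function(n00, n01, n10, n11) {
  n <- n00 + n01 + n10 + n11

  # Step 1: thresholds from the margins, P(X = 0) = pnorm(a), P(Y = 0) = pnorm(b)
  a <- qnorm((n00 + n01) / n)
  b <- qnorm((n00 + n10) / n)

  # P(X* > a, Y* > b) for latent correlation r, by one-dimensional integration
  p11 <- function(r) {
    integrate(function(x) dnorm(x) * pnorm((r * x - b) / sqrt(1 - r^2)),
              lower = a, upper = Inf)$value
  }

  # Step 2: maximize the multinomial likelihood over r, thresholds held fixed
  negloglik <- function(r) {
    q11 <- p11(r)
    q10 <- (1 - pnorm(a)) - q11
    q01 <- (1 - pnorm(b)) - q11
    q00 <- 1 - q11 - q10 - q01
    q   <- pmax(c(q00, q01, q10, q11), 1e-12)  # guard against log(0)
    -sum(c(n00, n01, n10, n11) * log(q))
  }

  optimize(negloglik, interval = c(-0.99, 0.99))$minimum
}

r_indep <- tetra_2step(25, 25, 25, 25)  # no association in the table
r_pos   <- tetra_2step(40, 10, 10, 40)  # strong positive association
print(c(independent = r_indep, positive = r_pos))
```

When both margins are split 50/50, the result can be checked against the closed form r = sin(2*pi*(p11 - 1/4)), where p11 is the proportion of cases in the (1,1) cell. Posting the four counts for one of the discrepant pairs would let anyone on the list run both programs on the same input.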