On Sat, 2007-02-24 at 10:30 -0800, francogrex wrote:
> Just a general question concerning the Woolf test (package vcd): when
> we have stratified data (2x2 tables) and the p-value of the Woolf test
> is below 0.05, do we assume that there is heterogeneity and that a
> common odds ratio cannot be computed?
> Does this mean that we have to try to add more stratification
> variables (stratify more) to make the Woolf test p-value
> insignificant?
> Also, in the same package there is an oddsratio function that seems to
> calculate the odds ratio even when the values are non-integers and
> even when one cell has a null value (in contrast to the
> Mantel-Haenszel or Fisher exact tests, which do not admit zero values
> or non-integers). Does anyone know what the difference is and how to
> obtain a p-value on that odds ratio?
> Thanks
I'll defer comments on your vcd-specific function questions to the
package authors.
However, from a general perspective the Woolf Test, or for that matter,
the Breslow-Day Test (+/- Tarone Correction), are tests that are akin to
running tests for the homogeneity of variances prior to performing an
ANOVA. You are testing to see if the underlying assumptions of the
latter test have been violated.
If the tests are "significant", it is not an _absolute_
contraindication
to continuing, but the result should give you pause to consider what
else is going on in your data.
These tests are commonly applied, for example, in multi-center clinical
trials to test for the 'poolability' of the data from the sites into a
common data set for analysis.
In the case of the ANOVA, having markedly different variances suggests
that the differences in the means may not be the most 'interesting' of
findings in your data.
Similarly, with stratified 2x2 tables, you can perform a CMH test using
mantelhaen.test() and you will get a common odds ratio and CI's.
However, the common odds ratio may be, at best, less interesting than
other underlying characteristics and, at worst, misleading.
For an example, let's use the UCBAdmissions dataset and some of the
examples in ?mantelhaen.test. For specific information on the dataset,
see ?UCBAdmissions. In fact, reading the Details on that help page and
running the examples, especially the series of 6 mosaicplot()s, should
provide additional insight into the findings I present below.
So, let's start. First, let's charge ahead and run the CMH test on the
data:
> mantelhaen.test(UCBAdmissions)
Mantel-Haenszel chi-squared test with continuity correction
data: UCBAdmissions
Mantel-Haenszel X-squared = 1.4269, df = 1, p-value = 0.2323
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.7719074 1.0603298
sample estimates:
common odds ratio
0.9046968
So, we can see that the resultant test statistic is not significant, yet
we do get a common OR and CI's.
Now, let's do the Woolf test, using the code provided
in ?mantelhaen.test:
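For reference, the woolf() function from the Examples section of
?mantelhaen.test is essentially the following. Note the 1/2 added to
every cell, a continuity correction that is also the usual way odds
ratio routines cope with zero cells, which may bear on your question
about vcd's oddsratio():

```r
woolf <- function(x) {
  x <- x + 1 / 2
  k <- dim(x)[3]
  or <- apply(x, 3, function(x) (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1]))
  w <- apply(x, 3, function(x) 1 / sum(1 / x))
  1 - pchisq(sum(w * (log(or) - weighted.mean(log(or), w))^2), k - 1)
}
```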
> woolf(UCBAdmissions)
[1] 0.003427200
This suggests that the assumption of common odds ratios in the CMH test
is violated and indeed if we run:
> apply(UCBAdmissions, 3, function(x) (x[1,1]*x[2,2])/(x[1,2]*x[2,1]))
        A         B         C         D         E         F
0.3492120 0.8025007 1.1330596 0.9212838 1.2216312 0.8278727
We can see that the OR's range from 0.3492120 to 1.2216312.
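As a quick sanity check on that spread (an aside, not from the help
page), exact per-stratum confidence intervals for the odds ratios can be
pulled from fisher.test(); Dept A should be the only stratum whose
interval excludes 1:

```r
# Conditional exact 95% CIs for each stratum's odds ratio; the result
# is a 2 x 6 matrix with one (lower, upper) column per Dept.
ci <- apply(UCBAdmissions, 3, function(x) fisher.test(x)$conf.int)
ci
```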
So, let's take a look at the 'raw' data and do some further testing. A
while ago, I wrote (and believe posted) code to take the frequency data
frame returned by as.data.frame.table() and convert it back to its raw
structure as a non-summarized data frame. Let's do that with
UCBAdmissions, since UCBAdmissions is a summarized 3D table:
> str(UCBAdmissions)
 table [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
 - attr(*, "dimnames")=List of 3
  ..$ Admit : chr [1:2] "Admitted" "Rejected"
  ..$ Gender: chr [1:2] "Male" "Female"
  ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...
Here is the code:
expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  DF <- sapply(1:nrow(x), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)
  DF <- subset(do.call("rbind", DF), select = -Freq)

  for (i in 1:ncol(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}
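As an aside, the same expansion can be sketched more compactly with
vectorized row replication (expand.dft2 is my name for this variant; it
skips the type.convert() step of the original, so column types are left
as they came in):

```r
expand.dft2 <- function(x)
{
  # Replicate row i of x 'Freq[i]' times, then drop the Freq column
  DF <- x[rep(seq_len(nrow(x)), x$Freq),
          setdiff(names(x), "Freq"), drop = FALSE]
  rownames(DF) <- NULL
  DF
}
```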
So:
DF <- expand.dft(as.data.frame.table(UCBAdmissions))
> str(DF)
'data.frame':   4526 obs. of  3 variables:
 $ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 1 1 1 1 1 1 1 1 1 ...
 $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Dept  : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
Now, let's create logistic regression models on the raw data, first
using just the two covariates of Gender and Dept:
fit <- glm(Admit ~ Gender + Dept, family = binomial, data = DF)
> summary(fit)
Call:
glm(formula = Admit ~ Gender + Dept, family = binomial, data = DF)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3613 -0.9588 0.3741 0.9306 1.4773
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.68192 0.09911 -6.880 5.97e-12 ***
GenderMale 0.09987 0.08085 1.235 0.217
DeptB 0.04340 0.10984 0.395 0.693
DeptC 1.26260 0.10663 11.841 < 2e-16 ***
DeptD 1.29461 0.10582 12.234 < 2e-16 ***
DeptE 1.73931 0.12611 13.792 < 2e-16 ***
DeptF 3.30648 0.16998 19.452 < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6044.3 on 4525 degrees of freedom
Residual deviance: 5187.5 on 4519 degrees of freedom
AIC: 5201.5
Number of Fisher Scoring iterations: 5
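A side note on interpreting these coefficients: since 'Rejected' is the
second level of the Admit factor, glm() is modeling the probability of
rejection here, so exponentiating the Gender coefficient gives a
Male:Female rejection odds ratio, roughly the reciprocal of the CMH
common odds ratio above. As a sketch, the same model can also be fit
directly from the frequency table using case weights, without expanding
the data at all:

```r
# Same additive model, fit via case weights on the frequency table;
# the coefficient estimates match the expanded-data fit.
ucb <- as.data.frame(UCBAdmissions)
fit2 <- glm(Admit ~ Gender + Dept, family = binomial,
            weights = Freq, data = ucb)

# ~1.105, approximately 1 / 0.9047 (the CMH common OR)
exp(coef(fit2)["GenderMale"])
```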
Now, for comparison, let's create the same model, but this time
including interaction terms for Gender*Dept:
fitIT <- glm(Admit ~ Gender * Dept, family = binomial, data = DF)
> summary(fitIT)
Call:
glm(formula = Admit ~ Gender * Dept, family = binomial, data = DF)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3793 -0.9768 0.3820 0.9127 1.8642
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.5442 0.2527 -6.110 9.94e-10 ***
GenderMale 1.0521 0.2627 4.005 6.21e-05 ***
DeptB 0.7904 0.4977 1.588 0.11224
DeptC 2.2046 0.2672 8.252 < 2e-16 ***
DeptD 2.1662 0.2750 7.878 3.32e-15 ***
DeptE 2.7013 0.2790 9.682 < 2e-16 ***
DeptF 4.1250 0.3297 12.512 < 2e-16 ***
GenderMale:DeptB -0.8321 0.5104 -1.630 0.10306
GenderMale:DeptC -1.1770 0.2996 -3.929 8.53e-05 ***
GenderMale:DeptD -0.9701 0.3026 -3.206 0.00135 **
GenderMale:DeptE -1.2523 0.3303 -3.791 0.00015 ***
GenderMale:DeptF -0.8632 0.4027 -2.144 0.03206 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6044.3 on 4525 degrees of freedom
Residual deviance: 5167.3 on 4514 degrees of freedom
AIC: 5191.3
Number of Fisher Scoring iterations: 5
Two key things to notice:
1. Note that the coefficients of the main effects of Gender and Dept
change materially from the first model to the second, demonstrating the
impact of the interaction terms.
2. The interaction terms themselves are generally quite significant.
Further, running:
> anova(fit, fitIT, test = "Chisq")
Analysis of Deviance Table
Model 1: Admit ~ Gender + Dept
Model 2: Admit ~ Gender * Dept
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 4519 5187.5
2 4514 5167.3 5 20.2 0.001144
pretty well confirms that the impact of including the interaction terms
in the model is quite significant and that there is confounding between
Gender and Department, which is arguably more interesting than the
isolated impact of Gender on Admissions.
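To tie the two analyses together: in the interaction model, GenderMale
is Dept A's (log) rejection odds ratio for males versus females, so
exponentiating it should recover the reciprocal of Dept A's stratum
admission OR (0.3492) computed earlier. A sketch, using case weights on
the frequency table so it stands alone:

```r
# Interaction model via case weights; with 'Rejected' as the modeled
# level, exp(GenderMale) is Dept A's Male:Female rejection odds ratio,
# the reciprocal of Dept A's admission OR from the stratum-wise apply().
ucb <- as.data.frame(UCBAdmissions)
fitIT2 <- glm(Admit ~ Gender * Dept, family = binomial,
              weights = Freq, data = ucb)

# ~2.86, approximately 1 / 0.3492
exp(coef(fitIT2)["GenderMale"])
```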
You might want to look further at the help page for the dataset as I
noted above, to gain further graphical insight into the data.
To your query, the goal is not to 'make' the Woolf test non-significant,
but to better understand why it is significant to start with and dig
deeper into your data to gain further insight.
HTH,
Marc Schwartz