Hello,

I think I am beginning to understand what is involved in the so-called "Type-I, II, ..." ANOVAs (thanks to all the replies I got for yesterday's post). I have a question that will help me (and others?) understand it better (or remove a misunderstanding):

I know that ANOVA is really a special case of regression where the predictor variable is categorical. I also know that there are various ways of building a regression model, commonly called "stepwise, add, remove, ...", in which one controls which predictors are added to the regression model and in what order.

Is this what the various "Types" of ANOVA correspond to? I mean that I think of my ANOVA as a regression model (a General Linear Model) and of the various ways of entering predictors as the various ANOVA "Types".

Hope that makes sense...

Ravi Kulkarni

--
View this message in context: http://n4.nabble.com/ANOVA-Types-and-Regression-models-the-same-tp1574654p1574654.html
Sent from the R help mailing list archive at Nabble.com.
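(A small illustration of the first point above, using made-up data; the names y and g are hypothetical. A one-way ANOVA and a regression on a factor are the same linear model, so R reports the identical F test either way.)

set.seed(1)
g <- factor(sample(c('a','b','c'), 60, TRUE))
y <- (g=='c') + rnorm(60)
summary(aov(y ~ g))   # classical one-way ANOVA table
anova(lm(y ~ g))      # the same F test from the regression (dummy-variable) fit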
If memory serves, Bill Venables said in the paper cited here several times that there is really only one type of sums of squares, so there is only one type of "ANOVA" (if I understand what you mean by ANOVA).

Just forget about the different types of tests and ask yourself this (hopefully simple and straightforward) question: which pair of models, when compared, will answer the question you have at hand? It is not sufficient to ask "Is factor X significant?"; the answer depends on what else is in the model you are entertaining.

I think it is high time to retire the archaic concept of the different types of sums of squares. IMHO they are the biggest red herrings in Statistics.

Best,
Andy

From: Ravi Kulkarni
> Is this what the various "Types" of ANOVA correspond to? I mean that I
> think of my ANOVA as a regression model (a General Linear Model) and the
> various ways of entering predictors as the various ANOVA "Types".
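(To make the "compare a pair of models" suggestion concrete, a minimal sketch with made-up data; x, g, and y are hypothetical. The question "is g needed, given that x is already in the model?" is answered by comparing the two models directly.)

set.seed(2)
x <- rnorm(80)
g <- factor(sample(c('a','b','c'), 80, TRUE))
y <- 0.5*x + (g=='c') + rnorm(80)
fit0 <- lm(y ~ x)        # model without g
fit1 <- lm(y ~ x + g)    # model with g
anova(fit0, fit1)        # F test for adding g, with x held in the model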
Frank E Harrell Jr
2010-Mar-02 13:46 UTC
[R] ANOVA "Types" and Regression models: the same?
Ravi Kulkarni wrote:
> Is this what the various "Types" of ANOVA correspond to? I mean that I
> think of my ANOVA as a regression model (a General Linear Model) and the
> various ways of entering predictors as the various ANOVA "Types".

Ravi,

John Fox's posting provided a lot of information. Briefly, the "Types" refer to whether effects are adjusted for all other effects in the model (Types II, III, IV) or not (Type I, or sequential, tests adjust only for EARLIER terms in the model). See John's posting for definitions of Types II and III and for reasons to almost always use II or III.

Stepwise procedures are a whole different kettle of fish, and they yield invalid statistical tests in almost all cases involving more than two candidate variables and more than one fitted model.

The world might be a better place if these types had never been invented and we stated statistical tests more intuitively. The contrast() function in the rms package is one of many ways to do that. Here are the examples from its help file; see especially the examples using type='joint'.

library(rms)

set.seed(1)
age <- rnorm(200, 40, 12)
sex <- factor(sample(c('female','male'), 200, TRUE))
logit <- (sex=='male') + (age-40)/5
y <- ifelse(runif(200) <= plogis(logit), 1, 0)
f <- lrm(y ~ pol(age,2)*sex)

# Compare a 30 year old female to a 40 year old male
# (with or without age x sex interaction in the model)
contrast(f, list(sex='female', age=30), list(sex='male', age=40))

# For a model containing two treatments, centers, and treatment
# x center interaction, get 0.95 confidence intervals separately
# by center
center <- factor(sample(letters[1:8], 500, TRUE))
treat  <- factor(sample(c('a','b'),   500, TRUE))
y <- 8*(treat=='b') + rnorm(500, 100, 20)
f <- ols(y ~ treat*center)
lc <- levels(center)
contrast(f, list(treat='b', center=lc),
            list(treat='a', center=lc))

# Get 'Type III' contrast: average b - a treatment effect over
# centers, weighting centers equally (which is almost always
# an unreasonable thing to do)
contrast(f, list(treat='b', center=lc),
            list(treat='a', center=lc),
         type='average')

# Get 'Type II' contrast, weighting centers by the number of
# subjects per center.  Print the design contrast matrix used.
k <- contrast(f, list(treat='b', center=lc),
                 list(treat='a', center=lc),
              type='average', weights=table(center))
print(k, X=TRUE)
# Note: If other variables had interacted with either treat
# or center, we may want to list settings for these variables
# inside the list()'s, so as to not use default settings

# For a 4-treatment study, get all comparisons with treatment 'a'
treat <- factor(sample(c('a','b','c','d'), 500, TRUE))
y <- 8*(treat=='b') + rnorm(500, 100, 20)
dd <- datadist(treat, center); options(datadist='dd')
f <- ols(y ~ treat*center)
lt <- levels(treat)
contrast(f, list(treat=lt[-1]), list(treat=lt[1]),
         cnames=paste(lt[-1], lt[1], sep=':'), conf.int=1-.05/3)

# Compare each treatment with the average of all others
for(i in 1:length(lt)) {
  cat('Comparing with', lt[i], '\n\n')
  print(contrast(f, list(treat=lt[-i]),
                    list(treat=lt[ i]), type='average'))
}
options(datadist=NULL)

# Six ways to get the same thing, for a variable that appears
# linearly in a model (here called 'fit') and does not interact
# with any other variables.  We estimate the change in y per unit
# change in a predictor x1.  Methods 4 and 5 also provide
# confidence limits; method 6 computes nonparametric bootstrap
# confidence limits.  Methods 2-6 can work for models that are
# nonlinear or non-additive in x1; for that case more care is
# needed in the choice of settings for x1 and the variables that
# interact with x1.
coef(fit)['x1']                              # method 1
diff(predict(fit, gendata(x1=c(0,1))))       # method 2
g <- Function(fit)                           # method 3
g(x1=1) - g(x1=0)
summary(fit, x1=c(0,1))                      # method 4
k <- contrast(fit, list(x1=1), list(x1=0))   # method 5
print(k, X=TRUE)
fit <- update(fit, x=TRUE, y=TRUE)           # method 6
b <- bootcov(fit, B=500, coef.reps=TRUE)
bootplot(b, X=k$X)   # bootstrap distribution and CL

# In a model containing age, race, and sex, compute an estimate of
# the mean response for a 50 year old male, averaged over the races
# using observed frequencies for the races as weights
f <- ols(y ~ age + race + sex)
contrast(f, list(age=50, sex='male', race=levels(race)),
         type='average', weights=table(race))

# Plot the treatment effect (drug - placebo) as a function of age
# and sex in a model in which age nonlinearly interacts with
# treatment for females only
set.seed(1)
n <- 800
treat <- factor(sample(c('drug','placebo'), n, TRUE))
sex   <- factor(sample(c('female','male'),  n, TRUE))
age   <- rnorm(n, 50, 10)
y <- .05*age + (sex=='female')*(treat=='drug')*.05*abs(age-50) + rnorm(n)
f <- ols(y ~ rcs(age,4)*treat*sex)
d <- datadist(age, treat, sex); options(datadist='d')
# show separate estimates by treatment and sex
plot(Predict(f, age, treat, sex='female'))
plot(Predict(f, age, treat, sex='male'))
ages <- seq(35, 65, by=5); sexes <- c('female','male')
w <- contrast(f, list(treat='drug',    age=ages, sex=sexes),
                 list(treat='placebo', age=ages, sex=sexes))
xYplot(Cbind(Contrast, Lower, Upper) ~ age | sex, data=w,
       ylab='Drug - Placebo')
xYplot(Cbind(Contrast, Lower, Upper) ~ age, groups=sex, data=w,
       ylab='Drug - Placebo', method='alt bars')
options(datadist=NULL)

# Examples of type='joint' contrast tests
set.seed(1)
x1 <- rnorm(100)
x2 <- factor(sample(c('a','b','c'), 100, TRUE))
dd <- datadist(x1, x2); options(datadist='dd')
y <- x1 + (x2=='b') + rnorm(100)

# First replicate a test statistic from anova()
f <- ols(y ~ x2)
anova(f)
contrast(f, list(x2=c('b','c')), list(x2='a'), type='joint')
# Repeat with a redundancy; compare a vs b, a vs c, b vs c
contrast(f, list(x2=c('a','a','b')), list(x2=c('b','c','c')), type='joint')

# Get a test of association of a continuous predictor with y.
# First assume linearity, then cubic.
f <- lrm(y>0 ~ x1 + x2)
anova(f)
contrast(f, list(x1=1), list(x1=0), type='joint')   # a minimum set of contrasts
xs <- seq(-2, 2, length=20)
contrast(f, list(x1=0), list(x1=xs), type='joint')
# All contrasts were redundant except for the first, because of the
# linearity assumption

f <- lrm(y>0 ~ pol(x1,3) + x2)
anova(f)
contrast(f, list(x1=0), list(x1=xs), type='joint')
print(contrast(f, list(x1=0), list(x1=xs), type='joint'), jointonly=TRUE)
# All contrasts were redundant except for the first 3, because of the
# cubic regression assumption

# Now do something that is difficult to do without cryptic contrast
# matrix operations: allow each of the three x2 groups to have a
# different shape for the x1 effect, where x1 is quadratic.  Test
# whether there is a difference in mean levels of y for x2='b' vs. 'c',
# or whether the shape or slope of x1 differs between x2='b' and
# x2='c', regardless of how they differ when x2='a'.  In other words,
# test whether the mean response differs between groups b and c at any
# value of x1.  This is a 3 d.f. test (intercept, linear, quadratic
# effects) and is a better approach than subsetting the data to remove
# x2='a' and fitting a simpler model, as it uses a better estimate of
# sigma from all the data.
f <- ols(y ~ pol(x1,2) * x2)
anova(f)
contrast(f, list(x1=xs, x2='b'), list(x1=xs, x2='c'), type='joint')
# Note: if using a spline fit, there should be at least one value of
# x1 between any two knots and beyond the outer knots.

--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University
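(To connect the description of Type I versus Type II/III tests above with standard tools, a minimal sketch using made-up, unbalanced data; the factors a and b are hypothetical. Anova() is from John Fox's car package.)

library(car)
set.seed(3)
a <- factor(sample(c('lo','hi'),   100, TRUE))
b <- factor(sample(c('x','y','z'), 100, TRUE))
y <- (a=='hi') + 2*(b=='z') + rnorm(100)
fit <- lm(y ~ a * b)
anova(fit)            # Type I: each term adjusted only for terms entered earlier
Anova(fit, type=2)    # Type II: main effects adjusted for each other
# Type III tests are only sensible with sum-to-zero contrasts:
fit3 <- lm(y ~ a * b, contrasts=list(a='contr.sum', b='contr.sum'))
Anova(fit3, type=3)   # Type III: each term adjusted for all other terms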