thr3ads.net - R help - [R] anova planned comparisons/contrasts [Nov 2007]

If this information is useful, please help other people find it:
Share via:

Tyler Smith

2007-Nov-22 19:38 UTC

[R] anova planned comparisons/contrasts

Hi,

I'm trying to figure out how anova works in R by translating the
examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit a
snag with planned comparisons, their box 9.4 and section 9.6. It's a
basic anova design:

treatment <- factor(rep(c("control", "glucose",
"fructose",
	                  "gluc+fruct", "sucrose"), each = 10))

length <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
            57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
            58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
            58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
            62, 66, 65, 63, 64, 62, 65, 65, 62, 67)

sugars <- data.frame(treatment, length)

The basic analysis is easy enough:

anova(lm(length ~ treatment, sugars))

S&R proceed to calculate planned comparisons between control and all
other groups, between gluc+fruct and the remaining sugars, and among
the three pure sugars. I can get the first two comparisons using the
contrasts() function:

contrasts(sugars$treatment) <- matrix(c(-4, 1, 1,  1,  1,
                                        0, -1, 3, -1, -1), 5, 2) 

summary(lm(length ~ treatment, sugars))

The results appear to be the same, excepting that the book calculates
an F value, whereas R produces an equivalent t value. However, I can't
figure out how to calculate a contrast among three groups. For those
without access to Sokal and Rohlf, I've written an R function that
performs the analysis they use, copied below. Is there a better way to
do this in R?

Also, I don't know how to interpret the Estimate and Std. Error
columns from the summary of the lm with contrasts. It's clear the
intercept in this case is the grand mean, but what are the other
values? They appear to be the difference between one of the contrast
groups and the grand mean -- wouldn't an estimate of the differences
between the contrasted groups be more appropriate, or am I (likely)
misinterpreting this?

Thanks!

Tyler

MyContrast <- function(Var, Group, G1, G2, G3=NULL) {
  ## Var == data vector, Group == factor
  ## G1, G2, G3 == character vectors of factor levels to contrast
  
  nG1 = sum(Group %in% G1)
  nG2 = sum(Group %in% G2)
  GrandMean = mean(Var[Group %in% c(G1, G2, G3)]) 
  G1Mean = mean(Var[Group %in% G1])
  G2Mean = mean(Var[Group %in% G2])

  if(is.null(G3))
    MScontr = nG1 * ((G1Mean - GrandMean)^2) + 
    	      nG2 * ((G2Mean - GrandMean)^2)
  else {
      nG3 = sum(Group %in% G3)
      G3Mean = mean(Var[Group %in% G3])
      MScontr = (nG1 * ((G1Mean - GrandMean)^2) + 
              	 nG2 * ((G2Mean - GrandMean)^2) + 
	    	 nG3 * ((G3Mean - GrandMean)^2))/2
    }
  
  An <- anova(lm(Var ~ Group))
  MSwithin = An[3]['Residuals',]
  DegF = An$Df[length(An$Df)]
  Fval = MScontr / MSwithin
  Pval = 1 - pf(Fval, 1, DegF)
  
  return (list(MS_contrasts = MScontr, MS_within = MSwithin, 
  	       F_value = Fval, P_value = Pval))
}

## The first two contrasts produce the same (+/- rounding error)
## p-values as obtained using contrasts()
MyContrast(sugars$length, sugars$treatment, 'control', 
			  c("fructose", "gluc+fruct", "glucose",
			  "sucrose"))
MyContrast(sugars$length, sugars$treatment, 'gluc+fruct',
			  c("fructose", "glucose", "sucrose"))

## How do you do this in standard R?
MyContrast(sugars$length, sugars$treatment, "fructose",
"glucose",
			  "sucrose")

Dieter Menne

2007-Nov-22 20:38 UTC

head link

[R] anova planned comparisons/contrasts

Tyler Smith <tyler.smith <at> mail.mcgill.ca> writes:
> I'm trying to figure out how anova works in R by translating the
> examples in Sokal And Rohlf's (1995 3rd edition) Biometry. I've hit
a
> snag with planned comparisons, their box 9.4 and section 9.6. It's a
> basic anova design:
> 
.... how to do contrast....

It's easier to use function estimable in package gmodels, or glht in package
multcomp.

Dieter

Peter Alspach

2007-Nov-22 20:44 UTC

head link

[R] anova planned comparisons/contrasts

Tyler 

For balanced data like this you might find aov() gives an output which
is more comparable to Sokal and Rohlf (which I don't have):
> trtCont <- C(sugars$treatment, matrix(c(-4,1,1,1,1, 0,-1,3,-1,-1), 5,
2))> sugarsAov <- aov(length ~ trtCont, sugars)
> summary(sugarsAov, split=list(trtCont=list('control vs rest'=1,
'gf vsothers'=2)))
                           Df  Sum Sq Mean Sq  F value    Pr(>F)    
trtCont                     4 1077.32  269.33  49.3680 6.737e-16 ***
  trtCont: control vs rest  1  832.32  832.32 152.5637 4.680e-16 ***
  trtCont: gf vs others     1   48.13   48.13   8.8228  0.004759 ** 
Residuals                  45  245.50    5.46                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1 > model.tables(sugarsAov, type='mean', se=T)Tables of means
Grand mean
      
61.94 

 trtCont 
trtCont
   control   fructose gluc+fruct    glucose    sucrose 
      70.1       58.2       58.0       59.3       64.1 

Standard errors for differences of means
        trtCont
          1.045
replic.      10

Since the two specified contrasts are orthogonal, the difference among
the remaining three sugars can be obtained by substraction:
> sugarsSum <- summary(sugarsAov,                       split=list(trtCont=list('control vs rest'=1,
'gf
vs others'=2)))
# Sum Sq> sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2])         
196.8667 
# Mean Sq> (sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]))/2         
98.43333 
# F value> ((sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]))/2) /sugarsSum[[1]][4,3]
         
18.04277 
# Pr(>F)> 1-pf(((sugarsSum[[1]][1,2]-sum(sugarsSum[[1]][2:3,2]))/2) /sugarsSum[[1]][4,3], 2, 45)
             
1.762198e-06 

HTH ........

Peter Alspach
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tyler Smith
> Subject: [R] anova planned comparisons/contrasts
> 
> Hi,
> 
> I'm trying to figure out how anova works in R by translating 
> the examples in Sokal And Rohlf's (1995 3rd edition) 
> Biometry. I've hit a snag with planned comparisons, their box 
> 9.4 and section 9.6. It's a basic anova design:
> 
> treatment <- factor(rep(c("control", "glucose",
"fructose",
> 	                  "gluc+fruct", "sucrose"), each =
10))
> 
> length <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
>             57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
>             58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
>             58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
>             62, 66, 65, 63, 64, 62, 65, 65, 62, 67)
> 
> sugars <- data.frame(treatment, length)
> 
> The basic analysis is easy enough:
> 
> anova(lm(length ~ treatment, sugars))
> 
> S&R proceed to calculate planned comparisons between control 
> and all other groups, between gluc+fruct and the remaining 
> sugars, and among the three pure sugars. I can get the first 
> two comparisons using the
> contrasts() function:
> 
> contrasts(sugars$treatment) <- matrix(c(-4, 1, 1,  1,  1,
>                                         0, -1, 3, -1, -1), 5, 2) 
> 
> summary(lm(length ~ treatment, sugars))
> 
> The results appear to be the same, excepting that the book 
> calculates an F value, whereas R produces an equivalent t 
> value. However, I can't figure out how to calculate a 
> contrast among three groups. For those without access to 
> Sokal and Rohlf, I've written an R function that performs the 
> analysis they use, copied below. Is there a better way to do 
> this in R?
> 
> Also, I don't know how to interpret the Estimate and Std. 
> Error columns from the summary of the lm with contrasts. It's 
> clear the intercept in this case is the grand mean, but what 
> are the other values? They appear to be the difference 
> between one of the contrast groups and the grand mean -- 
> wouldn't an estimate of the differences between the 
> contrasted groups be more appropriate, or am I (likely) 
> misinterpreting this?
> 
> Thanks!
> 
> Tyler[Code deleted]

Greg Snow

2007-Nov-26 23:26 UTC

head link

[R] anova planned comparisons/contrasts

Others have shown some approaches that work well for after you fit the
model.  Here is another approach starting with the model fit itself:

tmp <- c("control", "glucose", "fructose", 
	                  "gluc+fruct", "sucrose")

treatment <- factor(rep( tmp, each=10 ), levels=tmp)

length <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
            57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
            58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
            58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
            62, 66, 65, 63, 64, 62, 65, 65, 62, 67)



cm1 <- rbind( int=1/5, cntrl = c(4, -1,-1,-1,-1)/4,
	plus=c( 0, -1, -1, 3, -1 )/3,
	sug1=c( 0, -1, 1, 0, 0),
	sug2=c( 0, -1, -1, 0, 2)/2 )

cm2 <- zapsmall(solve(cm1))

contrasts(treatment) <- cm2[,-1]

treatment

sugars <- data.frame( length=length, treatment=treatment )

fit <- aov( length ~ treatment, data=sugars )
summary.lm(fit)
summary(fit)
summary(fit, split=list(treatment=list(
  'Control vs. Sugars'=1, 'Gluc+Fruc vs. sugars'=2,
'Sugars'=3:4)))


Is this more what you were looking for?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 
> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tyler Smith
> Sent: Thursday, November 22, 2007 12:38 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] anova planned comparisons/contrasts
> 
> Hi,
> 
> I'm trying to figure out how anova works in R by translating 
> the examples in Sokal And Rohlf's (1995 3rd edition) 
> Biometry. I've hit a snag with planned comparisons, their box 
> 9.4 and section 9.6. It's a basic anova design:
> 
> treatment <- factor(rep(c("control", "glucose",
"fructose",
> 	                  "gluc+fruct", "sucrose"), each =
10))
> 
> length <- c(75, 67, 70, 75, 65, 71, 67, 67, 76, 68,
>             57, 58, 60, 59, 62, 60, 60, 57, 59, 61,
>             58, 61, 56, 58, 57, 56, 61, 60, 57, 58,
>             58, 59, 58, 61, 57, 56, 58, 57, 57, 59,
>             62, 66, 65, 63, 64, 62, 65, 65, 62, 67)
> 
> sugars <- data.frame(treatment, length)
> 
> The basic analysis is easy enough:
> 
> anova(lm(length ~ treatment, sugars))
> 
> S&R proceed to calculate planned comparisons between control 
> and all other groups, between gluc+fruct and the remaining 
> sugars, and among the three pure sugars. I can get the first 
> two comparisons using the
> contrasts() function:
> 
> contrasts(sugars$treatment) <- matrix(c(-4, 1, 1,  1,  1,
>                                         0, -1, 3, -1, -1), 5, 2) 
> 
> summary(lm(length ~ treatment, sugars))
> 
> The results appear to be the same, excepting that the book 
> calculates an F value, whereas R produces an equivalent t 
> value. However, I can't figure out how to calculate a 
> contrast among three groups. For those without access to 
> Sokal and Rohlf, I've written an R function that performs the 
> analysis they use, copied below. Is there a better way to do 
> this in R?
> 
> Also, I don't know how to interpret the Estimate and Std. 
> Error columns from the summary of the lm with contrasts. It's 
> clear the intercept in this case is the grand mean, but what 
> are the other values? They appear to be the difference 
> between one of the contrast groups and the grand mean -- 
> wouldn't an estimate of the differences between the 
> contrasted groups be more appropriate, or am I (likely) 
> misinterpreting this?
> 
> Thanks!
> 
> Tyler
> 
> MyContrast <- function(Var, Group, G1, G2, G3=NULL) {
>   ## Var == data vector, Group == factor
>   ## G1, G2, G3 == character vectors of factor levels to contrast
>   
>   nG1 = sum(Group %in% G1)
>   nG2 = sum(Group %in% G2)
>   GrandMean = mean(Var[Group %in% c(G1, G2, G3)])
>   G1Mean = mean(Var[Group %in% G1])
>   G2Mean = mean(Var[Group %in% G2])
> 
>   if(is.null(G3))
>     MScontr = nG1 * ((G1Mean - GrandMean)^2) + 
>     	      nG2 * ((G2Mean - GrandMean)^2)
>   else {
>       nG3 = sum(Group %in% G3)
>       G3Mean = mean(Var[Group %in% G3])
>       MScontr = (nG1 * ((G1Mean - GrandMean)^2) + 
>               	 nG2 * ((G2Mean - GrandMean)^2) + 
> 	    	 nG3 * ((G3Mean - GrandMean)^2))/2
>     }
>   
>   An <- anova(lm(Var ~ Group))
>   MSwithin = An[3]['Residuals',]
>   DegF = An$Df[length(An$Df)]
>   Fval = MScontr / MSwithin
>   Pval = 1 - pf(Fval, 1, DegF)
>   
>   return (list(MS_contrasts = MScontr, MS_within = MSwithin, 
>   	       F_value = Fval, P_value = Pval)) }
> 
> ## The first two contrasts produce the same (+/- rounding 
> error) ## p-values as obtained using contrasts() 
> MyContrast(sugars$length, sugars$treatment, 'control', 
> 			  c("fructose", "gluc+fruct", "glucose",
> 			  "sucrose"))
> MyContrast(sugars$length, sugars$treatment, 'gluc+fruct',
> 			  c("fructose", "glucose", "sucrose"))
> 
> ## How do you do this in standard R?
> MyContrast(sugars$length, sugars$treatment, "fructose",
"glucose",
> 			  "sucrose")
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Apparently Analagous Threads

Search for more possibly parallel threads

R help - Nov 2007 - anova planned comparisons/contrasts

[R] anova planned comparisons/contrasts

[R] anova planned comparisons/contrasts

[R] anova planned comparisons/contrasts

[R] anova planned comparisons/contrasts

Apparently Analagous Threads