Version of R: Windows Version 2.0.0
The experimental design contains two plant lines - a control (C) and a
mutant (M) - grown out three separate times in plots A, B, C.
The design is unbalanced:
In plot A, 9 control plants were grown with 29 mutant plants.
In plot B, 8 control plants were grown with 20 mutant plants.
In plot C, 8 control plants were grown with 22 mutant plants.
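As a quick check of the imbalance, the cell counts listed above can be tabulated. This is only a sketch: the object names `design`, `total`, and `ratio` are made up for illustration and are not part of the original analysis.

```r
# Cell counts copied from the description above (rows = plots, columns = lines).
design <- as.table(matrix(c(9, 29,
                            8, 20,
                            8, 22),
                          nrow = 3, byrow = TRUE,
                          dimnames = list(Plot = c("A", "B", "C"),
                                          Parent = c("Control", "Mutant"))))

# Counts with row and column totals added.
print(addmargins(design))

# Total number of plants across all plots and lines.
total <- sum(design)
print(total)

# Mutant-to-control ratio per plot; equal ratios would mean proportional cell sizes.
ratio <- design[, "Mutant"] / design[, "Control"]
print(round(ratio, 2))
```

Because the ratios differ from plot to plot, Parent and Plot are not orthogonal in this design.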
The dependent variable, component A, was measured three times on each
plant in each plot; the three values were then averaged and normalized
to that plot's control plants. There are thus 96 relative averages, one
per plant.
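The averaging and normalization step could look roughly like the sketch below. It rests on assumptions: the raw data are simulated, the column names `PlantID` and `Value` are hypothetical, and the values are expressed as a percentage of the plot's control mean (a guess based on the intercept near 100 in the output further down).

```r
# Hypothetical raw data: three measurements per plant (names and values are made up).
set.seed(1)
raw <- data.frame(
  Plot    = rep(c("A", "B"), each = 12),
  Parent  = rep(rep(c("Control", "Mutant"), each = 6), times = 2),
  PlantID = rep(1:8, each = 3),
  Value   = rnorm(24, mean = 50, sd = 5)
)

# Step 1: average the three measurements for each plant.
plantMeans <- aggregate(Value ~ Plot + Parent + PlantID, data = raw, FUN = mean)

# Step 2: mean of the control plants within each plot.
controls    <- plantMeans[plantMeans$Parent == "Control", ]
controlMean <- tapply(controls$Value, as.character(controls$Plot), mean)

# Step 3: express each plant relative to its own plot's control mean
# (the factor of 100 is an assumption about the scale used).
ctrl <- as.vector(controlMean[as.character(plantMeans$Plot)])
plantMeans$ComponentA <- 100 * plantMeans$Value / ctrl

print(plantMeans)
```

With this scaling the control plants average exactly 100 within every plot, so any Plot effect on the control line is removed by construction.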
The null hypothesis:
component A for mutant = component A for control.
What we want to know:
1. Is there a difference in component A between the control and mutant
plant lines at the 0.05 significance level across plots?
2. How do we determine this difference using R?
Currently, the statistical method we have chosen is ANOVA in R.
> plotData <- read.table('PlotData.txt', header=TRUE)
> names(plotData)
[1] "Parent" "ComponentA" "Plot"
> summary(lm(ComponentA ~ Parent + Plot, plotData))
Call:
lm(formula = ComponentA ~ Parent + Plot, data = plotData)
Residuals:
       Min         1Q     Median         3Q        Max
-16.423623  -4.778514   0.006824   4.555770  17.196949
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   99.1750     1.6349  60.661   <2e-16 ***
ParentMutant   1.5394     1.5821   0.973    0.333
PlotB         -0.2747     1.6942  -0.162    0.872
PlotC         -0.3042     1.6603  -0.183    0.855
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 6.795 on 92 degrees of freedom
Multiple R-Squared: 0.01084, Adjusted R-squared: -0.02142
F-statistic: 0.3359 on 3 and 92 DF, p-value: 0.7994
#Why do I get a different ANOVA result when I switch the formula order?
> anova(lm(ComponentA ~ Parent + Plot, plotData))
Analysis of Variance Table
Response: ComponentA
          Df Sum Sq Mean Sq F value Pr(>F)
Parent     1   44.6    44.6  0.9657 0.3283
Plot       2    1.9     1.0  0.0210 0.9792
Residuals 92 4248.4    46.2
> anova(lm(ComponentA ~ Plot + Parent, plotData))
Analysis of Variance Table
Response: ComponentA
          Df Sum Sq Mean Sq F value Pr(>F)
Plot       2    2.8     1.4  0.0305 0.9700
Parent     1   43.7    43.7  0.9468 0.3331
Residuals 92 4248.4    46.2
Several questions:
1. Why do we get a different ANOVA answer when we switch the order of
the explanatory variables, e.g. ComponentA ~ Parent + Plot vs
ComponentA ~ Plot + Parent?
2. Why are the Sum Sq and F ratios calculated by this ANOVA method
different from those calculated by SAS JMP?
Source  Nparm  DF  Sum of Squares  F Ratio  Prob > F
Parent      1   1       43.721996   0.9468    0.3331
Plot        2   2        1.939821   0.0210    0.9792
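One observation that may help with both questions: the JMP figures match the last-entered term of each R ordering (Parent 43.7 when it follows Plot, Plot 1.9 when it follows Parent). This suggests that anova() is reporting sequential sums of squares, while JMP adjusts each term for the other. The sketch below illustrates the difference on simulated data with the same cell counts. The data frame `sim` and its values are hypothetical stand-ins for PlotData.txt, so the numbers will not match the output above.

```r
set.seed(42)

# Simulated stand-in for PlotData.txt using the cell counts described in this post.
counts <- data.frame(
  Plot   = rep(c("A", "B", "C"), each = 2),
  Parent = rep(c("Control", "Mutant"), times = 3),
  n      = c(9, 29, 8, 20, 8, 22)
)
sim <- data.frame(
  Plot   = factor(rep(as.character(counts$Plot), counts$n)),
  Parent = factor(rep(as.character(counts$Parent), counts$n))
)
sim$ComponentA <- rnorm(nrow(sim), mean = 100, sd = 7)  # random response values

fit1 <- lm(ComponentA ~ Parent + Plot, data = sim)
fit2 <- lm(ComponentA ~ Plot + Parent, data = sim)

# Sequential tables: each term is adjusted only for the terms listed before it.
seq1 <- anova(fit1)  # Parent alone, then Plot given Parent
seq2 <- anova(fit2)  # Plot alone, then Parent given Plot

# Marginal tests: each term is adjusted for the other term in the model.
marg <- drop1(fit1, test = "F")

print(seq1)
print(seq2)
print(marg)
```

In the simulated fit, the drop1() rows agree with the last line of each sequential table, and the residual line is the same in all three. For an additive model like this one, drop1(fit, test = "F") therefore gives order-independent tests.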
Thank you,
Sandie Peters
Informatics research scientist
Exelixis Plant Sciences