Hello all,

here's a real-world example: I'm measuring a quantity (d) at five sites
(site1 through site5) on a silicon wafer. There is a clear site-dependence
of the measured value. To find out whether this is a measurement artifact,
I measured the wafer four times: twice in the normal position (posN) and
twice rotated by 180 degrees (posR). My data look like this (full,
self-contained code at the bottom). Note that sites with the same number
correspond to the same physical location on the wafer (the rotation has
already been taken into account here).

> head(x)
     d site pos
1 1383    1   N
2 1377    1   R
3 1388    1   R
4 1373    1   N
5 1386    2   N
6 1394    2   R

> boxplot(d ~ pos + site)

This boxplot (see code) already hints at a true site-dependence of the
measured value (no artifact). OK, so let's do an ANOVA to make this more
quantitative:

> summary(lm(d ~ site*pos))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1378.000      3.078 447.672  < 2e-16 ***
site2         11.500      4.353   2.642  0.02466 *
site3         12.000      4.353   2.757  0.02025 *
site4         17.000      4.353   3.905  0.00294 **
site5          1.000      4.353   0.230  0.82294
posR           4.500      4.353   1.034  0.32561
site2:posR    -4.000      6.156  -0.650  0.53050
site3:posR   -10.500      6.156  -1.706  0.11890
site4:posR    -5.500      6.156  -0.893  0.39264
site5:posR    -3.000      6.156  -0.487  0.63655

Now I think I see the following:
- The average of d at site1 in position N (first in the alphabet) is 1378.
- Average values for site2, 3, and 4 (especially 4) in position N deviate
  significantly from site1. For instance, values at site4 are on average
  17 greater than at site1.
- The average value at site5 does not differ significantly from site1.

OK, that was the top part of the result table. Now the bottom part:
- In the rotated position (posR) the average of d at site1 is 4.5 bigger,
  but that's not significant.
- The average of d at site3:posR is 10.5 smaller than something, but
  smaller than what? And why does this -10.5 deviation have a p-value of
  .1 (not significant) versus the .02 (significant) deviation of 11.5
  (site2, top part)?

Let's see if I can figure that out. The difference between posN and posR
at site3 is not so big:

> mean(d[site==3 & pos=="R"]) - mean(d[site==3 & pos=="N"])
[1] -6

Is this what makes it insignificant?

Shuffling the numbers around until I get to -10.5:

> mean(d[site==3 & pos=="R"]) - mean(d[site==3 & pos=="N"]) -
+     (mean(d[site==1 & pos=="R"]) - mean(d[site==1 & pos=="N"]))
[1] -10.5

OK, so one has to keep track of all the differences.

So I think I have understood about 80% of this simple example. The reason
I'm going after this so stubbornly is that I'm at the beginning of a DOE
which will take several weeks of measuring and will end up being analyzed
with a big ANOVA (two response and about six explanatory variables, some
continuous, some factorial). Already in the DOE phase I want to understand
what I will be doing with the data later (this is for a Six Sigma project
in an industrial production environment, in case anybody wants to know).

Thanks,
robert

Here's the full dataset:

x <- structure(list(d = c(1383L, 1377L, 1388L, 1373L, 1386L, 1394L,
1386L, 1393L, 1390L, 1382L, 1386L, 1390L, 1395L, 1396L, 1392L,
1395L, 1378L, 1382L, 1379L, 1380L), site = structure(c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
pos = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L), .Label = c("N",
"R"), class = "factor")), .Names = c("d", "site", "pos"),
row.names = c(NA, -20L), class = "data.frame")
attach(x)
head(x)
boxplot(d ~ pos + site)
I see that no one has replied to this yet, so I'll take a stab.
This is probably a matter of personal taste, but I would suggest a somewhat
different and simpler approach. What you have done is not strictly an ANOVA;
it's a linear model (the two are closely related). The particular way you've
asked R to report the results gives you the answer in terms of the linear
model's coefficients, so the significance stars refer to whether each
individual coefficient (each difference from the reference cell, site1 in
posN) differs significantly from zero, not to whether a factor as a whole
matters. Perhaps you are aware of this; the short sketch below spells out
what those coefficients are differences of.
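For example, here is a minimal sketch (my own addition, assuming R's default
treatment contrasts and using your data frame x): the interaction coefficient
site3:posR is a difference of differences, and it is exactly the -10.5 you
reconstructed by hand.

fit <- lm(d ~ site * pos, data = x)
cellmeans <- with(x, tapply(d, list(site, pos), mean))  # the ten cell means
coef(fit)["site3:posR"]   # -10.5, the estimate that puzzled you
# the same number, rebuilt from the cell means:
(cellmeans["3", "R"] - cellmeans["3", "N"]) -
    (cellmeans["1", "R"] - cellmeans["1", "N"])

So that row of summary(lm(...)) asks "does the N-to-R change at site3 differ
from the N-to-R change at site1?", which is why its standard error (6.156) is
larger than that of the main-effect rows (4.353) and a deviation of similar
size ends up non-significant.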
Anyway, I thought your data set was interesting, so I took the approach that
comes to mind for me. Here it is. It should be largely self-explanatory; if
not, try ?aov and ?TukeyHSD for details. Maybe it answers your questions about
why things are significant or not. Hopefully I didn't misunderstand your
questions.
Good Luck. Bryan
***********
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University
Using your full data set, in variable x:
res <- aov(d~site*pos, data = x)
summary(res)
Df Sum Sq Mean Sq F value Pr(>F)
site 4 636.5 159.13 8.397 0.00308 **
pos 1 0.0 0.05 0.003 0.96005
site:pos 4 59.7 14.93 0.788 0.55886
Residuals 10 189.5 18.95
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# So clearly site is the only significant factor.
# Use TukeyHSD to see which sites are different from each other:
TukeyHSD(res)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = d ~ site * pos, data = x)
$site
diff lwr upr p adj
2-1 9.50 -0.6304407 19.6304407 0.0686646
3-1 6.75 -3.3804407 16.8804407 0.2569519
4-1 14.25 4.1195593 24.3804407 0.0064949
5-1 -0.50 -10.6304407 9.6304407 0.9998140
3-2 -2.75 -12.8804407 7.3804407 0.8930095
4-2 4.75 -5.3804407 14.8804407 0.5603830
5-2 -10.00 -20.1304407 0.1304407 0.0533983
4-3 7.50 -2.6304407 17.6304407 0.1824748
5-3 -7.25 -17.3804407 2.8804407 0.2049583
5-4 -14.75 -24.8804407 -4.6195593 0.0051227
$pos
diff lwr upr p adj
R-N -0.1 -4.437723 4.237723 0.960045
$`site:pos`
diff lwr upr p adj
2:N-1:N 11.5 -5.7326665 28.732667 0.3077877
3:N-1:N 12.0 -5.2326665 29.232667 0.2662504
4:N-1:N 17.0 -0.2326665 34.232667 0.0539873
5:N-1:N 1.0 -16.2326665 18.232667 0.9999999
1:R-1:N 4.5 -12.7326665 21.732667 0.9818788
2:R-1:N 12.0 -5.2326665 29.232667 0.2662504
3:R-1:N 6.0 -11.2326665 23.232667 0.9093271
4:R-1:N 16.0 -1.2326665 33.232667 0.0750508
5:R-1:N 2.5 -14.7326665 19.732667 0.9997412
3:N-2:N 0.5 -16.7326665 17.732667 1.0000000
4:N-2:N 5.5 -11.7326665 22.732667 0.9418966
5:N-2:N -10.5 -27.7326665 6.732667 0.4048556
1:R-2:N -7.0 -24.2326665 10.232667 0.8191838
2:R-2:N 0.5 -16.7326665 17.732667 1.0000000
3:R-2:N -5.5 -22.7326665 11.732667 0.9418966
4:R-2:N 4.5 -12.7326665 21.732667 0.9818788
5:R-2:N -9.0 -26.2326665 8.232667 0.5798223
4:N-3:N 5.0 -12.2326665 22.232667 0.9658140
5:N-3:N -11.0 -28.2326665 6.232667 0.3540251
1:R-3:N -7.5 -24.7326665 9.732667 0.7639472
2:R-3:N 0.0 -17.2326665 17.232667 1.0000000
3:R-3:N -6.0 -23.2326665 11.232667 0.9093271
4:R-3:N 4.0 -13.2326665 21.232667 0.9915584
5:R-3:N -9.5 -26.7326665 7.732667 0.5185806
5:N-4:N -16.0 -33.2326665 1.232667 0.0750508
1:R-4:N -12.5 -29.7326665 4.732667 0.2293315
2:R-4:N -5.0 -22.2326665 12.232667 0.9658140
3:R-4:N -11.0 -28.2326665 6.232667 0.3540251
4:R-4:N -1.0 -18.2326665 16.232667 0.9999999
5:R-4:N -14.5 -31.7326665 2.732667 0.1224175
1:R-5:N 3.5 -13.7326665 20.732667 0.9966588
2:R-5:N 11.0 -6.2326665 28.232667 0.3540251
3:R-5:N 5.0 -12.2326665 22.232667 0.9658140
4:R-5:N 15.0 -2.2326665 32.232667 0.1041067
5:R-5:N 1.5 -15.7326665 18.732667 0.9999963
2:R-1:R 7.5 -9.7326665 24.732667 0.7639472
3:R-1:R 1.5 -15.7326665 18.732667 0.9999963
4:R-1:R 11.5 -5.7326665 28.732667 0.3077877
5:R-1:R -2.0 -19.2326665 15.232667 0.9999581
3:R-2:R -6.0 -23.2326665 11.232667 0.9093271
4:R-2:R 4.0 -13.2326665 21.232667 0.9915584
5:R-2:R -9.5 -26.7326665 7.732667 0.5185806
4:R-3:R 10.0 -7.2326665 27.232667 0.4599257
5:R-3:R -3.5 -20.7326665 13.732667 0.9966588
5:R-4:R -13.5 -30.7326665 3.732667 0.1683967
# Normally you can plot a TukeyHSD object but this one is too large.
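# A possible workaround (a sketch of my own, not run here): ask TukeyHSD
# for just the "site" term and plot that much smaller object instead.
site.only <- TukeyHSD(res, which = "site")
plot(site.only)  # 95% family-wise CIs; intervals crossing zero are not significant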
# The end.
Hello,
1. There's a function anova(), which you haven't used yet:
> model <- lm(d~pos*site, data=x)
> anova(model)
Analysis of Variance Table
Response: d
Df Sum Sq Mean Sq F value Pr(>F)
pos 1 0.05 0.050 0.0026 0.960045
site 4 636.50 159.125 8.3971 0.003084 **
pos:site 4 59.70 14.925 0.7876 0.558857
Residuals 10 189.50 18.950
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
The wafer position and its interaction with site are not significant.
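Since neither pos nor its interaction with site looks significant, a natural
follow-up (just a sketch using the same data frame x; output not shown) is to
refit with site alone and let anova() compare the two nested models:

> model.site <- lm(d ~ site, data = x)
> anova(model.site, model)

A large p-value for that F test says the extra pos and pos:site terms don't
improve the fit, which supports dropping them.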
2. There is also a function aov(), similar to lm() but with a slightly
different use.
> model.aov <- aov(d~pos*site, data=x)
> summary(model.aov)
Df Sum Sq Mean Sq F value Pr(>F)
pos 1 0.0 0.05 0.003 0.96005
site 4 636.5 159.13 8.397 0.00308 **
pos:site 4 59.7 14.93 0.788 0.55886
Residuals 10 189.5 18.95
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
The only difference is in the decimals displayed.
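One handy extra you get from the aov() fit (a sketch of my own; output not
shown): model.tables() prints the cell and marginal means directly, which
makes it much easier to see which averages the lm() coefficients are
comparing.

> model.tables(model.aov, type = "means")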
3. Back to lm(): you can use confint() to check whether the estimates' 95%
confidence intervals include zero.
If they do, the corresponding coefficients are not statistically significant.
> confint(model)
2.5 % 97.5 %
(Intercept) 1371.141457 1384.858543
posR -5.199444 14.199444
site2 1.800556 21.199444
site3 2.300556 21.699444
site4 7.300556 26.699444
site5 -8.699444 10.699444
posR:site2 -17.717086 9.717086
posR:site3 -24.217086 3.217086
posR:site4 -19.217086 8.217086
posR:site5 -16.717086 10.717086
And the interpretation is the same.
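For example (a quick cross-check, nothing new fitted here): the row for
posR:site3 runs from about -24.2 to 3.2, so the interval straddles zero,
which is the same message as the 0.119 p-value that puzzled you in
summary(lm(...)).

> confint(model)["posR:site3", ]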
(There are also books on-line: go to http://www.r-project.org/ and click
"Books" at the bottom left.)
Hope this helps,