thr3ads.net - R help - [R] Unbalanced data in split-plot analysis with aov() [Oct 2017]

If this information is useful, please help other people find it:
Share via:

Samuel Knapp

2017-Oct-10 22:07 UTC

[R] Unbalanced data in split-plot analysis with aov()

Dear all,

I'm analysing a split-plot experiment, where there are sometimes one or 
two values missing. I realized that if the data is slightly unbalanced, 
the effect of the subplot-treatment will also appear and be tested 
against the mainplot-error term.

I replicated this with the Oats dataset from Yates (1935), contained in 
the nlme package, where Variety is on mainplot, and nitro on subplot.

 > # Oats dataset (Yates 1935) from the nlme package
 > require(nlme); data <- get(data(Oats))
 > data$nitro <- factor(data$nitro);data$Block <- 
as.factor(as.character(data$Block))
 > nrow(data) # 6 Blocks * 4 N-levels * 3 Varieties = 72 obs -> 
orthogonal and balanced
[1] 72
 > # split-plot anova
 > summary(aov(yield ~ Block+Variety*nitro + Error(Block/Variety),data))

Error: Block
 ????? Df Sum Sq Mean Sq
Block? 5? 15875??? 3175

Error: Block:Variety
 ????????? Df Sum Sq Mean Sq F value Pr(>F)
Variety??? 2?? 1786?? 893.2?? 1.485? 0.272
Residuals 10?? 6013?? 601.3

Error: Within
 ????????????? Df Sum Sq Mean Sq F value?? Pr(>F)
nitro????????? 3? 20020??? 6674? 37.686 2.46e-12 ***
Variety:nitro? 6??? 322????? 54?? 0.303??? 0.932
Residuals???? 45?? 7969???? 177
---
Signif. codes:? 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
 > reduceddata <- data[-5,] # delete one observation
 > summary(aov(yield ~ Block+Variety*nitro + 
Error(Block/Variety),reduceddata))

Error: Block
 ????? Df Sum Sq Mean Sq
Block? 5? 16070??? 3214

Error: Block:Variety
 ????????? Df Sum Sq Mean Sq F value Pr(>F)
Variety??? 2?? 1820?? 910.0?? 1.373? 0.302
nitro????? 1????? 1???? 1.0?? 0.001? 0.970
Residuals? 9?? 5964?? 662.7

Error: Within
 ????????????? Df Sum Sq Mean Sq F value?? Pr(>F)
nitro????????? 3? 19766??? 6589?? 36.88 4.49e-12 ***
Variety:nitro? 6??? 333????? 55??? 0.31??? 0.928
Residuals???? 44?? 7860???? 179

Although I fully understand, that mixed-models should be preferred for 
non-orthogonal/unbalanced data, I think it's still valid to use the 
aov() approach, when only 1 or 2 datapoints are missing. Also, I don't 
really understand, why the structure of the ANOVA changes suddenly. Is 
there any argument, I could supply to change this behaviour?

When I do the same with lm() and subsequent anova(), and calculate 
F-value for Variety by hand, the estimates are still quite robust.

Best regards,

Sam


-- 
Samuel Knapp

Lehrstuhl f?r Pflanzenern?hrung
Technische Universit?t M?nchen
(Chair of Plant Nutrition
Technical University of Munich)

Emil-Ramann-Strasse 2
D-85354 Freising

Tel. +49 8161 71-3578
samuel.knapp at tum.de
www.researchgate.net/profile/Samuel_Knapp

-- 
Samuel Knapp

Lehrstuhl f?r Pflanzenern?hrung
Technische Universit?t M?nchen
(Chair of Plant Nutrition
Technical University of Munich)

Emil-Ramann-Strasse 2
D-85354 Freising

Tel. +49 8161 71-3578	
samuel.knapp at tum.de
www.researchgate.net/profile/Samuel_Knapp


	[[alternative HTML version deleted]]

peter dalgaard

2017-Oct-11 09:55 UTC

head link

[R] Unbalanced data in split-plot analysis with aov()

> On 11 Oct 2017, at 00:07 , Samuel Knapp <samuel.knapp at tum.de>
wrote:
> 
> Dear all,
> 
> I'm analysing a split-plot experiment, where there are sometimes one or
> two values missing. I realized that if the data is slightly unbalanced, 
> the effect of the subplot-treatment will also appear and be tested 
> against the mainplot-error term.
> 
> I replicated this with the Oats dataset from Yates (1935), contained in 
> the nlme package, where Variety is on mainplot, and nitro on subplot.
> 
>> # Oats dataset (Yates 1935) from the nlme package
>> require(nlme); data <- get(data(Oats))
>> data$nitro <- factor(data$nitro);data$Block <- 
> as.factor(as.character(data$Block))
>> nrow(data) # 6 Blocks * 4 N-levels * 3 Varieties = 72 obs -> 
> orthogonal and balanced
> [1] 72
>> # split-plot anova
>> summary(aov(yield ~ Block+Variety*nitro + Error(Block/Variety),data))
> 
> Error: Block
>       Df Sum Sq Mean Sq
> Block  5  15875    3175
> 
> Error: Block:Variety
>           Df Sum Sq Mean Sq F value Pr(>F)
> Variety    2   1786   893.2   1.485  0.272
> Residuals 10   6013   601.3
> 
> Error: Within
>               Df Sum Sq Mean Sq F value   Pr(>F)
> nitro          3  20020    6674  37.686 2.46e-12 ***
> Variety:nitro  6    322      54   0.303    0.932
> Residuals     45   7969     177
> ---
> Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>> reduceddata <- data[-5,] # delete one observation
>> summary(aov(yield ~ Block+Variety*nitro + 
> Error(Block/Variety),reduceddata))
> 
> Error: Block
>       Df Sum Sq Mean Sq
> Block  5  16070    3214
> 
> Error: Block:Variety
>           Df Sum Sq Mean Sq F value Pr(>F)
> Variety    2   1820   910.0   1.373  0.302
> nitro      1      1     1.0   0.001  0.970
> Residuals  9   5964   662.7
> 
> Error: Within
>               Df Sum Sq Mean Sq F value   Pr(>F)
> nitro          3  19766    6589   36.88 4.49e-12 ***
> Variety:nitro  6    333      55    0.31    0.928
> Residuals     44   7860     179
> 
> Although I fully understand, that mixed-models should be preferred for 
> non-orthogonal/unbalanced data, I think it's still valid to use the 
> aov() approach, when only 1 or 2 datapoints are missing. Also, I don't 
> really understand, why the structure of the ANOVA changes suddenly. Is 
> there any argument, I could supply to change this behaviour?
The answer to this is in the math, and not really possible to cover here. 

If things are non-orthogonal, you don't have the pretty separation of
effects into error strata and one effect of that is that you can see the same
effect occurring multiple time with different error terms. (In a plain block
design, if not all treatments occur equally often in each block, the block means
contain information about the treatment effects if block effects are considered
random).

Worse, if the balanced structure of the design is broken, the orthogonal
decomposition of random effects does not correspond to the covariance structure
that you think you are assuming. The consequences of this require considerable
expertise to figure out. (I forget the details, but the nature is like having to
assume variance of random effects to depend on block size.)

One known-valid way of dealing with a small number of missing values, is to
retain the design, set the value of the missing values to something arbitrary,
and include a dummy variable for each missing observation which is 1 for the
position of that value and zero elsewhere. You then still get the effect of
estimating the dummy coefficient in multiple strata corresponding to an extended
model where the effect of the dummy is different for block means than for
residuals. However, at least you then know fairly well what is going on.

So, something like this (removing block from the fixed effects because you also
had is as a random effect):
> i5 <- rep(0,72); i5[5] <- 1
> summary(aov(yield ~ i5 + nitro*Variety + Error(Block/Variety),data))
Error: Block
          Df Sum Sq Mean Sq F value  Pr(>F)   
i5         1  14163   14163   33.08 0.00453 **
Residuals  4   1713     428                   
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Error: Block:Variety
          Df Sum Sq Mean Sq F value Pr(>F)
i5         1     26    26.0   0.039  0.847
Variety    2   1809   904.7   1.365  0.304
Residuals  9   5964   662.7               

Error: Within
              Df Sum Sq Mean Sq F value   Pr(>F)    
i5             1    352     352   1.971    0.167    
nitro          3  19766    6589  36.885 4.49e-12 ***
nitro:Variety  6    333      55   0.310    0.928    
Residuals     44   7860     179                     
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

You will notice that in this analysis, you can modify the value for yield[5] at
will, and still get the same output, except of course for the "i5"
entries. Also, quite interestingly, this reproduces the analysis with the 5th
obs removed, _provided_ that you put nitro before Variety in the model spec:
> summary(aov(yield ~ nitro*Variety + Error(Block/Variety),data[-5,]))
Error: Block
          Df Sum Sq Mean Sq F value  Pr(>F)   
nitro      1  14357   14357   33.53 0.00442 **
Residuals  4   1713     428                   
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Error: Block:Variety
          Df Sum Sq Mean Sq F value Pr(>F)
nitro      1     11    11.5   0.017  0.898
Variety    2   1809   904.7   1.365  0.304
Residuals  9   5964   662.7               

Error: Within
              Df Sum Sq Mean Sq F value   Pr(>F)    
nitro          3  19766    6589   36.88 4.49e-12 ***
nitro:Variety  6    333      55    0.31    0.928    
Residuals     44   7860     179                     
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

I suppose this means that the nitro term adjusts the analyses in the lower error
strata in the same way as the missing value indicator, but I really can't
explain the effect in detail.

-pd
> 
> When I do the same with lm() and subsequent anova(), and calculate 
> F-value for Variety by hand, the estimates are still quite robust.
> 
> Best regards,
> 
> Sam
> 
> 
> -- 
> Samuel Knapp
> 
> Lehrstuhl f?r Pflanzenern?hrung
> Technische Universit?t M?nchen
> (Chair of Plant Nutrition
> Technical University of Munich)
> 
> Emil-Ramann-Strasse 2
> D-85354 Freising
> 
> Tel. +49 8161 71-3578
> samuel.knapp at tum.de
> www.researchgate.net/profile/Samuel_Knapp
> 
> -- 
> Samuel Knapp
> 
> Lehrstuhl f?r Pflanzenern?hrung
> Technische Universit?t M?nchen
> (Chair of Plant Nutrition
> Technical University of Munich)
> 
> Emil-Ramann-Strasse 2
> D-85354 Freising
> 
> Tel. +49 8161 71-3578	
> samuel.knapp at tum.de
> www.researchgate.net/profile/Samuel_Knapp
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Maybe Matching Threads

Search for more maybe matching threads

R help - Oct 2017 - Unbalanced data in split-plot analysis with aov()

[R] Unbalanced data in split-plot analysis with aov()

[R] Unbalanced data in split-plot analysis with aov()

Maybe Matching Threads