thr3ads.net - R help - [R] help with an unbalanced split plot [Oct 2010]

Eugenio Larios

2010-Oct-14 22:58 UTC

[R] help with an unbalanced split plot

Hi Everyone,

I am trying to analyze a split-plot field experiment designed to measure the
fitness consequences of seed size. It was arranged like this:

Factors (X):
*Seed size*: a continuous variable, normally distributed.
*Water*: Categorical Levels- wet and dry.
*Density*: Categorical Levels- high, medium and solo
*Plot*: Counts from 1 to 20
The *response variable *(Y) was the number of seeds produced at the end of
the season.

The experiment started 15 days after plants germinated in the field.
20 plots were chosen where plant density was high enough for me to
manipulate it, in an area where artificial irrigation was possible for the
wet treatment; the dry treatment was natural precipitation.
Water was blocked, so 10 plots were wet and the other 10 were dry, randomly
assigned.
Within each of those 20 plots, 6 focal plants were chosen and randomly
assigned to the three densities (split-plot design).
I did not control for seed size since it is continuous and normally
distributed, hoping that with 120 plants in total (6 in each of the 20 plots)
I would get all kinds of sizes in every treatment. It worked OK.

I have been trying to analyze this with lme (library nlme). I am not quite
sure which are my random variables. The models I have used are:

m<-lme(log(fitness)~seedsize*density,random=~1|plot,data=dataset)
m<-lme(log(fitness)~seedsize+density+water,random=~1|plot,data=dataset)

I have also tried to include plot and water as random effects:

m<-lme(log(fitness)~seedsize+density+water,random=~1|plot/water,data=dataset)

I am actually not sure if I am using the right random variables here. Also,
for some reason, it won't let me include the seedsize*density*water
three-way interaction.
help!
thanks

-- 
Eugenio Larios
PhD Student
University of Arizona.
Ecology & Evolutionary Biology.
(520) 481-2263
elariosc@email.arizona.edu


Dennis Murphy

2010-Oct-15 00:34 UTC

[R] help with an unbalanced split plot

Hi:

On Thu, Oct 14, 2010 at 3:58 PM, Eugenio Larios
<elariosc@email.arizona.edu> wrote:
> Hi Everyone,
>
> I am trying to analyze a split plot experiment in the field that was
> arranged like this:
> I am trying to measure the fitness consequences of seed size.
>
> Factors (X):
> *Seed size*: a continuous variable, normally distributed.
> *Water*: Categorical Levels- wet and dry.
> *Density*: Categorical Levels- high, medium and solo
> *Plot*: Counts from 1 to 20
> The *response variable *(Y) was the number of seeds produced at the end of
> the season.
>
> The experiment started 15 days after plants germinated in the field.
> 20 plots were chosen where there was high enough density so I could
> manipulate it. In an area where artificial irrigation was possible for the
> wet treatment, dry treatment was natural precip.
> Water was blocked so 10 plots were wet and the other 10 were dry. Randomly
> assigned.
> Within those 20 plots 6 focal plants were chosen and randomly assigned the
> three densities. (split plot design)
> I did not control for seed size since it is continuous and normally
> distributed, hoping that with 120 plants total (6 in each 20 blocks) I could
> get all kind of sizes for every treatment. It worked ok.
>

From the description, it appears you have the following:
    * water is a whole-plot treatment, each level assigned to 10 plots
    * seed size is a plot-level covariate
    * whole plot units are the plots

At this level, the ANOVA table is

Water                1
Seed size            1
Water x seed size    1
Whole plot error    16   [plots]

The split plot treatment is density, and after its main effect is accounted
for, it is crossed with every term in the whole-plot ANOVA:

Density                        2
Density * Water                2
Density * seed size            2
Density * Water * seed size    2
Residual                      92   [subplots]

Total df  = 119

The ANOVA exercise is useful for understanding the structure of the
split-plot design - it is not exactly what lme() will fit (especially the
df), since lme() is fitting the model via likelihood rather than least
squares.
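
The degrees of freedom in the two tables above can be checked with a few lines of R. This is only bookkeeping for the balanced layout as originally described (20 plots, 6 plants per plot); the variable names are made up for illustration and nothing here touches the real data.

```r
# Degrees-of-freedom bookkeeping for the balanced design described above
# (20 plots, 6 plants per plot); the variable names are illustrative only.
n_plots <- 20
plants_per_plot <- 6

whole_plot_df <- c(water = 1, seedsize = 1, water_x_seedsize = 1, error = 16)
subplot_df <- c(density = 2, density_x_water = 2, density_x_seedsize = 2,
                density_x_water_x_seedsize = 2, residual = 92)

# Whole-plot stratum: one df fewer than the number of plots
stopifnot(sum(whole_plot_df) == n_plots - 1)

# Both strata together: one df fewer than the number of plants
total_df <- sum(whole_plot_df) + sum(subplot_df)
stopifnot(total_df == n_plots * plants_per_plot - 1)
total_df
```

With lost plants or plots the totals shrink accordingly, so these numbers describe the planned design rather than the data actually collected.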

Your full lme model, including the test of unequal slopes in the two water
levels, should be

m <- lme(log(fitness) ~ seedsize * water * density, random = ~1|plot,
data=dataset)

Without the unequal slopes term (i.e., a parallel slopes model), it should
be

m2 <- lme(log(fitness) ~ (seedsize + water) * density, random = ~1 | plot,
data = dataset)

The specification of the first two terms on the RHS of the model formula is
associated with the whole-plot structure of your design.
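
As a rough sketch of how the two models might be fitted and compared, the code below runs them on simulated data. Everything about the data frame `sim` is an assumption made for illustration (the effect sizes, the variances, the balanced 20 x 6 layout); it is not the poster's data. Both models are fitted with method = "ML" because they differ in their fixed effects, which is the usual requirement for a likelihood-ratio comparison.

```r
library(nlme)

set.seed(1)

# Hypothetical balanced layout: 20 plots, 3 densities, 2 plants per density
sim <- expand.grid(rep = 1:2,
                   density = c("high", "medium", "solo"),
                   plot = factor(1:20))

# Water is assigned at the plot level: plots 1-10 wet, 11-20 dry (made up)
sim$water <- factor(ifelse(as.integer(sim$plot) <= 10, "wet", "dry"))

# Seed size varies from plant to plant; fitness is simulated on the log scale
sim$seedsize <- rnorm(nrow(sim), mean = 5, sd = 1)
plot_effect <- rnorm(20, sd = 0.3)
sim$fitness <- exp(2 + 0.2 * sim$seedsize +
                   plot_effect[as.integer(sim$plot)] +
                   rnorm(nrow(sim), sd = 0.5))

# Full model, with the three-way interaction
m_full <- lme(log(fitness) ~ seedsize * water * density,
              random = ~ 1 | plot, data = sim, method = "ML")

# Reduced model, without the seedsize:water terms
m_par <- lme(log(fitness) ~ (seedsize + water) * density,
             random = ~ 1 | plot, data = sim, method = "ML")

# Likelihood-ratio comparison of the two fits
anova(m_par, m_full)
```

The reduced model drops both seedsize:water and seedsize:water:density, so the comparison involves three fixed-effect parameters. With unbalanced data the resulting p-value is best read as a rough guide.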

> I have been trying to analyze this with lme (library NLME). I am not quiet
> sure which are my random variables. models I have used are:
>
> m<-lme(log(fitness)~seedsize*density,random=~1|plot,data=dataset)
> m<-lme(log(fitness)~seedsize+density+water,random=~1|plot,data=dataset)
>
> I have also tried to include plot and water as random effects:
>
> m<-lme(log(fitness)~seedsize+density+water,random=~1|plot/water,data=dataset)
>
> I am actually not sure if I am using the right random variables here. Also
> for some reason, it won't let me include seedsize*density*water triple
> interaction
>
You mentioned imbalance in your mail header - how imbalanced are you talking
about? The structure of the imbalance could have some impact on which
effects are or are not estimable, depending on its severity.


HTH,
Dennis


Eugenio Larios

2010-Oct-15 16:35 UTC

[R] help with an unbalanced split plot

Hi Dennis,

The first thing I did with my data was to explore it with 6 graphs
(wet: high, medium, and solo; dry: high, medium, and solo), and they showed
very interesting patterns: in the wet treatments, fitness is either
negatively correlated with seed size (high and medium densities) or flat
(solo). But in the dry treatments, the correlation is positive at every
density! There is a very interesting switch there.

I also figured out why I couldn't fit the three-way interaction. I explored
the structure of my data with str(mydata), and it showed that the water
treatment has three levels when it should have just two. I went back to the
Excel sheet, sorted the data by water treatment, and discovered a single
data point from the wet treatment sticking out by itself. That is why R
reads three levels, and since that level has only one point, no statistics
can be computed for it, of course.
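
A stray factor level of this kind can be reproduced and cleaned up in base R. The post does not say what made the odd entry differ, so the trailing space used below is purely an assumption for illustration, as are the vector names.

```r
# Hypothetical column: one "wet" entry carries a trailing space,
# as can happen when values are typed into a spreadsheet
water <- factor(c("wet", "dry", "wet ", "dry", "wet"))

nlevels(water)   # reports 3 levels although only 2 treatments exist
table(water)     # the stray level shows up with a count of 1

# Strip surrounding whitespace and rebuild the factor
water_clean <- factor(trimws(as.character(water)))
levels(water_clean)
```

If the stray value differs in some other way (capitalisation, a typo), it would need to be recoded explicitly instead; checking table() on each factor before fitting catches either case.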

thanks
E

On Thu, Oct 14, 2010 at 9:27 PM, Dennis Murphy <djmuser@gmail.com> wrote:
> Hi:
>
> On Thu, Oct 14, 2010 at 7:50 PM, Eugenio Larios <
> elariosc@email.arizona.edu> wrote:
>
>> Hi Dennis,
>>
>> thank you very much for your help, I really appreciate it.
>>
>> I forgot to say about the imbalance, yes. I only explained the original
>> set up, sorry. Let me explain.
>>
>> It is because in the process of the experiment which lasted 3 months I
>> lost individuals within the plots and I actually ended up losing 2 whole
>> plots (one dry and one wet) and some other individuals in other plots.
>>
>
> That still leaves you balanced at the plot level :)  Fortunately, you have
> enough replication. If you have missing subplots within the remaining plots,
> that would be another source of imbalance at the subplot level, but you
> should have enough subplots to be able to estimate all of the interactions
> unless an entire treatment in one set of plots was missing.
>
> It's worth graphing your data to anticipate which effects/interactions
> should be significant; graphs involving the spatial configuration of the
> plots and subplots would also be worthwhile.
>
>>
>> My study system has this special feature that allows me to track parental
>> seed sizes in plants germinated in the field, a persistent ring that stays
>> attached to the root even when the plant has germinated, so some of the
>> plants I lost did not have this ring anymore. It happens sometimes but most
>> of the time they have it. Also, some plants disappeared probably due to
>> predation, etc. That made my experiment imbalanced.
>>
>
> That's common. No big deal.
>
>>
>> Do you think that will change the analysis? Also, do you think I should
>> use least squares ANOVA (perhaps type III due to the imbalance?) instead of
>> LMM? What about the random effects that my blocking has created?
>>
>
> Actually, with unbalanced data it's to your advantage to use lme() over
> ANOVA. Just don't place too much importance on the p-values of tests; even
> the degrees of freedom are debatable. With unbalanced data, it's hard to
> predict what the sampling distribution of a given statistic will actually
> be, so the p-values aren't as trustworthy.
>
> You mentioned that you couldn't fit a three-way interaction; given your
> data configuration, that shouldn't happen.
>
> (1) Get two-way tables of water * density, one for the counts and one for
> the averages, something like
>
> with(mydata, table(water, density))
> aggregate(log(fitness) ~ water + density, data = mydata, FUN = mean, na.rm = TRUE)
>
> In the first table, unless you have very low frequencies in some category,
> your data 'density' should be enough to estimate all the main effects and
> interactions of interest. The second table is to check that you don't have
> NaNs or missing cells, etc.
>
>>
>> I am new to R-help website so I wrote you this message to your email but I
>> would like to post it on the R website, do you know how?
>>
>
> Wag answer: I hope so, since I managed to view and respond to your message
> :)
>
> More seriously, in gmail, the window that opens to produce replies has an
> option 'Reply to all'. I don't know if your e-mail client at UofA has that
> feature, but if not, you could always cc R-help and put the e-mail address
> in by hand if necessary. Most mailers are smart enough to auto-complete an
> address as you type in the name, so you could see if that applies on your
> system.
>
> I keep a separate account for R-help because of the traffic volume - if you
> intend to subscribe to the list, you might want to do the same. It's not
> unusual for 75-100 e-mails a weekday to enter your inbox...
>
>>
>> Thanks again!
>>
>> Eugenio
>>

-- 
Eugenio Larios
PhD Student
University of Arizona.
Ecology & Evolutionary Biology.
(520) 481-2263
elariosc@email.arizona.edu
