thr3ads.net - R help - [R] Help with three-way anova [Apr 2005]

If this information is useful, please help other people find it:
Share via:

michael watson (IAH-C)

2005-Apr-05 14:51 UTC

[R] Help with three-way anova

Hi

I have data from 12 subjects.  The measurement is log(expression) of a
particular gene and can be assumed to be normally distributed.  The 12
subjects are divided into the following groups:

Infected, Vaccinated, Lesions - 3 measurements
Infected, Vaccintaed, No Lesions - 2 measurements
Infected, Not Vaccinated, Lesions - 4 measurements
Uninfected, Not Vaccinated, No Lesions - 3 measurements

Although presence/absence of lesions could be considered to be a
phenotype, here I would like to use it as a factor.  This explains some
of the imbalance in the design (ie we could not control how many
subjects, if any, in each group would get lesions).

First impressions - the data looks like we would expect.  Gene
expression is lowest in the infected/not vaccinated group, then next
lowest is the infected/vaccinated group and finally comes the
uninfected/not vaccinated group.  So the working hypothesis is that gene
expression of the gene in question is lowered by infection, but that the
vaccine somehow alleviates this effect, but not as much as to the level
of a totally uninfected subject.  We *might* have access to data
relating to uninfected/vaccinated group, my pet scientist is digging for
this as we speak.

As for lesions, well none of the uninfected subjects have them, all of
the infected/not vaccinated subjects have them, and some of the
infected/vaccinated have them, some don't.  Again, this makes for a very
sensible hypothesis if we treat presence/absence of lesions as a
phenotype, but in addition to that I want to know if gene expression is
linked to presence/absence of lesion, but only one group of subjects has
both lesions and non-lesions within it.  Eye-balling this group,
presence/absence of lesions and gene expression are not linked.

So I have this as a data.frame in R, and I wanted to run an analysis of
variance.  I did:

aov <-  aov(IL.4 ~ Infected + Vaccinated + Lesions, data)
summary(aov)

And got:

            Df  Sum Sq Mean Sq F value    Pr(>F)    
Infected     1 29.8482 29.8482 66.7037 3.761e-05 ***
Vaccinated   1 13.5078 13.5078 30.1868 0.0005777 ***
Lesions      1  0.0393  0.0393  0.0878 0.7746009    
Residuals    8  3.5798  0.4475                      
---

This tells me that Infected and Vaccinated are highly significant,
whereas lesions are not.

So, what I want to know is:

1) Given my unbalanced experimental design, is it valid to use aov?
2) Have I used aov() correctly?  If so, how do I get access results for
interactions?
3) Is there some other, more relevant way of analysing this?  What I am
really interested in is the gene expression, and whether it can be shown
to be statistically related to one or more of the factors involved
(Infected, Vaccinated, Lesions) or interactions between those factors.

Many thanks in advance

Mick

Federico Calboli

2005-Apr-05 15:32 UTC

head link

[R] Help with three-way anova

On Tue, 2005-04-05 at 15:51 +0100, michael watson (IAH-C) wrote:
> So, what I want to know is:
> 
> 1) Given my unbalanced experimental design, is it valid to use aov?
I'd say no. Use lm() instead, save your analysis in an object and then
possibly use drop1() to check the analysis
> 2) Have I used aov() correctly?  If so, how do I get access results for
> interactions?
The use of aov() per se seems fine, but you did not put any interaction
in the model... for that use factor * factor.

HTH,

F

-- 
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

John Fox

2005-Apr-05 15:39 UTC

head link

[R] Help with three-way anova

Dear Michael,

For unbalanced data, you might want to take a look at the Anova()
function in the car package.

As well, it probably makes sense to read something about how linear
models are expressed in R. ?lm and ?formula both have some information
about model formulas; the Introduction to R manual that comes with R
has a chapter on statistical models; and books on R typically take up
the subject at greater length.

I hope this helps,
 John 

On Tue, 5 Apr 2005 15:51:46 +0100
 "michael watson \(IAH-C\)" <michael.watson at bbsrc.ac.uk>
wrote:> Hi
> 
> I have data from 12 subjects.  The measurement is log(expression) of
> a
> particular gene and can be assumed to be normally distributed.  The
> 12
> subjects are divided into the following groups:
> 
> Infected, Vaccinated, Lesions - 3 measurements
> Infected, Vaccintaed, No Lesions - 2 measurements
> Infected, Not Vaccinated, Lesions - 4 measurements
> Uninfected, Not Vaccinated, No Lesions - 3 measurements
> 
> Although presence/absence of lesions could be considered to be a
> phenotype, here I would like to use it as a factor.  This explains
> some
> of the imbalance in the design (ie we could not control how many
> subjects, if any, in each group would get lesions).
> 
> First impressions - the data looks like we would expect.  Gene
> expression is lowest in the infected/not vaccinated group, then next
> lowest is the infected/vaccinated group and finally comes the
> uninfected/not vaccinated group.  So the working hypothesis is that
> gene
> expression of the gene in question is lowered by infection, but that
> the
> vaccine somehow alleviates this effect, but not as much as to the
> level
> of a totally uninfected subject.  We *might* have access to data
> relating to uninfected/vaccinated group, my pet scientist is digging
> for
> this as we speak.
> 
> As for lesions, well none of the uninfected subjects have them, all
> of
> the infected/not vaccinated subjects have them, and some of the
> infected/vaccinated have them, some don't.  Again, this makes for a
> very
> sensible hypothesis if we treat presence/absence of lesions as a
> phenotype, but in addition to that I want to know if gene expression
> is
> linked to presence/absence of lesion, but only one group of subjects
> has
> both lesions and non-lesions within it.  Eye-balling this group,
> presence/absence of lesions and gene expression are not linked.
> 
> So I have this as a data.frame in R, and I wanted to run an analysis
> of
> variance.  I did:
> 
> aov <-  aov(IL.4 ~ Infected + Vaccinated + Lesions, data)
> summary(aov)
> 
> And got:
> 
>             Df  Sum Sq Mean Sq F value    Pr(>F)    
> Infected     1 29.8482 29.8482 66.7037 3.761e-05 ***
> Vaccinated   1 13.5078 13.5078 30.1868 0.0005777 ***
> Lesions      1  0.0393  0.0393  0.0878 0.7746009    
> Residuals    8  3.5798  0.4475                      
> ---
> 
> This tells me that Infected and Vaccinated are highly significant,
> whereas lesions are not.
> 
> So, what I want to know is:
> 
> 1) Given my unbalanced experimental design, is it valid to use aov?
> 2) Have I used aov() correctly?  If so, how do I get access results
> for
> interactions?
> 3) Is there some other, more relevant way of analysing this?  What I
> am
> really interested in is the gene expression, and whether it can be
> shown
> to be statistically related to one or more of the factors involved
> (Infected, Vaccinated, Lesions) or interactions between those
> factors.
> 
> Many thanks in advance
> 
> Mick
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

michael watson (IAH-C)

2005-Apr-06 08:11 UTC

head link

[R] Help with three-way anova

OK, so I tried using lm() instead of aov() and they give similar
results:

My.aov <-  aov(IL.4 ~ Infected + Vaccinated + Lesions, data)
My.lm  <-   lm(IL.4 ~ Infected + Vaccinated + Lesions, data)

If I do summary(My.lm) and summary(My.aov), I get similar results, but
not identical.
If I do anova(My.aov) and anova(My.lm) I get identical results.  I guess
that's to be expected though.

Regarding the results of summary(My.lm), basically Intercept, Infected
and Vaccinated are all significant at p<=0.05.  I presume the
signifcance of the Intercept is that it is significantly different to
zero?  How do I interpret that?

Many thanks
Mick

-----Original Message-----
From: Federico Calboli [mailto:f.calboli at imperial.ac.uk] 
Sent: 05 April 2005 16:33
To: michael watson (IAH-C)
Cc: r-help
Subject: Re: [R] Help with three-way anova

On Tue, 2005-04-05 at 15:51 +0100, michael watson (IAH-C) wrote:
> So, what I want to know is:
> 
> 1) Given my unbalanced experimental design, is it valid to use aov?
I'd say no. Use lm() instead, save your analysis in an object and then
possibly use drop1() to check the analysis
> 2) Have I used aov() correctly?  If so, how do I get access results 
> for interactions?
The use of aov() per se seems fine, but you did not put any interaction
in the model... for that use factor * factor.

HTH,

F

-- 
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

michael watson (IAH-C)

2005-Apr-06 09:30 UTC

head link

[R] Help with three-way anova

OK, now I am lost.

I went from using aov(), which I fully understand, to lm() which I
probably don't.  I didn't specify a contrasts matrix in my call to
lm()....

Basically I want to find out if Infected/Uninfected affects the level of
IL.4, and if Vaccinated/Unvaccinated affects the level of IL.4,
obviously trying to separate the effects of Infection from the effects
of Vaccination.

The documentation for specifying contrasts to lm() is a little
convoluted, sending me to the help file for model.matrix.default, and
the help there doesn't really give me much to go on when trying to
figure out what contrasts matrix I need to use...

Many thanks for your help

Mick

-----Original Message-----
From: Federico Calboli [mailto:f.calboli at imperial.ac.uk] 
Sent: 06 April 2005 10:15
To: michael watson (IAH-C)
Cc: r-help
Subject: RE: [R] Help with three-way anova


On Wed, 2005-04-06 at 09:11 +0100, michael watson (IAH-C)
wrote:> OK, so I tried using lm() instead of aov() and they give similar
> results:
> 
> My.aov <-  aov(IL.4 ~ Infected + Vaccinated + Lesions, data)
> My.lm  <-   lm(IL.4 ~ Infected + Vaccinated + Lesions, data)
Incidentally, if you want interaction terms you need 

lm(IL.4 ~ Infected * Vaccinated * Lesions, data)

for all the possible interactions in the model (BUT you need enough
degrees of freedom from the start to be able to do
this).> 
> If I do summary(My.lm) and summary(My.aov), I get similar results, but
> not identical. If I do anova(My.aov) and anova(My.lm) I get identical 
> results.  I guess that's to be expected though.
> 
> Regarding the results of summary(My.lm), basically Intercept, Infected
> and Vaccinated are all significant at p<=0.05.  I presume the 
> signifcance of the Intercept is that it is significantly different to 
> zero?  How do I interpret that?
I guess it's all due to the contrast matrix you used. Check with
contrasts() the term(s) in the datafile you use as independent
variables, and change the contrast matrix as you see fit.

HTH,

F
-- 
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

michael watson (IAH-C)

2005-Apr-06 12:18 UTC

head link

[R] Help with three-way anova

Hi John

Thanks for your help, that was a very clear answer.  It looks as though,
due to my design, the best way forward is:
> contrasts(il4$Infected)   [,1]
I    -1
UI    1> contrasts(il4$Vaccinated)   [,1]
UV   -1
V     1> summary(lm(IL.4 ~ Infected * Vaccinated, il4))
Thanks
Mick

-----Original Message-----
From: John Fox [mailto:jfox at mcmaster.ca] 
Sent: 06 April 2005 12:52
To: michael watson (IAH-C)
Cc: 'r-help'; f.calboli at imperial.ac.uk
Subject: RE: [R] Help with three-way anova


Dear Mick,

For a three-way ANOVA, the difference between aov() and lm() is mostly
in the print and summary methods -- aov() calls lm() but in its summary
prints an ANOVA table rather than coefficient estimates, etc. You can
get the same ANOVA table from the object returned by lm via the anova()
function. The problem, however, is that for unbalanced data you'll get
sequential sums of squares which likely don't test hypotheses of
interest to you.

If you didn't explicitly set the contrast coding, then the out-of-box
default in R [options("contrasts")] is to use treatment.contr(), which
produces dummy-coded (0/1) contrasts. In this case, the "intercept"
represents the fitted value when all of the factors are at their
baseline levels, and it's probably entirely uninteresting to test
whether it is 0.

More generally, however, it seems unreasonable to try to learn how to
fit and interpret linear models in R from the help files. There's a
brief treatment in the Introduction to R manual that's distributed with
R, and many other more detailed treatments -- see
http://www.r-project.org/other-docs.html.

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
> michael watson (IAH-C)
> Sent: Wednesday, April 06, 2005 4:31 AM
> To: f.calboli at imperial.ac.uk
> Cc: r-help
> Subject: RE: [R] Help with three-way anova
> 
> OK, now I am lost.
> 
> I went from using aov(), which I fully understand, to lm()
> which I probably don't.  I didn't specify a contrasts matrix 
> in my call to lm()....
> 
> Basically I want to find out if Infected/Uninfected affects
> the level of IL.4, and if Vaccinated/Unvaccinated affects the 
> level of IL.4, obviously trying to separate the effects of 
> Infection from the effects of Vaccination.
> 
> The documentation for specifying contrasts to lm() is a
> little convoluted, sending me to the help file for 
> model.matrix.default, and the help there doesn't really give 
> me much to go on when trying to figure out what contrasts 
> matrix I need to use...
> 
> Many thanks for your help
> 
> Mick
> 
> -----Original Message-----
> From: Federico Calboli [mailto:f.calboli at imperial.ac.uk]
> Sent: 06 April 2005 10:15
> To: michael watson (IAH-C)
> Cc: r-help
> Subject: RE: [R] Help with three-way anova
> 
> 
> On Wed, 2005-04-06 at 09:11 +0100, michael watson (IAH-C) wrote:
> > OK, so I tried using lm() instead of aov() and they give similar
> > results:
> > 
> > My.aov <-  aov(IL.4 ~ Infected + Vaccinated + Lesions, data)
> > My.lm  <-   lm(IL.4 ~ Infected + Vaccinated + Lesions, data)
> 
> Incidentally, if you want interaction terms you need
> 
> lm(IL.4 ~ Infected * Vaccinated * Lesions, data)
> 
> for all the possible interactions in the model (BUT you need enough 
> degrees of freedom from the start to be able to do this).
> > 
> > If I do summary(My.lm) and summary(My.aov), I get similar
> results, but
> 
> > not identical. If I do anova(My.aov) and anova(My.lm) I get
> identical
> > results.  I guess that's to be expected though.
> > 
> > Regarding the results of summary(My.lm), basically
> Intercept, Infected
> 
> > and Vaccinated are all significant at p<=0.05.  I presume the
> > signifcance of the Intercept is that it is significantly 
> different to
> > zero?  How do I interpret that?
> 
> I guess it's all due to the contrast matrix you used. Check with
> contrasts() the term(s) in the datafile you use as independent 
> variables, and change the contrast matrix as you see fit.
> 
> HTH,
> 
> F
> --
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St Mary's Campus
> Norfolk Place, London W2 1PG
> 
> Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193
> 
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Maybe Matching Threads

Search for more reasonably related threads

R help - Apr 2005 - Help with three-way anova

[R] Help with three-way anova

[R] Help with three-way anova

[R] Help with three-way anova

[R] Help with three-way anova

[R] Help with three-way anova

[R] Help with three-way anova

Maybe Matching Threads