thr3ads.net - R help - [R] Strange results with anova.glm() [Oct 2007]

If this information is useful, please help other people find it:
Share via:

Gustaf Granath

2007-Oct-29 16:46 UTC

[R] Strange results with anova.glm()

Hi,
I have been struggling with this problem for some time now. Internet, 
books haven't been able to help me.

## I have factorial design with counts (fruits) as response variable.
 > str(stubb)
'data.frame': 334 obs. of 5 variables:
$ id : int 6 23 24 25 26 27 28 29 31 34 ...
$ infl.treat : Factor w/ 2 levels "0","1": 2 2 2 2 1 1 1 2 1
1 ...
$ def.treat : Factor w/ 4 levels
"1","2","3","4": 2 3 1 3 3 3 2 2 3 1 ...
$ initial.size : num 854 158 256 332 396 ...
$ tot.fruit : int 45 8 19 8 10 7 4 7 2 6 ...

## I fit a glm model with initial.size as a covariate, and 
family=quasipoisson. I want to test if infl.treat or def.treat had an 
effect on tot.fruits. (i.e if different cutting treatments of the plants 
had an effect on total ## number of fruits).
## After model testing etc, I ended up with the following model:

 > model<-glm(tot.fruit ~ infl.treat + def.treat + initial.size, 
quasipoisson)
 > summary(model)
Call: glm(formula = tot.fruit ~ infl.treat + def.treat + initial.size,
family = quasipoisson)

Deviance Residuals:
Min 1Q Median 3Q Max
-6.1541 -2.1872 -0.7045 0.9926 7.5629

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.8788185 0.1093306 17.185 < 2e-16 ***
infl.treat1 -0.2456909 0.0924677 -2.657 0.00827 **
def.treat2 -0.1478382 0.1277139 -1.158 0.24788
def.treat3 -0.0780282 0.1207796 -0.646 0.51871
def.treat4 -0.2581576 0.1221538 -2.113 0.03532 *
initial.size 0.0013834 0.0000993 13.931 < 2e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

(Dispersion parameter for quasipoisson family taken to be 6.845172)

Null deviance: 3261.5 on 333 degrees of freedom
Residual deviance: 2131.6 on 328 degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

## It seems like both infl.treat and def.treat have an effect. However, 
when I do an ANOVA table. It shows no effect at all of infl.treat!!!! I 
find that very peculiar since I only have two levels of that factor. ## 
Shouldn't it be a significant effect also in the ANOVA table?? In fact. 
It is not even close to significant - 0.886!!
 > anova(model, test="F")
Analysis of Deviance Table
Model: quasipoisson, link: log
Response: tot.fruit
Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev F Pr(>F)
NULL 333 3261.5
infl.treat 1 0.1 332 3261.4 0.0205 0.88632
def.treat 3 58.3 329 3203.1 2.8376 0.03814 *
initial.size 1 1071.4 328 2131.6 156.5253 < 2e-16 ***

## If I do a ML test with a model with infl.treat and one without. The 
infl.treat is also significant and should be in the model!!
 > anova(model,model2, test="F")
Analysis of Deviance Table

Model 1: tot.fruit ~ infl.treat + def.treat + initial.size
Model 2: tot.fruit ~ def.treat + initial.size
Resid. Df Resid. Dev Df Deviance F Pr(>F)
1 328 2131.65
2 329 2180.40 -1 -48.75 7.1221 0.007992 **
## What is going on here?? I thought the ANOVA table would give me a 
quite similar result because I only have two levels (of the infl.treat 
factor) and no interactions in the model.
## I'm afraid I have missed something trivial though so please, be gentle ;)

Cheers,

Gustaf Granath

-- 
Gustaf Granath (PhD student)
Dept of Plant Ecology
Evolutionary Biology Centre (EBC)
Uppsala University

John Fox

2007-Oct-29 19:25 UTC

head link

[R] Strange results with anova.glm()

Dear Gustaf,

I can think of two reasons why the two tests can disagree. 

First, the t-test from the summary() output is based on the covariance
matrix of the coefficients, while the F-test in the anova() output is
based of fitting alternative models. The two are not in general the
same. 

Second, and apparently more important here, the test in summary() is
for each coefficient in the full model fit to the data, while the tests
in anova() are sequential tests; thus infl.treat is tested ignoring
everything else (but the intercept). When you use anova() to contrast
the full model with one deleting infl.treat you test the same
hypothesis as in the summary() output, and thus get similar (though not
identical) results.

You might take a look at the Anova() function in the car package which,
in a model of this structure, will test each term "after" the others.

I hope this helps,
 John

On Mon, 29 Oct 2007 17:46:08 +0100
 Gustaf Granath <Gustaf.Granath at ebc.uu.se>
wrote:> Hi,
> I have been struggling with this problem for some time now. Internet,
> 
> books haven't been able to help me.
> 
> ## I have factorial design with counts (fruits) as response variable.
>  > str(stubb)
> 'data.frame': 334 obs. of 5 variables:
> $ id : int 6 23 24 25 26 27 28 29 31 34 ...
> $ infl.treat : Factor w/ 2 levels "0","1": 2 2 2 2 1 1
1 2 1 1 ...
> $ def.treat : Factor w/ 4 levels
"1","2","3","4": 2 3 1 3 3 3 2 2 3 1
> ...
> $ initial.size : num 854 158 256 332 396 ...
> $ tot.fruit : int 45 8 19 8 10 7 4 7 2 6 ...
> 
> ## I fit a glm model with initial.size as a covariate, and 
> family=quasipoisson. I want to test if infl.treat or def.treat had an
> 
> effect on tot.fruits. (i.e if different cutting treatments of the
> plants 
> had an effect on total ## number of fruits).
> ## After model testing etc, I ended up with the following model:
> 
>  > model<-glm(tot.fruit ~ infl.treat + def.treat + initial.size, 
> quasipoisson)
>  > summary(model)
> Call: glm(formula = tot.fruit ~ infl.treat + def.treat +
> initial.size,
> family = quasipoisson)
> 
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -6.1541 -2.1872 -0.7045 0.9926 7.5629
> 
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 1.8788185 0.1093306 17.185 < 2e-16 ***
> infl.treat1 -0.2456909 0.0924677 -2.657 0.00827 **
> def.treat2 -0.1478382 0.1277139 -1.158 0.24788
> def.treat3 -0.0780282 0.1207796 -0.646 0.51871
> def.treat4 -0.2581576 0.1221538 -2.113 0.03532 *
> initial.size 0.0013834 0.0000993 13.931 < 2e-16 ***
> ---
> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> 
> (Dispersion parameter for quasipoisson family taken to be 6.845172)
> 
> Null deviance: 3261.5 on 333 degrees of freedom
> Residual deviance: 2131.6 on 328 degrees of freedom
> AIC: NA
> 
> Number of Fisher Scoring iterations: 5
> 
> ## It seems like both infl.treat and def.treat have an effect.
> However, 
> when I do an ANOVA table. It shows no effect at all of infl.treat!!!!
> I 
> find that very peculiar since I only have two levels of that factor.
> ## 
> Shouldn't it be a significant effect also in the ANOVA table?? In
> fact. 
> It is not even close to significant - 0.886!!
>  > anova(model, test="F")
> Analysis of Deviance Table
> Model: quasipoisson, link: log
> Response: tot.fruit
> Terms added sequentially (first to last)
> 
> Df Deviance Resid. Df Resid. Dev F Pr(>F)
> NULL 333 3261.5
> infl.treat 1 0.1 332 3261.4 0.0205 0.88632
> def.treat 3 58.3 329 3203.1 2.8376 0.03814 *
> initial.size 1 1071.4 328 2131.6 156.5253 < 2e-16 ***
> 
> ## If I do a ML test with a model with infl.treat and one without.
> The 
> infl.treat is also significant and should be in the model!!
>  > anova(model,model2, test="F")
> Analysis of Deviance Table
> 
> Model 1: tot.fruit ~ infl.treat + def.treat + initial.size
> Model 2: tot.fruit ~ def.treat + initial.size
> Resid. Df Resid. Dev Df Deviance F Pr(>F)
> 1 328 2131.65
> 2 329 2180.40 -1 -48.75 7.1221 0.007992 **
> ## What is going on here?? I thought the ANOVA table would give me a 
> quite similar result because I only have two levels (of the
> infl.treat 
> factor) and no interactions in the model.
> ## I'm afraid I have missed something trivial though so please, be
> gentle ;)
> 
> Cheers,
> 
> Gustaf Granath
> 
> -- 
> Gustaf Granath (PhD student)
> Dept of Plant Ecology
> Evolutionary Biology Centre (EBC)
> Uppsala University
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

Gustaf Granath

2008-Oct-22 12:05 UTC

head link

[R] mgcv pack- gam() - question about the by argument

R fellows,
I hope my questions are not too much about statistical methodology.

I do lab experiments where y is measured at different x, using 
(plant)material originating from different sites (2-4). The response is 
non-linear in a way that I have turned to gam() (mgcv package) to 
investigate the shape of the response (maybe there are better ways). I 
want to test if the shape/pattern of the response differ between sites 
(i.e. should each site have its own curve). I was thinking (after I read 
Wood's reference manual) to use the by= argument and fit one model 
including the factor site and one without site, and then compare these 
models with AIC (or can the anova.gam() be trusted here?). Is this a 
correct interpretation and use of the by= argument??

I have also measured the same sample twice (weeks between) . What about 
including the factor Time using the by= argument to test if the curve 
differ between the two measurements (in a separate analysis looking each 
site individually)? I guess this will ignore the correlation within 
samples but maybe it can be used to get a rough idea?

Here is an example ( I know that sample size is low for a gam)
#Create data
y<-c(0.283,0.185,0.134,0.727,0.087,0.568,0.569,0.156,0.189,0.738,0.526,0.144,0.532,0.128,0.162,0.772,0.789,0.798,0.756,0.738,0.768,0.778,0.791,0.776,0.808,
0.756,0.742,0.744,0.758,0.774,0.768,0.780,0.782,0.789,0.756,0.805,0.800,0.804,0.790,0.781,0.757,0.780,0.744,0.783,0.774)
x<-c(45.1,43,48.5,40,47.1,42.8,40.3,45.3,45.6,37.5,38,46.1,40.5,40.2,46.5,27.3,24.9,27.7,24.8,26.6,25.1,23.5,26.3,26.6,27.6,
26.3,24.4,25.8,26.7,28,36,33.3,35.2,35.1,33.9,34.7,34.4,33.4,34.1,34.3,31,33.2,33.4,35.7,32.6)
Site<-rep(1:3,each=5,3)
mydata<-data.frame(y=y,x=x,Site=as.factor(Site))
library(mgcv)
#Model without site
mod.no.site<-gam(y~s(x),data=mydata)
#Model including the factor Site. id=1 to get the same smothing 
parameter at each factor level.
mod.with.site<-gam(y~Site+s(x,by=Site,id=1),data=mydata)
#AIC for the two models
AIC(mod.no.site,mod.with.site)

Thanks,

Gustaf Granath (phd student)

Gustaf Granath

2008-Oct-28 12:48 UTC

head link

[R] mgcv pack- gam() - question about the by argument

Thanks!! The R-community is just amazing.

Two follow-up questions:

1) IF, I had sufficient amount of data. How would that interaction term 
(Site x Time) be specified? Maybe like this? (adding stuff on the 
previous example)
And I should look at the "Parametric coefficients:" for significant 
interaction terms, right?

#Add two more measurements
time2<-data.frame(y=mydata$y*rnorm(1,mean=1,sd=0.2),x=x,Site=Site)
time3<-data.frame(y=mydata$y*rnorm(1,mean=1,sd=0.2),x=x,Site=Site)
mydata.rep<-rbind(mydata,time2,time3)
mydata.rep$Time<-as.factor(rep(1:3,each=45))
#Model with Site x Time - correct? Or should the Time factor be 
incorporated in the 'by' argument also (if possible?)??
mod.with.sitextime<-gam(y~Site*Time+s(x,by=Site,id=1),data=mydata.rep)
summary(mod.with.sitextime)

2) The Parametric coefficients estimates: are they like intercepts or 
what do they represent here (when we have factor levels involved)? Does 
a significant difference translate into different means for the factor 
levels compared?

Cheers,

Gustaf

Simon Wood wrote:> Gustaf,
>
> Your `by' variable use looks quite correct to me. The time issue is
slightly
> tricky with relatively few data. You could set up a factor variable for the
> site:time interaction (i.e. one level for each combination of site and
time)
> but that would be pushing it a bit for this number of data. 
>
> Simon
>
> On Wednesday 22 October 2008 13:05, Gustaf Granath wrote:
>   
>> R fellows,
>> I hope my questions are not too much about statistical methodology.
>>
>> I do lab experiments where y is measured at different x, using
>> (plant)material originating from different sites (2-4). The response is
>> non-linear in a way that I have turned to gam() (mgcv package) to
>> investigate the shape of the response (maybe there are better ways). I
>> want to test if the shape/pattern of the response differ between sites
>> (i.e. should each site have its own curve). I was thinking (after I
read
>> Wood's reference manual) to use the by= argument and fit one model
>> including the factor site and one without site, and then compare these
>> models with AIC (or can the anova.gam() be trusted here?). Is this a
>> correct interpretation and use of the by= argument??
>>
>> I have also measured the same sample twice (weeks between) . What about
>> including the factor Time using the by= argument to test if the curve
>> differ between the two measurements (in a separate analysis looking
each
>> site individually)? I guess this will ignore the correlation within
>> samples but maybe it can be used to get a rough idea?
>>
>> Here is an example ( I know that sample size is low for a gam)
>> #Create data
>>
y<-c(0.283,0.185,0.134,0.727,0.087,0.568,0.569,0.156,0.189,0.738,0.526,0.144
>>
,0.532,0.128,0.162,0.772,0.789,0.798,0.756,0.738,0.768,0.778,0.791,0.776,
>> 0.808,
>>
0.756,0.742,0.744,0.758,0.774,0.768,0.780,0.782,0.789,0.756,0.805,0.800,
>> 0.004,0.790,0.781,0.757,0.780,0.744,0.783,0.774)
>>
x<-c(45.1,43,48.5,40,47.1,42.8,40.3,45.3,45.6,37.5,38,46.1,40.5,40.2,46.5,
>> 27.3,24.9,27.7,24.8,26.6,25.1,23.5,26.3,26.6,27.6,
>>
26.3,24.4,25.8,26.7,28,36,33.3,35.2,35.1,33.9,34.7,34.4,33.4,34.1,34.3,31,
>> 33.2,33.4,35.7,32.6) Site<-rep(1:3,each=5,3)
>> mydata<-data.frame(y=y,x=x,Site=as.factor(Site))
>> library(mgcv)
>> #Model without site
>> mod.no.site<-gam(y~s(x),data=mydata)
>> #Model including the factor Site. id=1 to get the same smothing
>> parameter at each factor level.
>> mod.with.site<-gam(y~Site+s(x,by=Site,id=1),data=mydata)
>> #AIC for the two models
>> AIC(mod.no.site,mod.with.site)
>>
>> Thanks,
>>
>> Gustaf Granath (phd student)
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html and provide commented,
minimal,
>> self-contained, reproducible code.
>>     
>
>   
-- 
Gustaf Granath (PhD student)

Dept of Plant Ecology
Evolutionary Biology Centre (EBC)
Uppsala University
Villav?gen 14, SE - 752 36 Uppsala, Sweden

Tel: +46 (0)18-471 76 88
Fax: 018 55 34 19
Email: Gustaf.Granath at ebc.uu.se
http://www.vaxtbio.uu.se/resfold/granath.htm

Reasonably Related Threads

Search for more seemingly similar threads

R help - Oct 2007 - Strange results with anova.glm()

[R] Strange results with anova.glm()

[R] Strange results with anova.glm()

[R] mgcv pack- gam() - question about the by argument

[R] mgcv pack- gam() - question about the by argument

Reasonably Related Threads