Hi, I have been struggling with this problem for some time now. Internet, books haven't been able to help me. ## I have factorial design with counts (fruits) as response variable. > str(stubb) 'data.frame': 334 obs. of 5 variables: $ id : int 6 23 24 25 26 27 28 29 31 34 ... $ infl.treat : Factor w/ 2 levels "0","1": 2 2 2 2 1 1 1 2 1 1 ... $ def.treat : Factor w/ 4 levels "1","2","3","4": 2 3 1 3 3 3 2 2 3 1 ... $ initial.size : num 854 158 256 332 396 ... $ tot.fruit : int 45 8 19 8 10 7 4 7 2 6 ... ## I fit a glm model with initial.size as a covariate, and family=quasipoisson. I want to test if infl.treat or def.treat had an effect on tot.fruits. (i.e if different cutting treatments of the plants had an effect on total ## number of fruits). ## After model testing etc, I ended up with the following model: > model<-glm(tot.fruit ~ infl.treat + def.treat + initial.size, quasipoisson) > summary(model) Call: glm(formula = tot.fruit ~ infl.treat + def.treat + initial.size, family = quasipoisson) Deviance Residuals: Min 1Q Median 3Q Max -6.1541 -2.1872 -0.7045 0.9926 7.5629 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.8788185 0.1093306 17.185 < 2e-16 *** infl.treat1 -0.2456909 0.0924677 -2.657 0.00827 ** def.treat2 -0.1478382 0.1277139 -1.158 0.24788 def.treat3 -0.0780282 0.1207796 -0.646 0.51871 def.treat4 -0.2581576 0.1221538 -2.113 0.03532 * initial.size 0.0013834 0.0000993 13.931 < 2e-16 *** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 (Dispersion parameter for quasipoisson family taken to be 6.845172) Null deviance: 3261.5 on 333 degrees of freedom Residual deviance: 2131.6 on 328 degrees of freedom AIC: NA Number of Fisher Scoring iterations: 5 ## It seems like both infl.treat and def.treat have an effect. However, when I do an ANOVA table. It shows no effect at all of infl.treat!!!! I find that very peculiar since I only have two levels of that factor. ## Shouldn't it be a significant effect also in the ANOVA table?? In fact. It is not even close to significant - 0.886!! > anova(model, test="F") Analysis of Deviance Table Model: quasipoisson, link: log Response: tot.fruit Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev F Pr(>F) NULL 333 3261.5 infl.treat 1 0.1 332 3261.4 0.0205 0.88632 def.treat 3 58.3 329 3203.1 2.8376 0.03814 * initial.size 1 1071.4 328 2131.6 156.5253 < 2e-16 *** ## If I do a ML test with a model with infl.treat and one without. The infl.treat is also significant and should be in the model!! > anova(model,model2, test="F") Analysis of Deviance Table Model 1: tot.fruit ~ infl.treat + def.treat + initial.size Model 2: tot.fruit ~ def.treat + initial.size Resid. Df Resid. Dev Df Deviance F Pr(>F) 1 328 2131.65 2 329 2180.40 -1 -48.75 7.1221 0.007992 ** ## What is going on here?? I thought the ANOVA table would give me a quite similar result because I only have two levels (of the infl.treat factor) and no interactions in the model. ## I'm afraid I have missed something trivial though so please, be gentle ;) Cheers, Gustaf Granath -- Gustaf Granath (PhD student) Dept of Plant Ecology Evolutionary Biology Centre (EBC) Uppsala University
Dear Gustaf, I can think of two reasons why the two tests can disagree. First, the t-test from the summary() output is based on the covariance matrix of the coefficients, while the F-test in the anova() output is based of fitting alternative models. The two are not in general the same. Second, and apparently more important here, the test in summary() is for each coefficient in the full model fit to the data, while the tests in anova() are sequential tests; thus infl.treat is tested ignoring everything else (but the intercept). When you use anova() to contrast the full model with one deleting infl.treat you test the same hypothesis as in the summary() output, and thus get similar (though not identical) results. You might take a look at the Anova() function in the car package which, in a model of this structure, will test each term "after" the others. I hope this helps, John On Mon, 29 Oct 2007 17:46:08 +0100 Gustaf Granath <Gustaf.Granath at ebc.uu.se> wrote:> Hi, > I have been struggling with this problem for some time now. Internet, > > books haven't been able to help me. > > ## I have factorial design with counts (fruits) as response variable. > > str(stubb) > 'data.frame': 334 obs. of 5 variables: > $ id : int 6 23 24 25 26 27 28 29 31 34 ... > $ infl.treat : Factor w/ 2 levels "0","1": 2 2 2 2 1 1 1 2 1 1 ... > $ def.treat : Factor w/ 4 levels "1","2","3","4": 2 3 1 3 3 3 2 2 3 1 > ... > $ initial.size : num 854 158 256 332 396 ... > $ tot.fruit : int 45 8 19 8 10 7 4 7 2 6 ... > > ## I fit a glm model with initial.size as a covariate, and > family=quasipoisson. I want to test if infl.treat or def.treat had an > > effect on tot.fruits. (i.e if different cutting treatments of the > plants > had an effect on total ## number of fruits). > ## After model testing etc, I ended up with the following model: > > > model<-glm(tot.fruit ~ infl.treat + def.treat + initial.size, > quasipoisson) > > summary(model) > Call: glm(formula = tot.fruit ~ infl.treat + def.treat + > initial.size, > family = quasipoisson) > > Deviance Residuals: > Min 1Q Median 3Q Max > -6.1541 -2.1872 -0.7045 0.9926 7.5629 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 1.8788185 0.1093306 17.185 < 2e-16 *** > infl.treat1 -0.2456909 0.0924677 -2.657 0.00827 ** > def.treat2 -0.1478382 0.1277139 -1.158 0.24788 > def.treat3 -0.0780282 0.1207796 -0.646 0.51871 > def.treat4 -0.2581576 0.1221538 -2.113 0.03532 * > initial.size 0.0013834 0.0000993 13.931 < 2e-16 *** > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > (Dispersion parameter for quasipoisson family taken to be 6.845172) > > Null deviance: 3261.5 on 333 degrees of freedom > Residual deviance: 2131.6 on 328 degrees of freedom > AIC: NA > > Number of Fisher Scoring iterations: 5 > > ## It seems like both infl.treat and def.treat have an effect. > However, > when I do an ANOVA table. It shows no effect at all of infl.treat!!!! > I > find that very peculiar since I only have two levels of that factor. > ## > Shouldn't it be a significant effect also in the ANOVA table?? In > fact. > It is not even close to significant - 0.886!! > > anova(model, test="F") > Analysis of Deviance Table > Model: quasipoisson, link: log > Response: tot.fruit > Terms added sequentially (first to last) > > Df Deviance Resid. Df Resid. Dev F Pr(>F) > NULL 333 3261.5 > infl.treat 1 0.1 332 3261.4 0.0205 0.88632 > def.treat 3 58.3 329 3203.1 2.8376 0.03814 * > initial.size 1 1071.4 328 2131.6 156.5253 < 2e-16 *** > > ## If I do a ML test with a model with infl.treat and one without. > The > infl.treat is also significant and should be in the model!! > > anova(model,model2, test="F") > Analysis of Deviance Table > > Model 1: tot.fruit ~ infl.treat + def.treat + initial.size > Model 2: tot.fruit ~ def.treat + initial.size > Resid. Df Resid. Dev Df Deviance F Pr(>F) > 1 328 2131.65 > 2 329 2180.40 -1 -48.75 7.1221 0.007992 ** > ## What is going on here?? I thought the ANOVA table would give me a > quite similar result because I only have two levels (of the > infl.treat > factor) and no interactions in the model. > ## I'm afraid I have missed something trivial though so please, be > gentle ;) > > Cheers, > > Gustaf Granath > > -- > Gustaf Granath (PhD student) > Dept of Plant Ecology > Evolutionary Biology Centre (EBC) > Uppsala University > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-------------------------------- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/
Gustaf Granath
2008-Oct-22 12:05 UTC
[R] mgcv pack- gam() - question about the by argument
R fellows, I hope my questions are not too much about statistical methodology. I do lab experiments where y is measured at different x, using (plant)material originating from different sites (2-4). The response is non-linear in a way that I have turned to gam() (mgcv package) to investigate the shape of the response (maybe there are better ways). I want to test if the shape/pattern of the response differ between sites (i.e. should each site have its own curve). I was thinking (after I read Wood's reference manual) to use the by= argument and fit one model including the factor site and one without site, and then compare these models with AIC (or can the anova.gam() be trusted here?). Is this a correct interpretation and use of the by= argument?? I have also measured the same sample twice (weeks between) . What about including the factor Time using the by= argument to test if the curve differ between the two measurements (in a separate analysis looking each site individually)? I guess this will ignore the correlation within samples but maybe it can be used to get a rough idea? Here is an example ( I know that sample size is low for a gam) #Create data y<-c(0.283,0.185,0.134,0.727,0.087,0.568,0.569,0.156,0.189,0.738,0.526,0.144,0.532,0.128,0.162,0.772,0.789,0.798,0.756,0.738,0.768,0.778,0.791,0.776,0.808, 0.756,0.742,0.744,0.758,0.774,0.768,0.780,0.782,0.789,0.756,0.805,0.800,0.804,0.790,0.781,0.757,0.780,0.744,0.783,0.774) x<-c(45.1,43,48.5,40,47.1,42.8,40.3,45.3,45.6,37.5,38,46.1,40.5,40.2,46.5,27.3,24.9,27.7,24.8,26.6,25.1,23.5,26.3,26.6,27.6, 26.3,24.4,25.8,26.7,28,36,33.3,35.2,35.1,33.9,34.7,34.4,33.4,34.1,34.3,31,33.2,33.4,35.7,32.6) Site<-rep(1:3,each=5,3) mydata<-data.frame(y=y,x=x,Site=as.factor(Site)) library(mgcv) #Model without site mod.no.site<-gam(y~s(x),data=mydata) #Model including the factor Site. id=1 to get the same smothing parameter at each factor level. mod.with.site<-gam(y~Site+s(x,by=Site,id=1),data=mydata) #AIC for the two models AIC(mod.no.site,mod.with.site) Thanks, Gustaf Granath (phd student)
Gustaf Granath
2008-Oct-28 12:48 UTC
[R] mgcv pack- gam() - question about the by argument
Thanks!! The R-community is just amazing. Two follow-up questions: 1) IF, I had sufficient amount of data. How would that interaction term (Site x Time) be specified? Maybe like this? (adding stuff on the previous example) And I should look at the "Parametric coefficients:" for significant interaction terms, right? #Add two more measurements time2<-data.frame(y=mydata$y*rnorm(1,mean=1,sd=0.2),x=x,Site=Site) time3<-data.frame(y=mydata$y*rnorm(1,mean=1,sd=0.2),x=x,Site=Site) mydata.rep<-rbind(mydata,time2,time3) mydata.rep$Time<-as.factor(rep(1:3,each=45)) #Model with Site x Time - correct? Or should the Time factor be incorporated in the 'by' argument also (if possible?)?? mod.with.sitextime<-gam(y~Site*Time+s(x,by=Site,id=1),data=mydata.rep) summary(mod.with.sitextime) 2) The Parametric coefficients estimates: are they like intercepts or what do they represent here (when we have factor levels involved)? Does a significant difference translate into different means for the factor levels compared? Cheers, Gustaf Simon Wood wrote:> Gustaf, > > Your `by' variable use looks quite correct to me. The time issue is slightly > tricky with relatively few data. You could set up a factor variable for the > site:time interaction (i.e. one level for each combination of site and time) > but that would be pushing it a bit for this number of data. > > Simon > > On Wednesday 22 October 2008 13:05, Gustaf Granath wrote: > >> R fellows, >> I hope my questions are not too much about statistical methodology. >> >> I do lab experiments where y is measured at different x, using >> (plant)material originating from different sites (2-4). The response is >> non-linear in a way that I have turned to gam() (mgcv package) to >> investigate the shape of the response (maybe there are better ways). I >> want to test if the shape/pattern of the response differ between sites >> (i.e. should each site have its own curve). I was thinking (after I read >> Wood's reference manual) to use the by= argument and fit one model >> including the factor site and one without site, and then compare these >> models with AIC (or can the anova.gam() be trusted here?). Is this a >> correct interpretation and use of the by= argument?? >> >> I have also measured the same sample twice (weeks between) . What about >> including the factor Time using the by= argument to test if the curve >> differ between the two measurements (in a separate analysis looking each >> site individually)? I guess this will ignore the correlation within >> samples but maybe it can be used to get a rough idea? >> >> Here is an example ( I know that sample size is low for a gam) >> #Create data >> y<-c(0.283,0.185,0.134,0.727,0.087,0.568,0.569,0.156,0.189,0.738,0.526,0.144 >> ,0.532,0.128,0.162,0.772,0.789,0.798,0.756,0.738,0.768,0.778,0.791,0.776, >> 0.808, >> 0.756,0.742,0.744,0.758,0.774,0.768,0.780,0.782,0.789,0.756,0.805,0.800, >> 0.004,0.790,0.781,0.757,0.780,0.744,0.783,0.774) >> x<-c(45.1,43,48.5,40,47.1,42.8,40.3,45.3,45.6,37.5,38,46.1,40.5,40.2,46.5, >> 27.3,24.9,27.7,24.8,26.6,25.1,23.5,26.3,26.6,27.6, >> 26.3,24.4,25.8,26.7,28,36,33.3,35.2,35.1,33.9,34.7,34.4,33.4,34.1,34.3,31, >> 33.2,33.4,35.7,32.6) Site<-rep(1:3,each=5,3) >> mydata<-data.frame(y=y,x=x,Site=as.factor(Site)) >> library(mgcv) >> #Model without site >> mod.no.site<-gam(y~s(x),data=mydata) >> #Model including the factor Site. id=1 to get the same smothing >> parameter at each factor level. >> mod.with.site<-gam(y~Site+s(x,by=Site,id=1),data=mydata) >> #AIC for the two models >> AIC(mod.no.site,mod.with.site) >> >> Thanks, >> >> Gustaf Granath (phd student) >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html and provide commented, minimal, >> self-contained, reproducible code. >> > >-- Gustaf Granath (PhD student) Dept of Plant Ecology Evolutionary Biology Centre (EBC) Uppsala University Villav?gen 14, SE - 752 36 Uppsala, Sweden Tel: +46 (0)18-471 76 88 Fax: 018 55 34 19 Email: Gustaf.Granath at ebc.uu.se http://www.vaxtbio.uu.se/resfold/granath.htm
Possibly Parallel Threads
- Weird SEs with effect()
- Djjlölkjhfyn kb noknkkkkokljjyikkk hyjjjjjkjjjjkjkkpooololåååååååååååååååääääkkuiivjkoööklopipållällnbbbn mml ömmmm
- How to get two y-axises in a bar plot?
- Surfaceplot3D with wireframe
- Interpretation of 'Intercept' in a 2-way factorial lm