thr3ads.net - R help - [R] testing non-linear component in mgcv:gam [Oct 2005]

If this information is useful, please help other people find it:
Share via:

Denis Chabot

2005-Oct-05 13:21 UTC

[R] testing non-linear component in mgcv:gam

Hi,

I need further help with my GAMs. Most models I test are very  
obviously non-linear. Yet, to be on the safe side, I report the  
significance of the smooth (default output of mgcv's summary.gam) and  
confirm it deviates significantly from linearity.

I do the latter by fitting a second model where the same predictor is  
entered without the s(), and then use anova.gam to compare the two. I  
thought this was the equivalent of the default output of anova.gam  
using package gam instead of mgcv.

I wonder if this procedure is correct because one of my models  
appears to be linear. In fact mgcv estimates df to be exactly 1.0 so  
I could have stopped there. However I inadvertently repeated the  
procedure outlined above. I would have thought in this case the  
anova.gam comparing the smooth and the linear fit would for sure have  
been not significant. To my surprise, P was 6.18e-09!

Am I doing something wrong when I attempt to confirm the non- 
parametric part a smoother is significant? Here is my example case  
where the relationship does appear to be linear:

library(mgcv)> This is mgcv 1.3-7Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 0.38, 0.62,  
0.88, 1.12,
            1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,  
3.62, 3.88,
            4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,  
6.38, 6.62, 6.88,
            7.12, 8.38, 13.62)
N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31,  
22, 26, 24, 23,
             15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 1, 1,  
1, 1, 1)
wm.sed <- c(0.000000000, 0.016129032, 0.000000000, 0.062046512,  
0.396459596, 0.189082949,
             0.054757925, 0.142810440, 0.168005168, 0.180804428,  
0.111439628, 0.128799505,
             0.193707937, 0.105921610, 0.103497845, 0.028591837,  
0.217894389, 0.020535469,
             0.080389068, 0.105234450, 0.070213450, 0.050771363,  
0.042074434, 0.102348837,
             0.049748344, 0.019100478, 0.005203125, 0.101711864,  
0.000000000, 0.000000000,
             0.014808824, 0.000000000, 0.222000000, 0.167000000,  
0.000000000, 0.000000000,
             0.000000000)

sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
summary.gam(sed.gam)> Family: gaussian
> Link function: identity
>
> Formula:
> wm.sed ~ s(Temp)
>
> Parametric coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.08403    0.01347   6.241 3.73e-07 ***
> ---
> Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> Approximate significance of smooth terms:
>         edf Est.rank     F  p-value
> s(Temp)   1        1 13.95 0.000666 ***
> ---
> Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> R-sq.(adj) =  0.554   Deviance explained = 28.5%
> GCV score = 0.09904   Scale est. = 0.093686  n = 37
# testing non-linear contribution
sed.lin <- gam(wm.sed~Temp,weight=N.sets)
summary.gam(sed.lin)> Family: gaussian
> Link function: identity
>
> Formula:
> wm.sed ~ Temp
>
> Parametric coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
> Temp        -0.023792   0.006369  -3.736 0.000666 ***
> ---
> Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
>
> R-sq.(adj) =  0.554   Deviance explained = 28.5%
> GCV score = 0.09904   Scale est. = 0.093686  n = 37
anova.gam(sed.lin, sed.gam, test="F")> Analysis of Deviance Table
>
> Model 1: wm.sed ~ Temp
> Model 2: wm.sed ~ s(Temp)
>    Resid. Df Resid. Dev         Df  Deviance      F   Pr(>F)
> 1 3.5000e+01      3.279
> 2 3.5000e+01      3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***

Thanks in advance,


Denis Chabot

Liaw, Andy

2005-Oct-05 13:38 UTC

head link

[R] testing non-linear component in mgcv:gam

I think you probably should state more clearly the goal of your
analysis.  In such situation, estimation and hypothesis testing are
quite different.  The procedure that gives you the `best' estimate is
not necessarily the `best' for testing linearity of components.  If your
goal is estimation/prediction, why test linearity of components?  If the
goal is testing linearity, then you probably need to find the procedure
that gives you a good test, rather than relying on what gam() gives you.

Just my $0.02...

Andy
> From: Denis Chabot
> 
> Hi,
> 
> I need further help with my GAMs. Most models I test are very  
> obviously non-linear. Yet, to be on the safe side, I report the  
> significance of the smooth (default output of mgcv's 
> summary.gam) and  
> confirm it deviates significantly from linearity.
> 
> I do the latter by fitting a second model where the same 
> predictor is  
> entered without the s(), and then use anova.gam to compare 
> the two. I  
> thought this was the equivalent of the default output of anova.gam  
> using package gam instead of mgcv.
> 
> I wonder if this procedure is correct because one of my models  
> appears to be linear. In fact mgcv estimates df to be exactly 1.0 so  
> I could have stopped there. However I inadvertently repeated the  
> procedure outlined above. I would have thought in this case the  
> anova.gam comparing the smooth and the linear fit would for 
> sure have  
> been not significant. To my surprise, P was 6.18e-09!
> 
> Am I doing something wrong when I attempt to confirm the non- 
> parametric part a smoother is significant? Here is my example case  
> where the relationship does appear to be linear:
> 
> library(mgcv)
> > This is mgcv 1.3-7
> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 
> 0.38, 0.62,  
> 0.88, 1.12,
>             1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,  
> 3.62, 3.88,
>             4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,  
> 6.38, 6.62, 6.88,
>             7.12, 8.38, 13.62)
> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31,  
> 22, 26, 24, 23,
>              15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 1, 1,  
> 1, 1, 1)
> wm.sed <- c(0.000000000, 0.016129032, 0.000000000, 0.062046512,  
> 0.396459596, 0.189082949,
>              0.054757925, 0.142810440, 0.168005168, 0.180804428,  
> 0.111439628, 0.128799505,
>              0.193707937, 0.105921610, 0.103497845, 0.028591837,  
> 0.217894389, 0.020535469,
>              0.080389068, 0.105234450, 0.070213450, 0.050771363,  
> 0.042074434, 0.102348837,
>              0.049748344, 0.019100478, 0.005203125, 0.101711864,  
> 0.000000000, 0.000000000,
>              0.014808824, 0.000000000, 0.222000000, 0.167000000,  
> 0.000000000, 0.000000000,
>              0.000000000)
> 
> sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
> summary.gam(sed.gam)
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > wm.sed ~ s(Temp)
> >
> > Parametric coefficients:
> >             Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  0.08403    0.01347   6.241 3.73e-07 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*'
0.05 '.' 0.1 ' ' 1
> >
> > Approximate significance of smooth terms:
> >         edf Est.rank     F  p-value
> > s(Temp)   1        1 13.95 0.000666 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*'
0.05 '.' 0.1 ' ' 1
> >
> > R-sq.(adj) =  0.554   Deviance explained = 28.5%
> > GCV score = 0.09904   Scale est. = 0.093686  n = 37
> 
> # testing non-linear contribution
> sed.lin <- gam(wm.sed~Temp,weight=N.sets)
> summary.gam(sed.lin)
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > wm.sed ~ Temp
> >
> > Parametric coefficients:
> >              Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
> > Temp        -0.023792   0.006369  -3.736 0.000666 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*'
0.05 '.' 0.1 ' ' 1
> >
> >
> > R-sq.(adj) =  0.554   Deviance explained = 28.5%
> > GCV score = 0.09904   Scale est. = 0.093686  n = 37
> anova.gam(sed.lin, sed.gam, test="F")
> > Analysis of Deviance Table
> >
> > Model 1: wm.sed ~ Temp
> > Model 2: wm.sed ~ s(Temp)
> >    Resid. Df Resid. Dev         Df  Deviance      F   Pr(>F)
> > 1 3.5000e+01      3.279
> > 2 3.5000e+01      3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
> 
> 
> Thanks in advance,
> 
> 
> Denis Chabot
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
>

John Fox

2005-Oct-05 13:45 UTC

head link

[R] testing non-linear component in mgcv:gam

Dear Denis,

Take a closer look at the anova table: The models provide identical fits to
the data. The differences in degrees of freedom and deviance between the two
models are essentially zero, 5.5554e-10 and 2.353e-11 respectively.

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Denis Chabot
> Sent: Wednesday, October 05, 2005 8:22 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] testing non-linear component in mgcv:gam
> 
> Hi,
> 
> I need further help with my GAMs. Most models I test are very 
> obviously non-linear. Yet, to be on the safe side, I report 
> the significance of the smooth (default output of mgcv's 
> summary.gam) and confirm it deviates significantly from linearity.
> 
> I do the latter by fitting a second model where the same 
> predictor is entered without the s(), and then use anova.gam 
> to compare the two. I thought this was the equivalent of the 
> default output of anova.gam using package gam instead of mgcv.
> 
> I wonder if this procedure is correct because one of my 
> models appears to be linear. In fact mgcv estimates df to be 
> exactly 1.0 so I could have stopped there. However I 
> inadvertently repeated the procedure outlined above. I would 
> have thought in this case the anova.gam comparing the smooth 
> and the linear fit would for sure have been not significant. 
> To my surprise, P was 6.18e-09!
> 
> Am I doing something wrong when I attempt to confirm the non- 
> parametric part a smoother is significant? Here is my example 
> case where the relationship does appear to be linear:
> 
> library(mgcv)
> > This is mgcv 1.3-7
> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 
> 0.38, 0.62, 0.88, 1.12,
>             1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 
> 3.38, 3.62, 3.88,
>             4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 
> 6.12, 6.38, 6.62, 6.88,
>             7.12, 8.38, 13.62)
> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 
> 29, 31, 22, 26, 24, 23,
>              15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 
> 1, 1, 1, 1, 1) wm.sed <- c(0.000000000, 0.016129032, 
> 0.000000000, 0.062046512, 0.396459596, 0.189082949,
>              0.054757925, 0.142810440, 0.168005168, 
> 0.180804428, 0.111439628, 0.128799505,
>              0.193707937, 0.105921610, 0.103497845, 
> 0.028591837, 0.217894389, 0.020535469,
>              0.080389068, 0.105234450, 0.070213450, 
> 0.050771363, 0.042074434, 0.102348837,
>              0.049748344, 0.019100478, 0.005203125, 
> 0.101711864, 0.000000000, 0.000000000,
>              0.014808824, 0.000000000, 0.222000000, 
> 0.167000000, 0.000000000, 0.000000000,
>              0.000000000)
> 
> sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
> summary.gam(sed.gam)
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > wm.sed ~ s(Temp)
> >
> > Parametric coefficients:
> >             Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  0.08403    0.01347   6.241 3.73e-07 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*'
0.05 '.' 0.1 ' ' 1
> >
> > Approximate significance of smooth terms:
> >         edf Est.rank     F  p-value
> > s(Temp)   1        1 13.95 0.000666 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*'
0.05 '.' 0.1 ' ' 1
> >
> > R-sq.(adj) =  0.554   Deviance explained = 28.5%
> > GCV score = 0.09904   Scale est. = 0.093686  n = 37
> 
> # testing non-linear contribution
> sed.lin <- gam(wm.sed~Temp,weight=N.sets)
> summary.gam(sed.lin)
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > wm.sed ~ Temp
> >
> > Parametric coefficients:
> >              Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
> > Temp        -0.023792   0.006369  -3.736 0.000666 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*'
0.05 '.' 0.1 ' ' 1
> >
> >
> > R-sq.(adj) =  0.554   Deviance explained = 28.5%
> > GCV score = 0.09904   Scale est. = 0.093686  n = 37
> anova.gam(sed.lin, sed.gam, test="F")
> > Analysis of Deviance Table
> >
> > Model 1: wm.sed ~ Temp
> > Model 2: wm.sed ~ s(Temp)
> >    Resid. Df Resid. Dev         Df  Deviance      F   Pr(>F)
> > 1 3.5000e+01      3.279
> > 2 3.5000e+01      3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
> 
> 
> Thanks in advance,
> 
> 
> Denis Chabot
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html

Simon Wood

2005-Oct-24 13:37 UTC

head link

[R] testing non-linear component in mgcv:gam

In this case the models being compared are really identical, and the
P-value is meaningless numerical noise.

If your main focus is hypothesis testing and you really need near exact
p-values, then do this sort of testing using unpenalized models. i.e.
don't have mgcv::gam estimate the EDF of the smooth, just use
`s(...,fx=TRUE)' to estimate it unpenalized. This gives horrible point
estimates and excellent p-values.

best,
Simon
> Hi,
>
> I need further help with my GAMs. Most models I test are very
> obviously non-linear. Yet, to be on the safe side, I report the
> significance of the smooth (default output of mgcv's summary.gam) and
> confirm it deviates significantly from linearity.
>
> I do the latter by fitting a second model where the same predictor is
> entered without the s(), and then use anova.gam to compare the two. I
> thought this was the equivalent of the default output of anova.gam
> using package gam instead of mgcv.
>
> I wonder if this procedure is correct because one of my models
> appears to be linear. In fact mgcv estimates df to be exactly 1.0 so
> I could have stopped there. However I inadvertently repeated the
> procedure outlined above. I would have thought in this case the
> anova.gam comparing the smooth and the linear fit would for sure have
> been not significant. To my surprise, P was 6.18e-09!
>
> Am I doing something wrong when I attempt to confirm the non-
> parametric part a smoother is significant? Here is my example case
> where the relationship does appear to be linear:
>
> library(mgcv)
> > This is mgcv 1.3-7
> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 0.38, 0.62,
> 0.88, 1.12,
>             1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38,
> 3.62, 3.88,
>             4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12,
> 6.38, 6.62, 6.88,
>             7.12, 8.38, 13.62)
> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 27, 29, 31,
> 22, 26, 24, 23,
>              15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 3, 1, 1,
> 1, 1, 1)
> wm.sed <- c(0.000000000, 0.016129032, 0.000000000, 0.062046512,
> 0.396459596, 0.189082949,
>              0.054757925, 0.142810440, 0.168005168, 0.180804428,
> 0.111439628, 0.128799505,
>              0.193707937, 0.105921610, 0.103497845, 0.028591837,
> 0.217894389, 0.020535469,
>              0.080389068, 0.105234450, 0.070213450, 0.050771363,
> 0.042074434, 0.102348837,
>              0.049748344, 0.019100478, 0.005203125, 0.101711864,
> 0.000000000, 0.000000000,
>              0.014808824, 0.000000000, 0.222000000, 0.167000000,
> 0.000000000, 0.000000000,
>              0.000000000)
>
> sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
> summary.gam(sed.gam)
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > wm.sed ~ s(Temp)
> >
> > Parametric coefficients:
> >             Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  0.08403    0.01347   6.241 3.73e-07 ***
> > ---
> > Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> >
> > Approximate significance of smooth terms:
> >         edf Est.rank     F  p-value
> > s(Temp)   1        1 13.95 0.000666 ***
> > ---
> > Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> >
> > R-sq.(adj) =  0.554   Deviance explained = 28.5%
> > GCV score = 0.09904   Scale est. = 0.093686  n = 37
>
> # testing non-linear contribution
> sed.lin <- gam(wm.sed~Temp,weight=N.sets)
> summary.gam(sed.lin)
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > wm.sed ~ Temp
> >
> > Parametric coefficients:
> >              Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
> > Temp        -0.023792   0.006369  -3.736 0.000666 ***
> > ---
> > Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> >
> >
> > R-sq.(adj) =  0.554   Deviance explained = 28.5%
> > GCV score = 0.09904   Scale est. = 0.093686  n = 37
> anova.gam(sed.lin, sed.gam, test="F")
> > Analysis of Deviance Table
> >
> > Model 1: wm.sed ~ Temp
> > Model 2: wm.sed ~ s(Temp)
> >    Resid. Df Resid. Dev         Df  Deviance      F   Pr(>F)
> > 1 3.5000e+01      3.279
> > 2 3.5000e+01      3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
>
>
> Thanks in advance,
>
>
> Denis Chabot
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>

Reasonably Related Threads

Search for more seemingly similar threads

R help - Oct 2005 - testing non-linear component in mgcv:gam

[R] testing non-linear component in mgcv:gam

[R] testing non-linear component in mgcv:gam

[R] testing non-linear component in mgcv:gam

[R] testing non-linear component in mgcv:gam

Reasonably Related Threads