thr3ads.net - R help - [R] mgcv: how select significant predictor vars when using gam(...select=TRUE) using automatic optimization [Apr 2013]


Jan Holstein

2013-Apr-17 14:50 UTC

[R] mgcv: how select significant predictor vars when using gam(...select=TRUE) using automatic optimization

I have 11 possible predictor variables and use them to model quite a few
target variables.
In search of a consistent and, if possible, non-manual way to identify
the significant predictor variables among the eleven, I thought the option
"select=T" might do.

Example (here with only 4 predictors):
first the vanilla fit with "select=F"
> fit1<-gam(target~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,select=F)
> summary(fit1)
Family: quasi 
Link function: log 
Formula:
target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -34.57      20.47  -1.689   0.0913 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
            edf Ref.df      F  p-value    
s(mgs)    2.335  2.623  0.260    0.829    
s(gsd)    6.868  7.506 13.955  < 2e-16 ***
s(mud)    8.990  9.000 11.727  < 2e-16 ***
s(ssCmax) 6.770  6.978  6.664 7.68e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.402   Deviance explained = 40.4%
GCV score = 8.8563e+05  Scale est. = 8.8053e+05  n = 4511

Then turn on select=TRUE:

> fit2<-gam(target~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,select=TRUE)
> summary(fit2)
Family: quasi 
Link function: log 

Formula:
target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.1585     1.7439   0.091    0.928
Approximate significance of smooth terms:
            edf Ref.df     F p-value    
s(mgs)    2.456      8 24.50  <2e-16 ***
s(gsd)    7.272      9 14.33  <2e-16 ***
s(mud)    7.678      9 20.38  <2e-16 ***
s(ssCmax) 6.556      9 14.36  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) =  0.397   Deviance explained =   40%
GCV score = 8.9209e+05  Scale est. = 8.8715e+05  n = 4511

I seem not to fully understand how to work with "select".
The predictor "mgs" is clearly not significant in "fit1" (above), yet in
"fit2" it appears as significant. Why was it not dropped? How are
non-significant predictors identified?


Simon Wood

2013-Apr-17 17:42 UTC

Jan,

What mgcv version are you using, please? (Older versions have a poor
p-value approximation when select=TRUE, but of course it's possible that
you've managed to break the newer approximation as well.)

The 'select=TRUE' option adds an extra penalty to each smooth, so that the
smooth can be penalized out of the model altogether when the smoothing
parameter selection criterion is optimized. In this case it is usually
better to select the smoothing parameters by REML, by passing
'method="REML"' as an option to gam, because REML is less prone to
undersmoothing than GCV. So 'select=TRUE' is not selecting on the basis of
the p-values themselves, but obviously this sort of discrepancy should not
be happening.

best,
Simon

-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283
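
The two suggestions in the reply above (check the mgcv version, then refit with select=TRUE and method="REML") can be sketched as follows. This is only an illustration under stated assumptions: the poster's data set 'wspe1' is not available, so simulated data from mgcv's gamSim() stand in for it, a Gaussian response replaces the quasi(link=log) family, and the names 'dat', 'fit_gcv', 'fit_reml' and 'edf_table' are made up for the example. In this simulation x3 has no true effect, so it plays the role of a candidate for being penalized out.

```r
library(mgcv)

# The reply asks for the installed mgcv version first.
print(packageVersion("mgcv"))

# Assumption: simulated data replace the poster's 'wspe1'.
# gamSim(eg = 1) returns y and covariates x0..x3; x3 has no true effect.
set.seed(1)
dat <- gamSim(eg = 1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

# Same model fitted twice with the extra selection penalty switched on:
# once with gam's default criterion (GCV) and once with REML.
fit_gcv  <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat,
                select = TRUE)
fit_reml <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat,
                select = TRUE, method = "REML")

# Effective degrees of freedom per smooth term under each criterion.
# A term that has been penalized out of the model has an edf near zero.
edf_table <- rbind(GCV  = summary(fit_gcv)$s.table[, "edf"],
                   REML = summary(fit_reml)$s.table[, "edf"])
print(round(edf_table, 3))

# Full summary of the REML fit, including the approximate p-values.
print(summary(fit_reml))
```

Reading the edf column alongside the p-values is one way to see which terms the selection penalty has shrunk towards zero; the selection itself happens through the smoothing parameters, not through the p-values.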
