thr3ads.net - R help - [R] gam() [Jun 2003]


Henric Nilsson

2003-Jun-04 15:01 UTC

[R] gam()

Dear all,

I've now spent a couple of days trying to learn R and, in particular, the 
gam() function, and I now have a few questions and reflections regarding 
the latter. Maybe these things are implemented in some way that I'm not yet 
aware of or have perhaps been decided by the R community to not be what's 
wanted. Of course, my lack of complete theoretical understanding of what 
mgcv really does may also show...

1. When fitting models where a factor interacts with a smooth term, say 
y~a+s(x,by=a.1)+s(x,by=a.2), I noticed that the rug in the plot of each of 
the smooth terms is identical. I expected the rug in the plot of e.g. 
s(x,by=a.1) to only include those x for which a.1=1 to be able to judge if 
observations of x where a.1=1 are sparse in any region. Also, it would be 
really nice if the "by=..." was included in the output of plot.gam() and
in the "Approximate significance of smooth terms:" part of summary.gam().

2. John Fox has modified anova.glm() into anova.gam() 
(socsci.mcmaster.ca/jfox/Books/Companion/nonparametric-regression.txt)
for comparison of two or more fitted models based on the difference between 
residual deviances. Indiscriminate use of such a procedure shouldn't 
perhaps be encouraged, but I think that many users expect it to be part of 
the mgcv package since this model selection idea is covered in several 
texts and also implemented in S-plus (and may be OK for truly nested 
models). And even if it's been decided that this functionality is not 
wanted in mgcv, perhaps another function comparing several models by the 
GCV/UBRE score and other useful statistics can be implemented?

3. Some authors [1, 2] suggest pointwise estimation of odds ratios and 
corresponding confidence intervals based on the smooth terms in a GAM. 
Maybe something for mgcv?
[1] Figueiras, A. & Cadarso-Suárez, C. (2001) "Application of Nonparametric
Models for calculating Odds Ratios and Their Confidence Intervals for
Continuous Exposures", American Journal of Epidemiology, 154(3), 264-275.
[2] Saez, M., Cadarso-Suárez, C. & Figueiras, A. (2003) "np.OR: an S-Plus
function for pointwise nonparametric estimation of odds-ratios of
continuous predictors", Computer Methods and Programs in Biomedicine, 71,
175-179.

4. For each purely parametric covariate a t-test is produced; I'd like to 
have something like S-plus' anova.gam() to get an overall test. (Perhaps 
with the addition of a choice between Type I and Type III tests, but I 
guess that may be controversial). Is it possible?
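
For point 1, a minimal sketch of how a rug restricted to each level could be drawn by hand. It assumes a current mgcv release, in which `by` may be a factor; that factor stands in here for the a.1/a.2 indicator columns of the post. The simulated data and the name `labs` are illustrative only and are not taken from the thread.

```r
library(mgcv)

set.seed(1)
n <- 200
x <- runif(n)
a <- factor(sample(c("1", "2"), n, replace = TRUE))
y <- ifelse(a == "1", sin(2 * pi * x), x^2) + rnorm(n, sd = 0.3)

# Assumption: a factor `by` replaces the separate 0/1 indicator columns
fit <- gam(y ~ a + s(x, by = a))

# Labels of the fitted smooths, one per level of a
labs <- sapply(fit$smooth, function(sm) sm$label)

# Plot each smooth without the default rug, then add a rug that
# only contains the x values observed at that level of a
for (i in seq_along(levels(a))) {
  plot(fit, select = i, rug = FALSE, main = labs[i])
  rug(x[a == levels(a)[i]])
}
```

Using the smooth's own label as the plot title also puts the "by" information on each panel.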

//Henric

---------------------------------------------------------------------------------------
Henric Nilsson, Statistician

Statisticon AB, Östra Ågatan 31, SE-753 22 UPPSALA
Phone (Direct): +46 (0)18 18 22 37
Mobile: +46 (0)70 211 68 36
Fax: +46 (0)18 18 22 33

<statisticon.se>

John Fox

2003-Jun-05 15:12 UTC


[R] gam()

Dear Henric,

At 05:01 PM 6/4/2003 +0200, Henric Nilsson wrote:
>I've now spent a couple of days trying to learn R and, in particular, the
>gam() function, and I now have a few questions and reflections regarding
>the latter. Maybe these things are implemented in some way that I'm not
>yet aware of or have perhaps been decided by the R community to not be
>what's wanted. Of course, my lack of complete theoretical understanding of
>what mgcv really does may also show...
>
>1. When fitting models where a factor interacts with a smooth term, say
>y~a+s(x,by=a.1)+s(x,by=a.2), I noticed that the rug in the plot of each of
>the smooth terms is identical. I expected the rug in the plot of e.g.
>s(x,by=a.1) to only include those x for which a.1=1 to be able to judge if
>observations of x where a.1=1 are sparse in any region. Also, it would be
>really nice if the "by=..." was included in the output of plot.gam() and
>in the "Approximate significance of smooth terms:" part of summary.gam().
>
>2. John Fox has modified anova.glm() into anova.gam()
>(socsci.mcmaster.ca/jfox/Books/Companion/nonparametric-regression.txt)
>for comparison of two or more fitted models based on the difference
>between residual deviances. Indiscriminate use of such a procedure
>shouldn't perhaps be encouraged, but I think that many users expect it to
>be part of the mgcv package since this model selection idea is covered in
>several texts and also implemented in S-plus (and may be OK for truly
>nested models). And even if it's been decided that this functionality is
>not wanted in mgcv, perhaps another function comparing several models by
>the GCV/UBRE score and other useful statistics can be implemented?

The problem with comparing two gams in R fit with mgcv is that, by default, 
the degree of smoothing for terms is selected independently for each model. 
Simon Wood previously posted a message to the R-help list discussing this 
issue and making some suggestions. The issue doesn't arise in the same way 
with models fit by the gam function in S-PLUS because the degree of 
smoothing there is instead selected by the user. I should update my 
appendix on nonparametric regression to discuss this question -- the 
current presentation isn't really adequate.
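
A minimal sketch of one way to make such a comparison between truly nested models, assuming a current mgcv release: `fx = TRUE` switches off penalization, so the degrees of freedom of each smooth are fixed by the analyst (k - 1) rather than estimated separately for each model. The simulated data and the names m0 and m1 are illustrative assumptions, and the anova() method used is the one shipped with current mgcv, not the modified anova.glm() discussed in the thread.

```r
library(mgcv)

set.seed(2)
n <- 300
x1 <- runif(n)
x2 <- runif(n)
y <- sin(2 * pi * x1) + 0.5 * x2 + rnorm(n, sd = 0.3)

# fx = TRUE: unpenalized regression splines with k - 1 = 4
# coefficients each, so the smaller model is nested in the larger
m0 <- gam(y ~ s(x1, k = 5, fx = TRUE))
m1 <- gam(y ~ s(x1, k = 5, fx = TRUE) + s(x2, k = 5, fx = TRUE))

# Number of extra coefficients contributed by s(x2)
extra <- length(coef(m1)) - length(coef(m0))

# Approximate F test based on the change in residual deviance
cmp <- anova(m0, m1, test = "F")
print(cmp)
```

With the default penalized fits, the smoothing parameter for s(x1) would generally differ between the two models, which is the difficulty described above.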

>3. Some authors [1, 2] suggest pointwise estimation of odds ratios and
>corresponding confidence intervals based on the smooth terms in a GAM.
>Maybe something for mgcv?
>[1] Figueiras, A. & Cadarso-Suárez, C. (2001) "Application of Nonparametric
>Models for calculating Odds Ratios and Their Confidence Intervals for
>Continuous Exposures", American Journal of Epidemiology, 154(3), 264-275.
>[2] Saez, M., Cadarso-Suárez, C. & Figueiras, A. (2003) "np.OR: an S-Plus
>function for pointwise nonparametric estimation of odds-ratios of
>continuous predictors", Computer Methods and Programs in Biomedicine, 71,
>175-179.
>
>4. For each purely parametric covariate a t-test is produced; I'd like to
>have something like S-plus' anova.gam() to get an overall test. (Perhaps
>with the addition of a choice between Type I and Type III tests, but I
>guess that may be controversial). Is it possible?

John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: socsci.mcmaster.ca/jfox

Henric Nilsson

2003-Jun-05 16:57 UTC


[R] gam()

At 11:12 2003-06-05 -0400, John Fox wrote:
>>2. John Fox has modified anova.glm() into anova.gam()
>>(socsci.mcmaster.ca/jfox/Books/Companion/nonparametric-regression.txt)
>>for comparison of two or more fitted models based on the difference
>>between residual deviances. Indiscriminate use of such a procedure
>>shouldn't perhaps be encouraged, but I think that many users expect it to
>>be part of the mgcv package since this model selection idea is covered in
>>several texts and also implemented in S-plus (and may be OK for truly
>>nested models). And even if it's been decided that this functionality is
>>not wanted in mgcv, perhaps another function comparing several models by
>>the GCV/UBRE score and other useful statistics can be implemented?
>
>The problem with comparing two gams in R fit with mgcv is that, by
>default, the degree of smoothing for terms is selected independently for
>each model. Simon Wood previously posted a message to the R-help list
>discussing this issue and making some suggestions. The issue doesn't arise
>in the same way with models fit by the gam function in S-PLUS because the
>degree of smoothing there is instead selected by the user. I should update
>my appendix on nonparametric regression to discuss this question -- the
>current presentation isn't really adequate.

I'm aware of this difference between gam() in R and S-Plus, which is why I 
proposed a function listing relevant statistics for every fitted model so 
the analyst can use these to judge, without hypothesis testing, which model 
to prefer. Still, for models where the analyst has made sure that the 
models are truly nested, the use of your anova.gam can be justified by the 
simulation results reported by Hastie & Tibshirani (1990, p. 155); maybe I 
just want it for purely nostalgic reasons?! ;-)
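
A minimal sketch of what such a listing function might look like, assuming a current mgcv release and the default GCV/UBRE smoothness selection. `compare_gams` is a hypothetical helper written for illustration, not part of mgcv, and the simulated data are likewise assumptions.

```r
library(mgcv)

# Hypothetical helper (not an mgcv function): one row of summary
# statistics per fitted gam object, with no hypothesis test involved
compare_gams <- function(...) {
  fits <- list(...)
  data.frame(
    model    = sapply(fits, function(m) paste(deparse(formula(m)), collapse = " ")),
    edf      = sapply(fits, function(m) sum(m$edf)),       # total effective df
    deviance = sapply(fits, deviance),                     # residual deviance
    gcv.ubre = sapply(fits, function(m) unname(m$gcv.ubre)), # GCV/UBRE score
    AIC      = sapply(fits, AIC)
  )
}

set.seed(3)
n <- 300
x1 <- runif(n)
x2 <- runif(n)
y <- sin(2 * pi * x1) + rnorm(n, sd = 0.3)

m0 <- gam(y ~ s(x1))
m1 <- gam(y ~ s(x1) + s(x2))

tab <- compare_gams(m0, m1)
print(tab)
```

The table leaves the judgement to the analyst: a lower GCV/UBRE score or AIC favours a model, but the columns are descriptive only.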

Admittedly, I find the way mgcv chooses the degrees of freedom more 
attractive. However, since most textbooks covering GAMs are more or less 
S-Plus based, and the possibilities that mgcv offers are so vast, I'm 
feeling a bit lost at times; it's great to have new, more flexible tools, 
but on the downside that means more choices to be made. So, has anyone got 
any essential literature tips? I've read (and re-read, and read again) 
Simon Wood's articles in JRSS, R News and Ecological Modelling, and, of 
course, the mgcv manual.

//Henric

---------------------------------------------------------------------------------------
Henric Nilsson, Statistician

Statisticon AB, Östra Ågatan 31, SE-753 22 UPPSALA
Phone (Direct): +46 (0)18 18 22 37
Mobile: +46 (0)70 211 68 36
Fax: +46 (0)18 18 22 33

<statisticon.se>
