thr3ads.net - R help - [R] GAM with the negative binomial distribution: why do predictions no match with original values? [Nov 2016]


Marine Regis

2016-Nov-22 21:29 UTC

[R] GAM with the negative binomial distribution: why do predictions not match the original values?

Hello,
From capture data, I would like to assess the effect of longitudinal changes
in the proportion of forest on skunk abundance. To test this, I built a GAM
in which the dependent variable is the number of unique skunks and the
independent variables are the X coordinates of the centroids of the trapping
sites (called "x" in the GAM) and the proportion of forest within the trapping
sites (called "prop_forest" in the GAM):
    mod <- gam(nb_unique ~ s(x, prop_forest), offset = log_trap_eff,
               family = nb(theta = NULL, link = "log"),
               data = succ_capt_skunk, method = "REML", select = TRUE)
    summary(mod)

    Family: Negative Binomial(13.446)
    Link function: log

    Formula:
    nb_unique ~ s(x, prop_forest)

    Parametric coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -2.02095    0.03896  -51.87   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Approximate significance of smooth terms:
                       edf Ref.df Chi.sq  p-value
    s(x,prop_forest) 3.182     29  17.76 0.000102 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    R-sq.(adj) =   0.37   Deviance explained =   49%
    -REML = 268.61  Scale est. = 1         n = 58


I built a GAM for the negative binomial family. When I use the function
`predict.gam`, the predictions of capture success from the GAM and the values of
capture success from the original data are very different. Why do these
differences occur?

**With GAM:**

    modPred <- predict.gam(mod, se.fit=TRUE,type="response")
    summary(modPred$fit)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.1026  0.1187  0.1333  0.1338  0.1419  0.1795

**With original data:**

    summary(succ_capt_skunk$nb_unique)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      17.00   59.00   82.00   81.83  106.80  147.00

The question has already been posted on Cross Validated
(http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or)
without success.

Thanks a lot for your time.
Have a nice day
Marine



Bert Gunter

2016-Nov-22 22:07 UTC

Define "very different."  Sounds like a subjective opinion to me, for
which I have no response. Apparently others are similarly flummoxed.
Of course they would not in general be identical.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)



David Winsemius

2016-Nov-22 22:33 UTC

> On Nov 22, 2016, at 1:29 PM, Marine Regis <marine.regis at hotmail.fr> wrote:
> I built a GAM for the negative binomial family. When I use the function
> `predict.gam`, the predictions of capture success from the GAM and the values of
> capture success from the original data are very different. Why do these
> differences occur?
You have an offset that is not described. And `gam` suppresses the Intercept.
These would seem to be likely sources of confusion. For the best answers either
on Rhelp or on CrossValidated.com you should be offering a working example.
It's not our responsibility to build these for you.

I found that others had included offsets and then had questions about
prediction. I haven't reviewed these candidates but perhaps you can find one
in this modest listing that comes up from the MarkMail search engine:

http://markmail.org/search/?q=list%3Aorg.r-project.r-help+mgcv+gam+offset+predict

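library(mgcv)
set.seed(1)                   # reproducible simulated data
x <- seq(0, 10, length.out = 100)
y <- x^2 + rnorm(100)
m1 <- gam(y ~ s(x, k = 10, bs = 'cs'))
# same model, with a constant offset of 10 passed via the offset= argument
m2 <- gam(y ~ s(x, k = 10, bs = 'cs'), offset = rep(10, 100))
x1 <- seq(0, 10, 0.1)
y1 <- predict(m1, newdata = list(x = x1))
y2 <- predict(m2, newdata = list(x = x1))  # the offset= value is not used here

plot(x, y, ylim = c(0, 100))
lines(x1, y1, lwd = 4, col = 'red')
lines(x1, y2, lwd = 4, col = 'blue')  # lies about 10 below the red curve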


-- 
David.

David Winsemius
Alameda, CA, USA

Cade, Brian

2016-Nov-22 22:39 UTC

Well, part of the issue is that the negative binomial estimates are for
means, and they can differ a fair bit from the raw counts, but I'm also
guessing that part of the issue is that the offset may not be accounted for
by the predict.gam() function.
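Brian's second guess can be checked on simulated data. The sketch below is a minimal illustration, not the poster's analysis: the data frame `dat` and the columns `x`, `effort` and `count` are hypothetical, and the counts are drawn so that their mean is a rate times trap effort. It fits the same negative binomial GAM twice, once with the offset passed through the `offset=` argument and once with `offset()` inside the formula. The mgcv help page for `gam()` notes that an offset supplied through the `offset=` argument is ignored when predicting, so the first model's `predict(type = "response")` values should sit near the per-effort rate exp(-2), while the second model's should be on the count scale.

```r
library(mgcv)  # gam(), nb()

set.seed(1)
n <- 200
# Hypothetical data: covariate x and trap effort per site
dat <- data.frame(x = runif(n, 0, 10),
                  effort = runif(n, 300, 900))
dat$log_trap_eff <- log(dat$effort)
# Counts whose mean is a rate per unit effort times the effort
mu <- exp(-2 + 0.1 * sin(dat$x)) * dat$effort
dat$count <- rnbinom(n, mu = mu, size = 10)

# Offset passed through the offset= argument: used in fitting only
m_arg <- gam(count ~ s(x), offset = log_trap_eff, family = nb(),
             data = dat, method = "REML")
# Offset written into the formula: used in fitting and in prediction
m_form <- gam(count ~ s(x) + offset(log_trap_eff), family = nb(),
              data = dat, method = "REML")

p_arg  <- predict(m_arg,  type = "response")
p_form <- predict(m_form, type = "response")
summary(dat$count)
summary(p_arg)   # close to exp(-2): a rate per unit of effort
summary(p_form)  # on the scale of the observed counts
```

The pattern of `p_arg` (small values near exp(intercept)) versus `dat$count` is the same kind of mismatch reported in the original post.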

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  cadeb at usgs.gov <brian_cade at usgs.gov>
tel:  970 226-9326



peter dalgaard

2016-Nov-22 22:52 UTC

> On 22 Nov 2016, at 23:07, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> 
> Define "very different."  Sounds like a subjective opinion to me, for
> which I have no response. Apparently others are similarly flummoxed.
> Of course they would not in general be identical.
Er? I don't see much reason to disagree that a range 0.10-0.18 is different
from 17-147.
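A quick arithmetic check, using only numbers quoted in this thread, is consistent with an offset problem: the back-transformed intercept exp(-2.02095) is about 0.1325, very close to the mean GAM prediction of 0.1338, so the predictions look like counts per unit of trap effort with the offset dropped. If so, the ratio of the mean observed count to the mean prediction (about 612) would be roughly a typical value of exp(log_trap_eff). That last step is an inference, not something checked against the data, which were not posted.

```r
# Numbers copied from summary(mod) and summary(modPred$fit) in the post
b0 <- -2.02095   # intercept on the log (link) scale
exp(b0)          # about 0.1325, close to the mean prediction 0.1338
81.83 / 0.1338   # mean observed count / mean prediction, about 612
```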

However, other bits of information are missing: We don't know which gam()
function is being used (to my knowledge there is one in package gam but also one
in mgcv). We don't have the data, so we cannot reproduce and try to find the
root of the problem.
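Which gam() is in use can be checked directly in the poster's session. A minimal sketch, assuming mgcv is attached as below: `find()` (from utils) lists the search-path entries that contain an object of the given name, and `environmentName(environment(f))` names the namespace a function was defined in. The `nb()` family, `method = "REML"` and `select = TRUE` in the post point to mgcv, but that is an inference from the call, not a confirmation.

```r
library(mgcv)

# Search-path entries containing an object called "gam";
# "package:gam" would also appear if the gam package were attached
find("gam")
# Namespace of the gam() that a bare call would actually use
environmentName(environment(gam))
# The same check for the nb() family used in the post
environmentName(environment(nb))
```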

Offhand, it looks like the predict.gam() function is misbehaving, which could
have something to do with the offset term and/or the nb dispersion parameter. On
a hunch, does anything change if you use

nb_unique ~ s(x,prop_forest) + offset(log_trap_eff)

instead of the offset= argument? And, by the way, does fitted(mod,...) change
anything?
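Both suggestions can be illustrated on simulated data. A minimal sketch, assuming hypothetical names (`dat`, `effort`, `count`, `m_arg`, `m_form`, `new_site`) rather than the poster's data: `fitted()` returns the means stored at fit time, which include an offset supplied through `offset=`, while `predict()` on that model returns a rate per unit of effort that can be multiplied back by the effort. With the offset written into the formula, predicting for a new site needs a `log_trap_eff` column in `newdata`.

```r
library(mgcv)  # gam(), nb()

set.seed(2)
n <- 200
# Hypothetical sites: covariate x and trap effort (not the poster's data)
dat <- data.frame(x = runif(n, 0, 10), effort = runif(n, 300, 900))
dat$log_trap_eff <- log(dat$effort)
dat$count <- rnbinom(n, mu = exp(-2 + 0.1 * sin(dat$x)) * dat$effort,
                     size = 10)

m_arg  <- gam(count ~ s(x), offset = log_trap_eff, family = nb(),
              data = dat, method = "REML")
m_form <- gam(count ~ s(x) + offset(log_trap_eff), family = nb(),
              data = dat, method = "REML")

# fitted() returns the means stored at fit time, offset included
f_arg <- fitted(m_arg)

# predict() on the offset= model gives a rate per unit of effort;
# multiplying by effort returns it to the count scale
counts_arg <- predict(m_arg, type = "response") * dat$effort

# With a formula offset, new data must supply log_trap_eff itself;
# here a hypothetical new site with 500 units of trap effort
new_site <- data.frame(x = 5, log_trap_eff = log(500))
p_new <- predict(m_form, newdata = new_site, type = "response")
p_new
```

Putting the offset in the formula, as suggested above, has the practical advantage that `predict()` and `fitted()` then agree without any manual rescaling.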

-pd 
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
