peter dalgaard
2016-Nov-22 22:52 UTC
[R] GAM with the negative binomial distribution: why do predictions no match with original values?
> On 22 Nov 2016, at 23:07 , Bert Gunter <bgunter.4567 at gmail.com> wrote: > > Define "very different." Sounds like a subjective opinion to me, for > which I have no response. Apparently others are similarly flummoxed. > Of course they would not in general be identical.Er? I don't see much reason to disagree that a range 0.10-0.18 is different from 17-147. However, other bits of information are missing: We don't know which gam() function is being used (to my knowledge there is one in package gam but also one in mgcv). We don't have the data, so we cannot reproduce and try to find the root of the problem. Offhand, it looks like the predict.gam() function is misbehaving, which could have something to do with the offset term and/or the nb dispersion parameter. On a hunch, does anything change if you use nb_unique ~ s(x,prop_forest) + offset(log_trap_eff) instead of the offset= argument? And, by the way, does fitted(mod,...) change anything? -pd> > Cheers, > Bert > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Nov 22, 2016 at 1:29 PM, Marine Regis <marine.regis at hotmail.fr> wrote: >> Hello, >> >>> From capture data, I would like to assess the effect of longitudinal changes in proportion of forests on abundance of skunks. To test this, I built this GAM where the dependent variable is the number of unique skunks and the independent variables are the X coordinates of the centroids of trapping sites (called "X" in the GAM) and the proportion of forests within the trapping sites (called "prop_forest" in the GAM): >> >> mod <- gam(nb_unique ~ s(x,prop_forest), offset=log_trap_eff, family=nb(theta=NULL, link="log"), data=succ_capt_skunk, method = "REML", select = TRUE) >> summary(mod) >> >> Family: Negative Binomial(13.446) >> Link function: log >> >> Formula: >> nb_unique ~ s(x, prop_forest) >> >> Parametric coefficients: >> Estimate Std. Error z value Pr(>|z|) >> (Intercept) -2.02095 0.03896 -51.87 <2e-16 *** >> --- >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >> >> Approximate significance of smooth terms: >> edf Ref.df Chi.sq p-value >> s(x,prop_forest) 3.182 29 17.76 0.000102 *** >> --- >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >> >> R-sq.(adj) = 0.37 Deviance explained = 49% >> -REML = 268.61 Scale est. = 1 n = 58 >> >> >> I built a GAM for the negative binomial family. When I use the function `predict.gam`, the predictions of capture success from the GAM and the values of capture success from original data are very different. What is the reason for differences occur? >> >> **With GAM:** >> >> modPred <- predict.gam(mod, se.fit=TRUE,type="response") >> summary(modPred$fit) >> Min. 1st Qu. Median Mean 3rd Qu. Max. >> 0.1026 0.1187 0.1333 0.1338 0.1419 0.1795 >> >> **With original data:** >> >> summary(succ_capt_skunk$nb_unique) >> Min. 1st Qu. Median Mean 3rd Qu. Max. >> 17.00 59.00 82.00 81.83 106.80 147.00 >> >> The question has already been posted on Cross validated (http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or) without success. >> >> Thanks a lot for your time. >> Have a nice day >> Marine >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Marine Regis
2016-Nov-23 00:24 UTC
[R] GAM with the negative binomial distribution: why do predictions no match with original values?
Thanks a lot for your answers. Peter: sorry, here is the missing information: * I use the function gam() of the package ?mgcv? * Yes, the output changes when I use offset(log_trap_eff) instead of offset=log_trap_eff. By using offset(log_trap_eff), the output is more coherent with the observed values. Here are the new predictions:> summary(mod$fit)Min. 1st Qu. Median Mean 3rd Qu. Max. 10.01 68.14 85.71 83.16 101.00 130.20 * I have tried to create a reproductive example to show the difference between offset(log_trap_eff) and offset=log_trap_eff. nb_unique <- rnegbin(58, mu=82, theta=13.446) x <- runif(58,min=-465300,max=435200) prop_forest <- runif(58,min=0,max=1) log_trap_eff <- runif(58,min=4,max=6) With offset=log_trap_eff: mod1 <- gam(nb_unique ~ s(x,prop_forest), offset=log_trap_eff, family=nb(theta=NULL, link="log"), method = "REML", select = TRUE)> mod1Pred <- predict.gam(mod1, se.fit=TRUE, type="response")> summary(mod1Pred$fit)Min. 1st Qu. Median Mean 3rd Qu. Max. 0.5852 0.5852 0.5852 0.5852 0.5852 0.5852 With offset(log_trap_eff): mod2 <- gam(nb_unique ~ s(x,prop_forest) + offset(log_trap_eff), family=nb(theta=NULL, link="log"), method = "REML", select = TRUE)> mod2Pred <- predict.gam(mod2, se.fit=TRUE, type="response")> summary(mod2Pred$fit)Min. 1st Qu. Median Mean 3rd Qu. Max. 32.03 61.18 97.20 112.20 165.00 226.00 Value range of observed data:> summary(nb_unique)Min. 1st Qu. Median Mean 3rd Qu. Max. 43.00 67.00 81.00 84.16 92.75 153.00 * By using fitted(mod), I obtain NULL. I am a novice in GAMs. So, I don?t know why the results are different between models with offset=argument and offset(). Thanks a lot for your help. Have a nice day Marine ________________________________ De : peter dalgaard <pdalgd at gmail.com> Envoy? : mardi 22 novembre 2016 23:52 ? : Bert Gunter Cc : Marine Regis; r-help at r-project.org Objet : Re: [R] GAM with the negative binomial distribution: why do predictions no match with original values?> On 22 Nov 2016, at 23:07 , Bert Gunter <bgunter.4567 at gmail.com> wrote: > > Define "very different." Sounds like a subjective opinion to me, for > which I have no response. Apparently others are similarly flummoxed. > Of course they would not in general be identical.Er? I don't see much reason to disagree that a range 0.10-0.18 is different from 17-147. However, other bits of information are missing: We don't know which gam() function is being used (to my knowledge there is one in package gam but also one in mgcv). We don't have the data, so we cannot reproduce and try to find the root of the problem. Offhand, it looks like the predict.gam() function is misbehaving, which could have something to do with the offset term and/or the nb dispersion parameter. On a hunch, does anything change if you use nb_unique ~ s(x,prop_forest) + offset(log_trap_eff) instead of the offset= argument? And, by the way, does fitted(mod,...) change anything? -pd> > Cheers, > Bert > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Nov 22, 2016 at 1:29 PM, Marine Regis <marine.regis at hotmail.fr> wrote: >> Hello, >> >>> From capture data, I would like to assess the effect of longitudinal changes in proportion of forests on abundance of skunks. To test this, I built this GAM where the dependent variable is the number of unique skunks and the independent variables are the X coordinates of the centroids of trapping sites (called "X" in the GAM) and the proportion of forests within the trapping sites (called "prop_forest" in the GAM): >> >> mod <- gam(nb_unique ~ s(x,prop_forest), offset=log_trap_eff, family=nb(theta=NULL, link="log"), data=succ_capt_skunk, method = "REML", select = TRUE) >> summary(mod) >> >> Family: Negative Binomial(13.446) >> Link function: log >> >> Formula: >> nb_unique ~ s(x, prop_forest) >> >> Parametric coefficients: >> Estimate Std. Error z value Pr(>|z|) >> (Intercept) -2.02095 0.03896 -51.87 <2e-16 *** >> --- >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >> >> Approximate significance of smooth terms: >> edf Ref.df Chi.sq p-value >> s(x,prop_forest) 3.182 29 17.76 0.000102 *** >> --- >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >> >> R-sq.(adj) = 0.37 Deviance explained = 49% >> -REML = 268.61 Scale est. = 1 n = 58 >> >> >> I built a GAM for the negative binomial family. When I use the function `predict.gam`, the predictions of capture success from the GAM and the values of capture success from original data are very different. What is the reason for differences occur? >> >> **With GAM:** >> >> modPred <- predict.gam(mod, se.fit=TRUE,type="response") >> summary(modPred$fit) >> Min. 1st Qu. Median Mean 3rd Qu. Max. >> 0.1026 0.1187 0.1333 0.1338 0.1419 0.1795 >> >> **With original data:** >> >> summary(succ_capt_skunk$nb_unique) >> Min. 1st Qu. Median Mean 3rd Qu. Max. >> 17.00 59.00 82.00 81.83 106.80 147.00 >> >> The question has already been posted on Cross validated (http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or) without success.[http://cdn.sstatic.net/Sites/stats/img/apple-touch-icon at 2.png?v=344f57aa10cc&a]<http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or> GAM with the negative binomial distribution: why do predictions no match with original values?<http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or> stats.stackexchange.com>From capture data, I would like to assess the effect of longitudinal changes in proportion of forests on abundance of skunks. To test this, I built this GAM where the dependent variable is the numb...>> >> Thanks a lot for your time. >> Have a nice day >> Marine >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-helpthz.ch/mailman/listinfo/r-help> stat.ethz.ch The main R mailing list, for announcements about the development of R and the availability of new code, questions and answers about problems and solutions using R ...>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-helpthz.ch/mailman/listinfo/r-help> stat.ethz.ch The main R mailing list, for announcements about the development of R and the availability of new code, questions and answers about problems and solutions using R ...> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com [[alternative HTML version deleted]]
Simon Wood
2016-Nov-23 22:32 UTC
[R] GAM with the negative binomial distribution: why do predictions no match with original values?
?predict.gam (mgcv) says.... "Note that, in common with other prediction functions, any offset supplied to ?gam? as an argument is always ignored when predicting, unlike offsets specified in the gam model formula." .... which was originally implemented to prevent surprises to people familiar with predict.lm (which used to behave that way, and was documented to behave that way).... the problem is that predict.lm doesn't behave like that any more, so I guess at some point I should remove this feature... best, Simon On 23/11/16 00:24, Marine Regis wrote:> Thanks a lot for your answers. > Peter: sorry, here is the missing information: > > * I use the function gam() of the package ?mgcv? > * Yes, the output changes when I use offset(log_trap_eff) instead of offset=log_trap_eff. By using offset(log_trap_eff), the output is more coherent with the observed values. Here are the new predictions: >> summary(mod$fit) > Min. 1st Qu. Median Mean 3rd Qu. Max. > 10.01 68.14 85.71 83.16 101.00 130.20 > > > * I have tried to create a reproductive example to show the difference between offset(log_trap_eff) and offset=log_trap_eff. > nb_unique <- rnegbin(58, mu=82, theta=13.446) > x <- runif(58,min=-465300,max=435200) > prop_forest <- runif(58,min=0,max=1) > log_trap_eff <- runif(58,min=4,max=6) > > With offset=log_trap_eff: > > mod1 <- gam(nb_unique ~ s(x,prop_forest), offset=log_trap_eff, family=nb(theta=NULL, link="log"), method = "REML", select = TRUE) > >> mod1Pred <- predict.gam(mod1, se.fit=TRUE, type="response") >> summary(mod1Pred$fit) > Min. 1st Qu. Median Mean 3rd Qu. Max. > > 0.5852 0.5852 0.5852 0.5852 0.5852 0.5852 > > > With offset(log_trap_eff): > mod2 <- gam(nb_unique ~ s(x,prop_forest) + offset(log_trap_eff), family=nb(theta=NULL, link="log"), method = "REML", select = TRUE) > >> mod2Pred <- predict.gam(mod2, se.fit=TRUE, type="response") >> summary(mod2Pred$fit) > Min. 1st Qu. Median Mean 3rd Qu. Max. > > 32.03 61.18 97.20 112.20 165.00 226.00 > > > > Value range of observed data: > >> summary(nb_unique) > Min. 1st Qu. Median Mean 3rd Qu. Max. > > 43.00 67.00 81.00 84.16 92.75 153.00 > > > * By using fitted(mod), I obtain NULL. > I am a novice in GAMs. So, I don?t know why the results are different between models with offset=argument and offset(). > Thanks a lot for your help. > Have a nice day > Marine > > > > ________________________________ > De : peter dalgaard <pdalgd at gmail.com> > Envoy? : mardi 22 novembre 2016 23:52 > ? : Bert Gunter > Cc : Marine Regis; r-help at r-project.org > Objet : Re: [R] GAM with the negative binomial distribution: why do predictions no match with original values? > > >> On 22 Nov 2016, at 23:07 , Bert Gunter <bgunter.4567 at gmail.com> wrote: >> >> Define "very different." Sounds like a subjective opinion to me, for >> which I have no response. Apparently others are similarly flummoxed. >> Of course they would not in general be identical. > Er? I don't see much reason to disagree that a range 0.10-0.18 is different from 17-147. > > However, other bits of information are missing: We don't know which gam() function is being used (to my knowledge there is one in package gam but also one in mgcv). We don't have the data, so we cannot reproduce and try to find the root of the problem. > > Offhand, it looks like the predict.gam() function is misbehaving, which could have something to do with the offset term and/or the nb dispersion parameter. On a hunch, does anything change if you use > > nb_unique ~ s(x,prop_forest) + offset(log_trap_eff) > > instead of the offset= argument? And, by the way, does fitted(mod,...) change anything? > > -pd > >> Cheers, >> Bert >> >> >> Bert Gunter >> >> "The trouble with having an open mind is that people keep coming along >> and sticking things into it." >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> On Tue, Nov 22, 2016 at 1:29 PM, Marine Regis <marine.regis at hotmail.fr> wrote: >>> Hello, >>> >>>> From capture data, I would like to assess the effect of longitudinal changes in proportion of forests on abundance of skunks. To test this, I built this GAM where the dependent variable is the number of unique skunks and the independent variables are the X coordinates of the centroids of trapping sites (called "X" in the GAM) and the proportion of forests within the trapping sites (called "prop_forest" in the GAM): >>> mod <- gam(nb_unique ~ s(x,prop_forest), offset=log_trap_eff, family=nb(theta=NULL, link="log"), data=succ_capt_skunk, method = "REML", select = TRUE) >>> summary(mod) >>> >>> Family: Negative Binomial(13.446) >>> Link function: log >>> >>> Formula: >>> nb_unique ~ s(x, prop_forest) >>> >>> Parametric coefficients: >>> Estimate Std. Error z value Pr(>|z|) >>> (Intercept) -2.02095 0.03896 -51.87 <2e-16 *** >>> --- >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >>> >>> Approximate significance of smooth terms: >>> edf Ref.df Chi.sq p-value >>> s(x,prop_forest) 3.182 29 17.76 0.000102 *** >>> --- >>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >>> >>> R-sq.(adj) = 0.37 Deviance explained = 49% >>> -REML = 268.61 Scale est. = 1 n = 58 >>> >>> >>> I built a GAM for the negative binomial family. When I use the function `predict.gam`, the predictions of capture success from the GAM and the values of capture success from original data are very different. What is the reason for differences occur? >>> >>> **With GAM:** >>> >>> modPred <- predict.gam(mod, se.fit=TRUE,type="response") >>> summary(modPred$fit) >>> Min. 1st Qu. Median Mean 3rd Qu. Max. >>> 0.1026 0.1187 0.1333 0.1338 0.1419 0.1795 >>> >>> **With original data:** >>> >>> summary(succ_capt_skunk$nb_unique) >>> Min. 1st Qu. Median Mean 3rd Qu. Max. >>> 17.00 59.00 82.00 81.83 106.80 147.00 >>> >>> The question has already been posted on Cross validated (http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or) without success. > [http://cdn.sstatic.net/Sites/stats/img/apple-touch-icon at 2.png?v=344f57aa10cc&a]<http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or> > > GAM with the negative binomial distribution: why do predictions no match with original values?<http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or> > stats.stackexchange.com > >From capture data, I would like to assess the effect of longitudinal changes in proportion of forests on abundance of skunks. To test this, I built this GAM where the dependent variable is the numb... > > > >>> Thanks a lot for your time. >>> Have a nice day >>> Marine >>> >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help > thz.ch/mailman/listinfo/r-help> > stat.ethz.ch > The main R mailing list, for announcements about the development of R and the availability of new code, questions and answers about problems and solutions using R ... > > > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help > thz.ch/mailman/listinfo/r-help> > stat.ethz.ch > The main R mailing list, for announcements about the development of R and the availability of new code, questions and answers about problems and solutions using R ... > > > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > > > > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK +44 (0)117 33 18273 http://www.maths.bris.ac.uk/~sw15190 [[alternative HTML version deleted]]