Hi,

I wonder whether predict.gam is supposed to work with the family=negbin() definition. It seems to me that the values returned by type="response" are far off the observed values. Here is an example output from the negbin examples:

> set.seed(3)
> n<-400
> dat<-gamSim(1,n=n)
> g<-exp(dat$f/5)
> dat$y<-rnbinom(g,size=3,mu=g)
> b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
> summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.6061  1.6340  2.8120  2.7970  3.9250  4.9830
> summary(predict(b,type="response"))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8972  3.1610  4.8140  6.1170  8.1300 28.0100

I.e. the range and mean of the observed values (y) are smaller than those of the predictions from the gam model. Should I somehow apply the estimated theta to these predictions?

regards,
Kari
On Wed, 26 Oct 2011, Kari Ruohonen wrote:
> Hi,
> I wonder if predict.gam is supposed to work with family=negbin() definition?
> It seems to me that the values returned by type="response" are far off the
> observed values. Here is an example output from the negbin examples:
>
>> set.seed(3)
>> n<-400
>> dat<-gamSim(1,n=n)
>> g<-exp(dat$f/5)
>> dat$y<-rnbinom(g,size=3,mu=g)
>> b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
>> summary(y)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.6061  1.6340  2.8120  2.7970  3.9250  4.9830
>> summary(predict(b,type="response"))
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.8972  3.1610  4.8140  6.1170  8.1300 28.0100
>
> I.e. the range and mean of observed values (y)

What exactly is "y" in the code above? I guess you mean dat$y:

R> summary(dat$y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   2.000   4.000   6.235   8.000  68.000

which looks rather reasonable...
Z

> are smaller than those of the
> predictions from the gam model. Should I somehow apply the estimated theta on
> these predictions?
>
> regards, Kari
On 26/10/11 12:10, Achim Zeileis wrote:
> On Wed, 26 Oct 2011, Kari Ruohonen wrote:
>> I.e. the range and mean of observed values (y)
>
> What exactly is "y" in the code above? I guess you mean dat$y:
>
> R> summary(dat$y)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   0.000   2.000   4.000   6.235   8.000  68.000
>
> which looks rather reasonable...
> Z

Thanks! What a stupid mistake: an old .RData was hanging around even though I had started a new R instance. Terribly sorry, and many apologies.

Kari
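For anyone who hits the same symptom, here is a minimal sketch of the check, assuming a current mgcv installation in which gamSim() and negbin() behave as in the ?negbin examples. It compares the response-scale predictions with the response column stored in the data frame (dat$y) rather than with a free-standing y restored from the workspace. It also touches on the theta question: with the log link, type="response" already returns the fitted mean mu = exp(eta), and theta enters only the negative binomial variance, mu + mu^2/theta, so it does not need to be applied to the predictions. The variable names mu, eta and theta below are illustrative only.

```r
library(mgcv)

set.seed(3)
n <- 400
dat <- gamSim(1, n = n)            # simulated additive-model data (mgcv examples)
g <- exp(dat$f / 5)                # true mean on the response scale
dat$y <- rnbinom(g, size = 3, mu = g)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = negbin(3), data = dat)

## Compare against the response actually used in the fit (dat$y),
## not a `y` that may be left over in the workspace from an old .RData
print(summary(dat$y))
print(summary(predict(b, type = "response")))

## type = "response" is the inverse (log) link applied to the linear
## predictor, i.e. the fitted negative binomial mean; theta is not needed
mu  <- predict(b, type = "response")
eta <- predict(b, type = "link")
stopifnot(isTRUE(all.equal(as.numeric(mu), as.numeric(exp(eta)))))

## theta (fixed at 3 here) only enters the variance: Var(y) = mu + mu^2 / theta
theta <- 3
print(head(mu + mu^2 / theta))
```

Starting R with `R --no-restore` (or `R --vanilla`) keeps a saved .RData in the working directory from being loaded silently at startup, which is how a stale y can end up shadowing the intended variable.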