077315q at acadiau.ca
2012-Sep-08 19:47 UTC
[R] Using predict() After Adding a Factor to a glm.nb() Model
# Hello, # I have a data set that looks something like the following: site<-c(rep('a',5),rep('b',2),rep('c',4),rep('d',11)) year<-c(1980, 1981, 1982, 1993, 1995, 1980, 1983, 1981, 1993, 1995, 1999, c(1980:1990)) count<-c(60,35,36,12,8,112,98,20,13,15,15,65,43,49,51,34,33,33,33,40,11,0) data<-data.frame(site, year, count) # > site year count # 1 a 1980 60 # 2 a 1981 35 # 3 a 1982 36 # 4 a 1993 12 # . # . # . # I ran a negative binomial glm using: library(MASS) model_a<-glm.nb(count~year, data=data) # then extracted predicted values using: py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1)) p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response') f1<-p1$fit plot(count~year, data=data) lines(py$year, f1) # Works perfectly. # Now, I want to add site as a factor: model_b<-glm.nb(count~year+as.factor(site), data=data) # I have tried a couple different ways of predicting the values based on model_b, but am having trouble. py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1)) p1<-predict(model_b, newdata=py, se.fit=TRUE, type='response') # gives: # >Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : # variable lengths differ (found for 'as.factor(site)') # In addition: Warning message: # 'newdata' had 20 rows but variable(s) found have 22 rows py<-expand.grid(data$site, data$year) names(py)<-c('site','year') p1<-predict(model_b, newdata=py) # This works, but results in 484 values, and I can't plot a line over my points. # There is probably a simple solution, but I'm having trouble wrapping my mind around it. Mind you, this is also a last # minute change to my thesis, and I haven't slept in about three days. # Any suggestions? I would be extremely grateful... # Banging my head against the wall, # A stressed out grad student
David L Carlson
2012-Sep-08 21:21 UTC
[R] Using predict() After Adding a Factor to a glm.nb() Model
Actually the first one did not work. You left out the warning message:> py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1)) > p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')Warning message: 'newdata' had 20 rows but variable(s) found have 22 rows Since py did not contain the explanatory variable "year," predict() threw a warning and then just used the values in data giving you the same results that you already have stored in model_a. Compare> model_a$fitted.values1 2 3 4 5 6 7 8 61.444110 55.163093 49.524141 15.124074 12.190050 61.444110 44.461622 55.163093 9 10 11 12 13 14 15 16 15.124074 12.190050 7.919156 61.444110 55.163093 49.524141 44.461622 39.916610 17 18 19 20 21 22 35.836204 32.172911 28.884091 25.931465 23.280666 20.900841> p1$fit1 2 3 4 5 6 7 8 61.444110 55.163093 49.524141 15.124074 12.190050 61.444110 44.461622 55.163093 9 10 11 12 13 14 15 16 15.124074 12.190050 7.919156 61.444110 55.163093 49.524141 44.461622 39.916610 17 18 19 20 21 22 35.836204 32.172911 28.884091 25.931465 23.280666 20.900841 Set the variable name to year in py and then use predict():> py<-data.frame(year=seq(from=min(data$year), to=max(data$year), by=1)) > p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')Now run your plotting commands. The plot may not look much different, but now it is based on the 20 evenly spaced values in py instead of the 22 original data values in data. Now you have the right idea for adding site to py but instead of using the irregularly spaced data, use evenly spaced values. Then you will have to plot separate lines for each site:> model_b<-glm.nb(count~year+site, data=data) > py2 <- expand.grid(site=letters[1:4], year=1980:1999) > p2<-predict(model_b, newdata=py2, se.fit=TRUE, type='response') > py2 <- data.frame(py2, p2) > plot(count~year, data=data, pch=as.character(site)) > x <- unstack(py2, py2$year~py2$site) > y <- unstack(py2, py2$fit~py2$site) > matlines(x, y) > legend("topright", letters[1:4], col=1:4, lty=1:4)---------------------------------------------- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of 077315q at acadiau.ca > Sent: Saturday, September 08, 2012 2:47 PM > To: r-help at r-project.org > Subject: [R] Using predict() After Adding a Factor to a glm.nb() Model > > # Hello, > > # I have a data set that looks something like the following: > > site<-c(rep('a',5),rep('b',2),rep('c',4),rep('d',11)) > year<-c(1980, 1981, 1982, 1993, 1995, 1980, 1983, 1981, 1993, 1995, > 1999, c(1980:1990)) > count<- > c(60,35,36,12,8,112,98,20,13,15,15,65,43,49,51,34,33,33,33,40,11,0) > > data<-data.frame(site, year, count) > > # > site year count > # 1 a 1980 60 > # 2 a 1981 35 > # 3 a 1982 36 > # 4 a 1993 12 > # . > # . > # . > > # I ran a negative binomial glm using: > > library(MASS) > model_a<-glm.nb(count~year, data=data) > > # then extracted predicted values using: > > py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1)) > p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response') > f1<-p1$fit > > plot(count~year, data=data) > lines(py$year, f1) > > # Works perfectly. > > # Now, I want to add site as a factor: > model_b<-glm.nb(count~year+as.factor(site), data=data) > > # I have tried a couple different ways of predicting the values based > on model_b, but am having trouble. > > py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1)) > p1<-predict(model_b, newdata=py, se.fit=TRUE, type='response') > > # gives: > # >Error in model.frame.default(Terms, newdata, na.action = na.action, > xlev = object$xlevels) : > # variable lengths differ (found for 'as.factor(site)') > # In addition: Warning message: > # 'newdata' had 20 rows but variable(s) found have 22 rows > > py<-expand.grid(data$site, data$year) > names(py)<-c('site','year') > p1<-predict(model_b, newdata=py) > > # This works, but results in 484 values, and I can't plot a line over > my points. > > # There is probably a simple solution, but I'm having trouble wrapping > my mind around it. Mind you, this is also a last > # minute change to my thesis, and I haven't slept in about three days. > > # Any suggestions? I would be extremely grateful... > > # Banging my head against the wall, > # A stressed out grad student > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.