thr3ads.net - R help - [R] choosing between Poisson regression models: no interactions vs. interactions [Jul 2007]

If this information is useful, please help other people find it:
Share via:

James Milks

2007-Jul-31 18:13 UTC

[R] choosing between Poisson regression models: no interactions vs. interactions

R gurus,

I'm working on data analysis for a small project.  My response  
variable is total vines per tree (median = 0, mean = 1.65, min = 0,  
max = 24).  My predictors are two categorical variables (four sites  
and four species) and one continuous (tree diameter at breast height  
(DBH)).  The main question I'm attempting to answer is whether or not  
the species identity of a tree has any effects on the number of vines  
clinging to the trunk.  Given that the response variable is count  
data, I decided to use Poisson regression, even though I'm not as  
familiar with it as linear or logit regression.

My problem is deciding which model to use.  I have created several,  
one without interaction terms (Total.vines~Site+Species+DBH), one  
with an interaction term between Site and Species  
(Total.vines~Site*Species+DBH), and one with interactions between all  
variables (Total.vines~Site*Species*DBH).  Here is my output from R  
for the first two models (the last model has the same number (and  
identity) of significant variables as the second model, even though  
the last model had more interaction terms overall):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Call:
glm(formula = Total.vines ~ Site + Species + DBH, family = poisson)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-5.2067  -1.2915  -0.7095  -0.3525   6.3756

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.987695   0.231428 -12.910  < 2e-16 ***
SiteHuffman Dam  2.725193   0.249423  10.926  < 2e-16 ***
SiteNarrows      1.902987   0.227599   8.361  < 2e-16 ***
SiteSugar Creek  1.752754   0.242186   7.237 4.58e-13 ***
SpeciesFRAM      0.955468   0.157423   6.069 1.28e-09 ***
SpeciesPLOC      1.187903   0.141707   8.383  < 2e-16 ***
SpeciesULAM      0.340792   0.184615   1.846   0.0649 .
DBH              0.020708   0.001292  16.026  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

     Null deviance: 1972.3  on 544  degrees of freedom
Residual deviance: 1290.0  on 537  degrees of freedom
AIC: 1796.0

Number of Fisher Scoring iterations: 6

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Call:
glm(formula = Total.vines ~ Site * Species + DBH, family = poisson,
     data = sycamores.1)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-4.9815  -1.2370  -0.6339  -0.3403   6.5664

Coefficients: (3 not defined because of singularities)
                               Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -2.788243   0.303064  -9.200  < 2e-16 ***
SiteHuffman Dam               1.838952   0.354127   5.193 2.07e-07 ***
SiteNarrows                   2.252716   0.323184   6.970 3.16e-12 ***
SiteSugar Creek             -12.961519 519.152077  -0.025 0.980082
SpeciesFRAM                  13.938716 519.152230   0.027 0.978580
SpeciesPLOC                   0.240223   0.540676   0.444 0.656824
SpeciesULAM                   1.919586   0.540246   3.553 0.000381 ***
DBH                           0.019984   0.001337  14.946  < 2e-16 ***
SiteHuffman Dam:SpeciesFRAM -11.513823 519.152294  -0.022 0.982306
SiteNarrows:SpeciesFRAM     -13.593127 519.152268  -0.026 0.979111
SiteSugar Creek:SpeciesFRAM         NA         NA      NA       NA
SiteHuffman Dam:SpeciesPLOC         NA         NA      NA       NA
SiteNarrows:SpeciesPLOC       0.397503   0.555218   0.716 0.474028
SiteSugar Creek:SpeciesPLOC  15.640450 519.152277   0.030 0.975966
SiteHuffman Dam:SpeciesULAM  -0.102841   0.610027  -0.169 0.866124
SiteNarrows:SpeciesULAM      -2.809092   0.606804  -4.629 3.67e-06 ***
SiteSugar Creek:SpeciesULAM         NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

     Null deviance: 1972.3  on 544  degrees of freedom
Residual deviance: 1178.7  on 531  degrees of freedom
AIC: 1696.6

Number of Fisher Scoring iterations: 13
%%%%%%%%%%%%%%%%%%%%

As you can see, the two models give very different output, especially  
in regards to whether or not the individual species are significant.   
In the no-interaction model, the only species that was not  
significant was ULAM.  In the one-way interaction model, ULAM was the  
only significant species.  My question is this: which model should I  
use when I present this analysis?  I know that the one-way  
interaction model has the lower AIC.  Should I base my choice solely  
on AIC?  The reasons I'm asking is that the second model has only one  
significant interaction term, fewer significant terms overall, and  
three undefined terms.

Thanks for any guidance you can give to someone running his first  
Poisson regression.

Jim Milks

Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435



	[[alternative HTML version deleted]]

Ben Bolker

2007-Jul-31 19:13 UTC

head link

[R] choosing between Poisson regression models: no interactions vs. interactions

James Milks <james.milks <at> wright.edu> writes:
> 
> My problem is deciding which model to use.  I have created several,  
> one without interaction terms (Total.vines~Site+Species+DBH), one  
> with an interaction term between Site and Species  
> (Total.vines~Site*Species+DBH), and one with interactions between all  
> variables (Total.vines~Site*Species*DBH).  Here is my output from R  
> for the first two models (the last model has the same number (and  
> identity) of significant variables as the second model, even though  
> the last model had more interaction terms overall):
> 
  A few comments:

 - the narrow answer to your question is to use the
interaction model: this would be the answer in several
different statistical frameworks.  In information-criteria-land,
the AIC is 10 points lower which constitutes a much better
expected predictive accuracy.  In classical likelihood-ratio-testing
land (try anova(model1,model2,test="Chisq")), you can probably also
reject
the null hypothesis that adding the interaction terms doesn't
improve the model (sorry about the convoluted language, but that's
what you get in LRT-land).  Also, the presence of *any* statistically
significant interaction suggests that you probably can't neglect
interactions.
The number of significant terms in each model is largely
irrelevant.

  - You should probably consider whether there is overdispersion
in your data (e.g. try fitting a quasipoisson or negative binomial
model, although you can't use AIC or LRT (=anova()) with quasipoisson
models), since there usually is in ecological data.

  - your question suggests you might want to read up a bit
more on generalized linear models  -- Agresti's intro to categorical
data analysis or Crawley's intro to data analysis with S-PLUS
would work.  (If you're familiar with logistic regression, Poisson
regression should follow almost exactly the same rules and conventions.)

  good luck,
    Ben Bolker

Kyle.

2007-Jul-31 19:23 UTC

head link

[R] choosing between Poisson regression models: no interactions vs. interactions

If you have access to it, chapter 7, section three of Venables and  
Ripley's "Modern Applied Statistics with S" is probably going to
tell
you everything you need to know.  In general, let's say you're  
wanting to look at the following:

	model 1:	Total.vines ~ Site + Species + DBH
	model 2:	Total.vines ~ Site * Species + DBH

Download the MASS package and load it into the library

 > library(MASS)
 > model1<-glm(Total.vines~Site+Specites+DBH,family=poisson)
 > model2<-addterm(model1, . ~ .*Species+DBH,test="Chisq")

This should return a table including the respective AICs, likelihood  
ratio tests, and the p-value associated with whether or not model2 is  
significantly different than model 1.  In general though, it's best  
to choose the model which minimizes the AIC.  Based on the output you  
included, it appears that the model including the interaction term is  
better, but the above might give you the information you seek.


Kyle H. Ambert
Graduate Student, Dept. Behavioral Neuroscience
Oregon Health & Science University
ambertk at ohsu.edu


On Jul 31, 2007, at 11:13 AM, James Milks wrote:
> R gurus,
>
> I'm working on data analysis for a small project.  My response
> variable is total vines per tree (median = 0, mean = 1.65, min = 0,
> max = 24).  My predictors are two categorical variables (four sites
> and four species) and one continuous (tree diameter at breast height
> (DBH)).  The main question I'm attempting to answer is whether or not
> the species identity of a tree has any effects on the number of vines
> clinging to the trunk.  Given that the response variable is count
> data, I decided to use Poisson regression, even though I'm not as
> familiar with it as linear or logit regression.
>
> My problem is deciding which model to use.  I have created several,
> one without interaction terms (Total.vines~Site+Species+DBH), one
> with an interaction term between Site and Species
> (Total.vines~Site*Species+DBH), and one with interactions between all
> variables (Total.vines~Site*Species*DBH).  Here is my output from R
> for the first two models (the last model has the same number (and
> identity) of significant variables as the second model, even though
> the last model had more interaction terms overall):
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Call:
> glm(formula = Total.vines ~ Site + Species + DBH, family = poisson)
>
> Deviance Residuals:
>      Min       1Q   Median       3Q      Max
> -5.2067  -1.2915  -0.7095  -0.3525   6.3756
>
> Coefficients:
>                   Estimate Std. Error z value Pr(>|z|)
> (Intercept)     -2.987695   0.231428 -12.910  < 2e-16 ***
> SiteHuffman Dam  2.725193   0.249423  10.926  < 2e-16 ***
> SiteNarrows      1.902987   0.227599   8.361  < 2e-16 ***
> SiteSugar Creek  1.752754   0.242186   7.237 4.58e-13 ***
> SpeciesFRAM      0.955468   0.157423   6.069 1.28e-09 ***
> SpeciesPLOC      1.187903   0.141707   8.383  < 2e-16 ***
> SpeciesULAM      0.340792   0.184615   1.846   0.0649 .
> DBH              0.020708   0.001292  16.026  < 2e-16 ***
> ---
> Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> (Dispersion parameter for poisson family taken to be 1)
>
>      Null deviance: 1972.3  on 544  degrees of freedom
> Residual deviance: 1290.0  on 537  degrees of freedom
> AIC: 1796.0
>
> Number of Fisher Scoring iterations: 6
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Call:
> glm(formula = Total.vines ~ Site * Species + DBH, family = poisson,
>      data = sycamores.1)
>
> Deviance Residuals:
>      Min       1Q   Median       3Q      Max
> -4.9815  -1.2370  -0.6339  -0.3403   6.5664
>
> Coefficients: (3 not defined because of singularities)
>                                Estimate Std. Error z value Pr(>|z|)
> (Intercept)                  -2.788243   0.303064  -9.200  < 2e-16 ***
> SiteHuffman Dam               1.838952   0.354127   5.193 2.07e-07 ***
> SiteNarrows                   2.252716   0.323184   6.970 3.16e-12 ***
> SiteSugar Creek             -12.961519 519.152077  -0.025 0.980082
> SpeciesFRAM                  13.938716 519.152230   0.027 0.978580
> SpeciesPLOC                   0.240223   0.540676   0.444 0.656824
> SpeciesULAM                   1.919586   0.540246   3.553 0.000381 ***
> DBH                           0.019984   0.001337  14.946  < 2e-16 ***
> SiteHuffman Dam:SpeciesFRAM -11.513823 519.152294  -0.022 0.982306
> SiteNarrows:SpeciesFRAM     -13.593127 519.152268  -0.026 0.979111
> SiteSugar Creek:SpeciesFRAM         NA         NA      NA       NA
> SiteHuffman Dam:SpeciesPLOC         NA         NA      NA       NA
> SiteNarrows:SpeciesPLOC       0.397503   0.555218   0.716 0.474028
> SiteSugar Creek:SpeciesPLOC  15.640450 519.152277   0.030 0.975966
> SiteHuffman Dam:SpeciesULAM  -0.102841   0.610027  -0.169 0.866124
> SiteNarrows:SpeciesULAM      -2.809092   0.606804  -4.629 3.67e-06 ***
> SiteSugar Creek:SpeciesULAM         NA         NA      NA       NA
> ---
> Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> (Dispersion parameter for poisson family taken to be 1)
>
>      Null deviance: 1972.3  on 544  degrees of freedom
> Residual deviance: 1178.7  on 531  degrees of freedom
> AIC: 1696.6
>
> Number of Fisher Scoring iterations: 13
> %%%%%%%%%%%%%%%%%%%%
>
> As you can see, the two models give very different output, especially
> in regards to whether or not the individual species are significant.
> In the no-interaction model, the only species that was not
> significant was ULAM.  In the one-way interaction model, ULAM was the
> only significant species.  My question is this: which model should I
> use when I present this analysis?  I know that the one-way
> interaction model has the lower AIC.  Should I base my choice solely
> on AIC?  The reasons I'm asking is that the second model has only one
> significant interaction term, fewer significant terms overall, and
> three undefined terms.
>
> Thanks for any guidance you can give to someone running his first
> Poisson regression.
>
> Jim Milks
>
> Graduate Student
> Environmental Sciences Ph.D. Program
> 136 Biological Sciences
> Wright State University
> 3640 Colonel Glenn Hwy
> Dayton, OH 45435
>
>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

Reasonably Related Threads

Search for more maybe matching threads

R help - Jul 2007 - choosing between Poisson regression models: no interactions vs. interactions

[R] choosing between Poisson regression models: no interactions vs. interactions

[R] choosing between Poisson regression models: no interactions vs. interactions

[R] choosing between Poisson regression models: no interactions vs. interactions

Reasonably Related Threads