thr3ads.net - R help - [R] offset and poisson regression [Jul 2009]

If this information is useful, please help other people find it:
Share via:

Renaud Scheifler

2009-Jul-27 14:27 UTC

[R] offset and poisson regression

Not sure that the list is the best place for this question, but we are 
going mad with this... We are trying to fit a poisson regression to 
count data, eg the number of fledged youngs of blue tits (NPe) as a 
function of the clutch size (GPc) and other environment variables. Here 
are the original data (dumped) (we just omit the environment variables 
to simplify):

tab<-
structure(list(NPe = c(3L, 5L, 2L, 6L, NA, 4L, 4L, 4L, 3L, NA,
NA, 4L, 5L, 2L, 0L, 5L, NA, 1L, NA, 2L, 5L, 4L, 0L, 4L, NA, NA,
6L, 4L, 0L, 4L, 4L, 0L, 6L, 5L, 6L, 3L, NA, 6L, 5L, 3L, 6L, 7L,
NA, 7L, 6L, 4L, NA, 1L, NA, NA, 7L, 6L, NA, 5L, NA, NA, NA, 0L,
0L, NA, NA, 5L, NA, 3L, NA, NA, NA, 5L, NA, NA, 6L, NA, NA, NA,
0L, 6L, NA, NA, NA, NA, 5L, 5L, 4L, NA, 4L, 0L, 4L, 5L, 5L, 4L,
0L, 0L, 5L, 6L, 5L, 1L, NA, 0L, 7L, 0L, 0L, 3L, 3L, 7L, NA, 0L,
6L, 4L, 4L, 5L, 0L, 5L, 4L, 7L, 4L, 7L, 5L, 5L, 0L, NA, 5L, 7L,
NA, 8L, 7L, 5L, 0L), GPc = c(5L, 6L, 6L, 7L, NA, 5L, 6L, 5L,
6L, 6L, 4L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 3L, 5L, 6L, 3L, 5L,
5L, 7L, 6L, 5L, 5L, 5L, 4L, 5L, 6L, 5L, 6L, 5L, 5L, 7L, 6L, 4L,
7L, 8L, 9L, 7L, 7L, 7L, 4L, 5L, 5L, 4L, 7L, 6L, 5L, 5L, 6L, 2L,
7L, 6L, 8L, NA, NA, 7L, 6L, 6L, NA, 6L, 6L, 5L, 5L, 5L, 7L, 7L,
6L, 6L, 6L, 6L, 7L, 5L, 5L, 7L, 7L, 6L, 6L, 8L, 6L, 7L, 5L, 5L,
8L, 8L, 7L, 7L, 6L, 7L, 6L, 5L, 6L, 7L, 8L, 6L, 7L, 7L, 5L, 7L,
6L, 5L, 9L, 5L, 4L, 7L, 6L, 6L, 5L, 8L, 5L, 7L, 6L, 7L, 7L, 7L,
6L, 7L, 5L, 8L, 7L, 7L, 6L)), .Names = c("NPe", "GPc"),
class =
"data.frame", row.names = c(NA,
-127L))

It seems logical to insert "clutch size" as an offset term, since we
are
actually interested in the ratio fledged youngs/clutch size. However, 
the final results are quite surprising:

modsr0<-glm(NPe~offset(GPc),family="poisson",data=tab)

if we compute the predictions, we get numbers which looks like a gross 
overestimation of the reality (eg 14.6, 39.7, etc...) -including the 
fact that it implies that one can have more fledged youngs than eggs !:

 [1]  0.7  2.0  2.0  5.4  0.7  2.0  0.7  2.0  0.7  0.7  2.0  2.0  2.0  
0.3  0.1  0.7  2.0
[18]  0.1  0.7  2.0  0.7  0.7  0.7  0.3  0.7  2.0  0.7  2.0  0.7  5.4  
2.0  0.3  5.4 14.6
[35]  5.4  5.4  5.4  0.7  5.4  2.0  0.7  2.0 14.6  5.4  2.0  0.7  5.4  
2.0  2.0  5.4  2.0
[52]  2.0  2.0  5.4  0.7  0.7 14.6 14.6  5.4  5.4  2.0  5.4  2.0  0.7  
5.4 14.6  2.0  5.4
[69]  5.4  0.7  5.4  0.7 39.7  0.7  0.3  5.4  2.0  2.0  0.7 14.6  0.7  
5.4  2.0  5.4  5.4
[86]  2.0  5.4 14.6  5.4  5.4  2.0

Otherwise, if clutch size is inserted as a variable (and not as an 
offset), predictions are much more realistic, with no extreme values :

modsr0<-glm(NPe~GPc,family="poisson",data=tab)
round(exp(predict(modsr0)),1)
 [1] 3.2 3.7 3.7 4.4 3.2 3.7 3.2 3.7 3.2 3.2 3.7 3.7 3.7 2.7 2.2 3.2 3.7 
2.2 3.2 3.7 3.2 3.2
[23] 3.2 2.7 3.2 3.7 3.2 3.7 3.2 4.4 3.7 2.7 4.4 5.3 4.4 4.4 4.4 3.2 4.4 
3.7 3.2 3.7 5.3 4.4
[45] 3.7 3.2 4.4 3.7 3.7 4.4 3.7 3.7 3.7 4.4 3.2 3.2 5.3 5.3 4.4 4.4 3.7 
4.4 3.7 3.2 4.4 5.3
[67] 3.7 4.4 4.4 3.2 4.4 3.2 6.2 3.2 2.7 4.4 3.7 3.7 3.2 5.3 3.2 4.4 3.7 
4.4 4.4 3.7 4.4 5.3
[89] 4.4 4.4 3.7

Can any sound statistician provide a hint about what to do or how to 
interprete this ?

Thanks in advance,

Renaud and Patrick

Renaud Lancelot

2009-Jul-27 15:46 UTC

head link

[R] offset and poisson regression

You should use offset(log(Gpc)) instead of offset(Gpc)
> options(width = 65)
> fm <- glm(NPe ~ 1 + offset(log(GPc)), family = poisson,data = tab)
> fitted(fm)       1        2        3        4        6        7        8
3.181818 3.818182 3.818182 4.454545 3.181818 3.818182 3.181818
       9       12       13       14       15       16       18
3.818182 3.181818 3.181818 3.818182 3.818182 3.818182 2.545455
      20       21       22       23       24       27       28
1.909091 3.181818 3.818182 1.909091 3.181818 3.818182 3.181818
      29       30       31       32       33       34       35
3.181818 3.181818 2.545455 3.181818 3.818182 3.181818 3.818182
      36       38       39       40       41       42       44
3.181818 4.454545 3.818182 2.545455 4.454545 5.090909 4.454545
      45       46       48       51       52       54       58
4.454545 4.454545 3.181818 4.454545 3.818182 3.181818 3.818182
      59       62       64       68       71       75       76
5.090909 4.454545 3.818182 3.181818 4.454545 3.818182 3.818182
      81       82       83       85       86       87       88
4.454545 3.818182 3.818182 3.818182 4.454545 3.181818 3.181818
      89       90       91       92       93       94       95
5.090909 5.090909 4.454545 4.454545 3.818182 4.454545 3.818182
      96       98       99      100      101      102      103
3.181818 4.454545 5.090909 3.818182 4.454545 4.454545 3.181818
     104      106      107      108      109      110      111
4.454545 3.181818 5.727273 3.181818 2.545455 4.454545 3.818182
     112      113      114      115      116      117      118
3.818182 3.181818 5.090909 3.181818 4.454545 3.818182 4.454545
     119      121      122      124      125      126      127
4.454545 3.818182 4.454545 5.090909 4.454545 4.454545 3.818182

All the best,

Renaud

2009/7/27 Renaud Scheifler <renaud.scheifler at
univ-fcomte.fr>:> Not sure that the list is the best place for this question, but we are
going
> mad with this... We are trying to fit a poisson regression to count data,
eg
> the number of fledged youngs of blue tits (NPe) as a function of the clutch
> size (GPc) and other environment variables. Here are the original data
> (dumped) (we just omit the environment variables to simplify):
>
> tab<-
> structure(list(NPe = c(3L, 5L, 2L, 6L, NA, 4L, 4L, 4L, 3L, NA,
> NA, 4L, 5L, 2L, 0L, 5L, NA, 1L, NA, 2L, 5L, 4L, 0L, 4L, NA, NA,
> 6L, 4L, 0L, 4L, 4L, 0L, 6L, 5L, 6L, 3L, NA, 6L, 5L, 3L, 6L, 7L,
> NA, 7L, 6L, 4L, NA, 1L, NA, NA, 7L, 6L, NA, 5L, NA, NA, NA, 0L,
> 0L, NA, NA, 5L, NA, 3L, NA, NA, NA, 5L, NA, NA, 6L, NA, NA, NA,
> 0L, 6L, NA, NA, NA, NA, 5L, 5L, 4L, NA, 4L, 0L, 4L, 5L, 5L, 4L,
> 0L, 0L, 5L, 6L, 5L, 1L, NA, 0L, 7L, 0L, 0L, 3L, 3L, 7L, NA, 0L,
> 6L, 4L, 4L, 5L, 0L, 5L, 4L, 7L, 4L, 7L, 5L, 5L, 0L, NA, 5L, 7L,
> NA, 8L, 7L, 5L, 0L), GPc = c(5L, 6L, 6L, 7L, NA, 5L, 6L, 5L,
> 6L, 6L, 4L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 3L, 5L, 6L, 3L, 5L,
> 5L, 7L, 6L, 5L, 5L, 5L, 4L, 5L, 6L, 5L, 6L, 5L, 5L, 7L, 6L, 4L,
> 7L, 8L, 9L, 7L, 7L, 7L, 4L, 5L, 5L, 4L, 7L, 6L, 5L, 5L, 6L, 2L,
> 7L, 6L, 8L, NA, NA, 7L, 6L, 6L, NA, 6L, 6L, 5L, 5L, 5L, 7L, 7L,
> 6L, 6L, 6L, 6L, 7L, 5L, 5L, 7L, 7L, 6L, 6L, 8L, 6L, 7L, 5L, 5L,
> 8L, 8L, 7L, 7L, 6L, 7L, 6L, 5L, 6L, 7L, 8L, 6L, 7L, 7L, 5L, 7L,
> 6L, 5L, 9L, 5L, 4L, 7L, 6L, 6L, 5L, 8L, 5L, 7L, 6L, 7L, 7L, 7L,
> 6L, 7L, 5L, 8L, 7L, 7L, 6L)), .Names = c("NPe", "GPc"),
class > "data.frame", row.names = c(NA,
> -127L))
>
> It seems logical to insert "clutch size" as an offset term, since
we are
> actually interested in the ratio fledged youngs/clutch size. However, the
> final results are quite surprising:
>
> modsr0<-glm(NPe~offset(GPc),family="poisson",data=tab)
>
> if we compute the predictions, we get numbers which looks like a gross
> overestimation of the reality (eg 14.6, 39.7, etc...) -including the fact
> that it implies that one can have more fledged youngs than eggs !:
>
> [1] ?0.7 ?2.0 ?2.0 ?5.4 ?0.7 ?2.0 ?0.7 ?2.0 ?0.7 ?0.7 ?2.0 ?2.0 ?2.0 ?0.3
> ?0.1 ?0.7 ?2.0
> [18] ?0.1 ?0.7 ?2.0 ?0.7 ?0.7 ?0.7 ?0.3 ?0.7 ?2.0 ?0.7 ?2.0 ?0.7 ?5.4 ?2.0
> ?0.3 ?5.4 14.6
> [35] ?5.4 ?5.4 ?5.4 ?0.7 ?5.4 ?2.0 ?0.7 ?2.0 14.6 ?5.4 ?2.0 ?0.7 ?5.4 ?2.0
> ?2.0 ?5.4 ?2.0
> [52] ?2.0 ?2.0 ?5.4 ?0.7 ?0.7 14.6 14.6 ?5.4 ?5.4 ?2.0 ?5.4 ?2.0 ?0.7 ?5.4
> 14.6 ?2.0 ?5.4
> [69] ?5.4 ?0.7 ?5.4 ?0.7 39.7 ?0.7 ?0.3 ?5.4 ?2.0 ?2.0 ?0.7 14.6 ?0.7 ?5.4
> ?2.0 ?5.4 ?5.4
> [86] ?2.0 ?5.4 14.6 ?5.4 ?5.4 ?2.0
>
> Otherwise, if clutch size is inserted as a variable (and not as an offset),
> predictions are much more realistic, with no extreme values :
>
> modsr0<-glm(NPe~GPc,family="poisson",data=tab)
> round(exp(predict(modsr0)),1)
> [1] 3.2 3.7 3.7 4.4 3.2 3.7 3.2 3.7 3.2 3.2 3.7 3.7 3.7 2.7 2.2 3.2 3.7 2.2
> 3.2 3.7 3.2 3.2
> [23] 3.2 2.7 3.2 3.7 3.2 3.7 3.2 4.4 3.7 2.7 4.4 5.3 4.4 4.4 4.4 3.2 4.4
3.7
> 3.2 3.7 5.3 4.4
> [45] 3.7 3.2 4.4 3.7 3.7 4.4 3.7 3.7 3.7 4.4 3.2 3.2 5.3 5.3 4.4 4.4 3.7
4.4
> 3.7 3.2 4.4 5.3
> [67] 3.7 4.4 4.4 3.2 4.4 3.2 6.2 3.2 2.7 4.4 3.7 3.7 3.2 5.3 3.2 4.4 3.7
4.4
> 4.4 3.7 4.4 5.3
> [89] 4.4 4.4 3.7
>
> Can any sound statistician provide a hint about what to do or how to
> interprete this ?
>
> Thanks in advance,
>
> Renaud and Patrick
>
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Renaud Lancelot
EDEN Project, coordinator
http://www.eden-fp6project.net/
<< EDEN International Conference, Montpellier,  10-12 May 2010 >>
<<   http://international-conference2010.eden-fp6project.net/  >>

UMR CIRAD-INRA "Contr?le des maladies animales exotiques et
?mergentes"
Joint research unit "Control of emerging and exotic animal diseases"

CIRAD, Campus International de Baillarguet TA A-DIR / B
F34398 Montpellier
http://www.cirad.fr  http://bluetongue.cirad.fr/

Tel.  +33 4 67 59 37 17  -  Fax  +33 4 67 59 37 95
Secr. +33 4 67 59 37 37  - Cell. +33 6 77 52 08 69

R help - Jul 2009 - offset and poisson regression

[R] offset and poisson regression

[R] offset and poisson regression