On Thu, Jan 31, 2013 at 2:13 PM, Wim Kreinen <wkreinen@gmail.com> wrote:
> Hello,
>
> I have a question about modelling via glm.
I think you are way off track. Either the data, glm, or both, are not what
you think they are.
> I have a dataset
skn300.tab <- structure(list(n = 1:97, freq = c(0L, 0L, 0L, 0L, 1L, 7L, 40L,
100L, 276L, 543L, 952L, 1414L, 1853L, 2199L, 2435L, 2270L, 2042L,
1679L, 1386L, 1108L, 922L, 792L, 642L, 597L, 453L, 424L, 370L,
297L, 278L, 218L, 208L, 172L, 174L, 149L, 124L, 98L, 98L, 67L,
78L, 67L, 46L, 34L, 31L, 42L, 34L, 21L, 28L, 18L, 18L, 18L, 10L,
19L, 6L, 9L, 10L, 6L, 6L, 5L, 3L, 9L, 4L, 3L, 4L, 5L, 2L, 6L,
4L, 2L, 2L, 3L, 3L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 1L),
kum = c(0L, 0L, 0L, 0L, 1L, 8L, 48L, 148L, 424L, 967L, 1919L,
3333L, 5186L, 7385L, 9820L, 12090L, 14132L, 15811L, 17197L,
18305L, 19227L, 20019L, 20661L, 21258L, 21711L, 22135L, 22505L,
22802L, 23080L, 23298L, 23506L, 23678L, 23852L, 24001L, 24125L,
24223L, 24321L, 24388L, 24466L, 24533L, 24579L, 24613L, 24644L,
24686L, 24720L, 24741L, 24769L, 24787L, 24805L, 24823L, 24833L,
24852L, 24858L, 24867L, 24877L, 24883L, 24889L, 24894L, 24897L,
24906L, 24910L, 24913L, 24917L, 24922L, 24924L, 24930L, 24934L,
24936L, 24938L, 24941L, 24944L, 24944L, 24944L, 24944L, 24944L,
24946L, 24947L, 24947L, 24947L, 24947L, 24947L, 24947L, 24948L,
24948L, 24948L, 24949L, 24951L, 24952L, 24952L, 24952L, 24952L,
24952L, 24954L, 24954L, 24954L, 24954L, 24955L)), .Names = c("n",
"freq", "kum"), row.names = c(NA, -97L), class =
"data.frame")
> that looks as if it were Poisson distributed (actually I would
> appreciate that) but it isn't
plot(skn300.tab)
My guess is that we are looking at a pdf and cdf (maybe even of a Poisson
process), not at any "data" that lends itself to a (generalized) linear
model. Consult a statistician, post on stackexchange, read about
regression, or better define your actual R problem here, demonstrating that
this is not homework - see the posting guide.
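If that guess is right, freq is a table of counts per value of n and kum is just its running total. A minimal sketch of how one might check this and get the mean and variance of the tabulated variable is below; freq_moments() and the toy table are made up for illustration and are not from the original post.

```r
# Hypothetical helper (an assumption, not from the original post): mean and
# variance of a variable stored as a frequency table (values plus counts).
freq_moments <- function(value, count) {
  total <- sum(count)
  m <- sum(value * count) / total
  v <- sum(count * (value - m)^2) / (total - 1)
  c(mean = m, var = v, ratio = v / m)  # ratio near 1 would suggest Poisson
}

# Toy frequency table, invented for illustration only.
toy <- data.frame(n = 1:5, freq = c(2L, 5L, 8L, 4L, 1L))
toy$kum <- cumsum(toy$freq)  # running total, analogous to the 'kum' column

# TRUE when the cumulative column adds nothing beyond the counts
kum_is_running_total <- identical(cumsum(toy$freq), toy$kum)

# Cross-check against expanding the table to one entry per observation
expanded <- rep(toy$n, times = toy$freq)
moments  <- freq_moments(toy$n, toy$freq)
print(moments)
print(c(mean = mean(expanded), var = var(expanded)))
```

With the posted data the same calls would be all(cumsum(skn300.tab$freq) == skn300.tab$kum) and freq_moments(skn300.tab$n, skn300.tab$freq).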
Cheers
> because the mean does not equal the variance.
>
>
> > mean (x)
> [1] 901.7827
> > var (x)
> [1] 132439.3
>
>
> Anyway, I tried to model it via poisson and quasipoisson. Actually, just to
> get an impression of how glm works. But I don't know how to interpret the
> output. Of course this is because my knowledge of logistic
> regression is rather limited. Hoping there is somebody with mercy, I would
> like to understand which parameters are important, e.g. which parameter
> might give me a hint that a poisson model is a bad idea. I would appreciate
> pointers to tutorials about reading glm output as
> well.
>
> Thanks
> Wim
>
>
> > skn300.glmp <- glm (freq~n, data=skn300.tab, family=poisson)
> > summary (skn300.glmp)
>
> Call:
> glm(formula = freq ~ n, family = poisson, data = skn300.tab)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -51.332   -9.383   -6.599   -3.959   55.111
>
> Coefficients:
>               Estimate Std. Error z value Pr(>|z|)
> (Intercept)  7.2374375  0.0093285   775.8   <2e-16 ***
> n           -0.0539424  0.0003699  -145.8   <2e-16 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for poisson family taken to be 1)
>
> Null deviance: 71731 on 96 degrees of freedom
> Residual deviance: 37383 on 95 degrees of freedom
> AIC: 37800
>
> Number of Fisher Scoring iterations: 6
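On the question of which number hints that a Poisson model is a bad idea: a common rule of thumb compares the residual deviance with its degrees of freedom. In the output above that is 37383 on 95, a ratio of roughly 393, where a well-fitting Poisson model would give something near 1. The sketch below shows the check on simulated overdispersed counts; the data, the seed and the object names are assumptions for illustration, not the skn300.tab data.

```r
# Simulated, deliberately overdispersed counts (NOT the skn300.tab data).
set.seed(42)
x  <- 1:100
mu <- exp(3 - 0.02 * x)
y  <- rnbinom(length(x), mu = mu, size = 2)  # variance = mu + mu^2 / 2

fit_pois <- glm(y ~ x, family = poisson)

# Residual deviance per residual degree of freedom; near 1 if Poisson fits
dev_ratio <- deviance(fit_pois) / df.residual(fit_pois)

# Approximate chi-squared goodness-of-fit test on the residual deviance;
# a tiny p-value says the Poisson model does not describe the data
p_gof <- pchisq(deviance(fit_pois), df.residual(fit_pois), lower.tail = FALSE)

print(c(dev_ratio = dev_ratio, p_gof = p_gof))
```

The chi-squared approximation is rough when many expected counts are small, so treat the p-value as a guide rather than an exact test.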
>
> >
> > skn300.glmq <- glm (freq~n, data=skn300.tab, family=quasipoisson)
> > summary (skn300.glmq)
>
> Call:
> glm(formula = freq ~ n, family = quasipoisson, data = skn300.tab)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -51.332   -9.383   -6.599   -3.959   55.111
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  7.237438   0.186381  38.831  < 2e-16 ***
> n           -0.053942   0.007391  -7.298  8.8e-11 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for quasipoisson family taken to be 399.1874)
>
> Null deviance: 71731 on 96 degrees of freedom
> Residual deviance: 37383 on 95 degrees of freedom
> AIC: NA
>
> Number of Fisher Scoring iterations: 6
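The two summaries differ only through the dispersion parameter. The coefficients are identical, and the quasipoisson standard errors are the Poisson ones multiplied by the square root of the dispersion: 0.0093285 * sqrt(399.1874) is about 0.18638, and 0.0003699 * sqrt(399.1874) is about 0.007391, matching the table above. A sketch of that relationship on simulated data follows; the data and object names are assumptions for illustration, not the skn300.tab data.

```r
# Simulated overdispersed counts (NOT the skn300.tab data).
set.seed(42)
x <- 1:100
y <- rnbinom(length(x), mu = exp(3 - 0.02 * x), size = 2)

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

# Dispersion estimate: Pearson chi-squared divided by residual df
phi <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

se_pois  <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]

# Same point estimates; standard errors inflated by sqrt(phi)
print(phi)
print(cbind(poisson = se_pois, quasipoisson = se_quasi,
            scaled = se_pois * sqrt(phi)))
```

A dispersion far above 1, like the 399 reported above, means the Poisson z values and p-values are badly overstated.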
>
>
> > dput (skn300.tab)
> structure(list(n = 1:97, freq = c(0L, 0L, 0L, 0L, 1L, 7L, 40L,
> 100L, 276L, 543L, 952L, 1414L, 1853L, 2199L, 2435L, 2270L, 2042L,
> 1679L, 1386L, 1108L, 922L, 792L, 642L, 597L, 453L, 424L, 370L,
> 297L, 278L, 218L, 208L, 172L, 174L, 149L, 124L, 98L, 98L, 67L,
> 78L, 67L, 46L, 34L, 31L, 42L, 34L, 21L, 28L, 18L, 18L, 18L, 10L,
> 19L, 6L, 9L, 10L, 6L, 6L, 5L, 3L, 9L, 4L, 3L, 4L, 5L, 2L, 6L,
> 4L, 2L, 2L, 3L, 3L, 0L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L,
> 1L, 0L, 0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 1L),
> kum = c(0L, 0L, 0L, 0L, 1L, 8L, 48L, 148L, 424L, 967L, 1919L,
> 3333L, 5186L, 7385L, 9820L, 12090L, 14132L, 15811L, 17197L,
> 18305L, 19227L, 20019L, 20661L, 21258L, 21711L, 22135L, 22505L,
> 22802L, 23080L, 23298L, 23506L, 23678L, 23852L, 24001L, 24125L,
> 24223L, 24321L, 24388L, 24466L, 24533L, 24579L, 24613L, 24644L,
> 24686L, 24720L, 24741L, 24769L, 24787L, 24805L, 24823L, 24833L,
> 24852L, 24858L, 24867L, 24877L, 24883L, 24889L, 24894L, 24897L,
> 24906L, 24910L, 24913L, 24917L, 24922L, 24924L, 24930L, 24934L,
> 24936L, 24938L, 24941L, 24944L, 24944L, 24944L, 24944L, 24944L,
> 24946L, 24947L, 24947L, 24947L, 24947L, 24947L, 24947L, 24948L,
> 24948L, 24948L, 24949L, 24951L, 24952L, 24952L, 24952L, 24952L,
> 24952L, 24954L, 24954L, 24954L, 24954L, 24955L)), .Names = c("n",
> "freq", "kum"), row.names = c(NA, -97L), class = "data.frame")
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.