thr3ads.net - R help - [R] Systematically biased count data regression model [Aug 2007]

If this information is useful, please help other people find it:
Share via:

Matthew and Kim Bowser

2007-Aug-09 00:30 UTC

[R] Systematically biased count data regression model

Dear all,

I am attempting to explain patterns of arthropod family richness
(count data) using a regression model.  It seems to be able to do a
pretty good job as an explanatory model (i.e. demonstrating
relationships between dependent and independent variables), but it has
systematic problems as a predictive model:  It is biased high at low
observed values of family richness and biased low at high observed
values of family richness (see attached pdf).  I have tried diverse
kinds of reasonable regression models mostly as in Zeileis, et al.
(2007), as well as transforming my variables, both with only small
improvements.

Do you have suggestions for making a model that would perform better
as a predictive model?

Thank you for your time.

Sincerely,

Matthew Bowser

STEP student
USFWS Kenai National Wildlife Refuge
Soldotna, Alaska, USA

M.Sc. student
University of Alaska Fairbanks
Fairbankse, Alaska, USA

Reference

Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
count data in R. Technical Report 53, Department of Statistics and
Mathematics, Wirtschaftsuniversit?t Wien, Wien, Austria. URL
http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.

Code

`data` <-
structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28,
0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34,
0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221,
0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232,
0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318,
0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2,
0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38,
0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288,
0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162,
0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306,
0.28, 0.279, 0.208, 0.377, 0.413, 0.489, 0.417, 0.333, 0.208,
0.232, 0.431, 0.283, 0.241, 0.105, 0.18, -0.172, -0.374, 0.25,
0.043, 0.215, 0.204, 0.19, 0.177, -0.106, -0.143, 0.062, 0.462,
0.256, 0.229, 0.314, 0.415, 0.307, 0.238, -0.35, 0.34, 0.275,
0.097, 0.353, 0.214, 0.435, 0.055, -0.289, 0.239, 0.186, 0.135,
0.259, 0.268, 0.258, 0.032, 0.489, 0.389, 0.298, 0.164, 0.325,
0.254, -0.059, 0.524, 0.539, 0.25, 0.175, 0.326, 0.302, -0.047,
-0.301, -0.149, 0.358, 0.495, 0.311, 0.235, 0.558, -0.156, 0,
0.146, 0.329, -0.069, -0.352, -0.356, -0.206, -0.179, 0.467,
-0.325, 0.39, -0.399, -0.165, 0.267, -0.334, -0.17, 0.58, 0.228,
0.234, 0.351, 0.3, -0.018, 0.125, 0.176, 0.322, 0.246, 0.376,
-0.185, 0.342, 0.142, -0.075, 0.186, 0.333, 0.112, 0.272, 0.277,
0.203, 0.37, 0.465, 0.425), VegS = c(14, 11, 18, 21, 31, 20,
11, 17, 10, 15, 27, 8, 17, 13, 16, 9, 10, 16, 10, 15, 11, 9,
14, 11, 11, 10, 24, 18, 12, 6, 25, 21, 25, 8, 14, 18, 11, 16,
20, 16, 10, 16, 18, 14, 13, 11, 15, 23, 11, 28, 12, 17, 12, 18,
10, 15, 7, 15, 9, 16, 18, 16, 20, 18, 12, 19, 16, 18, 20, 15,
24, 9, 15, 9, 16, 14, 17, 14, 7, 9, 9, 12, 13, 15, 14, 11, 17,
8, 14, 15, 12, 8, 10, 12, 8, 16, 15, 22, 16, 21, 10, 15, 20,
14, 27, 21, 19, 22, 21, 11, 13, 10, 13, 14, 9, 22, 22, 20, 12,
16, 20, 19, 26, 14, 13, 23, 14, 22, 19, 15, 28, 16, 20, 25, 10,
19, 0, 10, 8, 11, 17, 13, 17, 23, 37, 32, 19, 26, 12, 11, 24,
11, 21, 25, 8, 15, 21, 31, 17, 0, 12, 6, 23, 19, 29, 14, 9, 0,
18, 23, 20, 15, 15, 17, 27, 17, 2, 24, 17, 16, 26, 11, 23, 24,
10, 26, 21, 12, 20, 12, 29, 22, 20, 16, 41, 19, 27, 28, 10, 35,
28, 23, 14, 5, 23, 15, 17, 12, 11, 24, 11, 14, 7, 7, 27, 17,
15, 10, 16, 2, 11, 21, 18, 15, 8, 17, 10, 18, 15, 18, 31, 9,
14, 11, 19, 22, 8, 19, 17, 18, 25, 11, 17, 32, 25, 18, 21, 19,
35, 14, 29, 9, 28, 14), T = c(13, 15.4, 15.6, 12.3, 12.7, 13.3,
6, 13, 9.2, 17.8, 9.4, 14.2, 8.7, 14, 8.6, 13.9, 9.2, 15.1, 9.4,
16.5, 14, 11.5, 15.5, 13.3, 12.7, 14, 10.5, 14, 10.1, 16.7, 15.2,
11.2, 11.7, 17.9, 13.3, 13.9, 8.7, 16.7, 7.8, 7.9, 10.9, 15.5,
14, 14.1, 14.3, 13.3, 11.6, 16.5, 12.7, 12.6, 8.3, 9, 12.4, 15,
11.8, 14.1, 10.5, 12.4, 10.5, 17.5, 9.2, 16.3, 5.3, 5.9, 11.9,
8.9, 7.7, 15.2, 8.6, 13.2, 9.5, 15.8, 12.5, 13.8, 10.7, 10.5,
7.7, 11, 9.3, 14.6, 12, 15.4, 12.3, 14, 8.3, 19.8, 15.5, 14.3,
9, 7.6, 15.3, 12.8, 14.4, 14, 10.5, 8.9, 13.4, 12.8, 12.9, 11.2,
13.1, 10, 12.4, 15.4, 7.6, 14.9, 13.1, 11.1, 8.6, 13.6, 8.4,
11.5, 12.5, 15.6, 8.3, 9.6, 8.7, 9.7, 10.5, 12.8, 8.6, 12.7,
6.7, 8.5, 9.9, 8.3, 12.8, 9.8, 10.5, 10.7, 9.6, 11.1, 13.7, 9.5,
8.8, 3.4, 10.2, 3.5, 8.4, 11.9, 12.3, 10.2, 13.1, 8.4, 3.1, 7.2,
13.2, 7.6, 11.3, 13.5, 7.8, 5.5, 10.7, 4.8, 7.9, 13.5, 13.5,
2.1, 7.2, 9.7, 12, 9.2, 13.2, 12, 17.1, 7.9, 12.7, 11.8, 16,
6.4, 12.9, 8.1, 12.6, 10.2, 13.3, 8.5, 9.7, 9.9, 18.2, 11, 8.4,
1.5, 7.3, 10.6, 13.6, 8.4, 7.2, 13, 15.5, 9.7, 13.2, 5.9, 9.5,
10.4, 12.9, 2.6, 17.2, 15.4, 10.5, 6.7, 6.6, 7.6, 10.5, 15.6,
10.4, 5.1, 11, 9.7, 4.2, 3.6, 8.5, 11.5, 8.4, 6.9, 11, 10.4,
3.4, 3.2, 5.5, 2.4, 11.2, 2.6, 15.1, 16, 13.7, 10.5, 3.5, 13.4,
11.5, 12.3, 13.9, 14.5, 12.8, 16.8, 16.9, 13.5, 17.2, 12.5, 12.4,
11.8, 12, 10.9, 6.7, 10.9, 2.3, 5.2, 13.1, 12.1, 13.9, 12.9,
7.2, 12.5, 16, 11.7), Hemlock = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0), Alpine = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0,
0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0),
    Snow = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
    1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("D", "Day",
"NDVI",
"VegS", "T", "Hemlock", "Alpine",
"Snow"), row.names = as.integer(c(NA,
254)), class = "data.frame")


#Running a regression.
library(MASS)
fit <- glm.nb(D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, data =
data)
summary(fit, correlation = FALSE)

Call:
glm.nb(formula = D ~ Day + NDVI + VegS + T + Hemlock + Alpine +
    Snow, data = data, init.theta = 11.3494468596771, link = log)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.7451  -0.7196  -0.1958   0.5389   2.7096

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.882684   0.929598  -3.101 0.001929 **
Day          0.020325   0.005540   3.669 0.000244 ***
NDVI         1.353361   0.221471   6.111 9.91e-10 ***
VegS         0.016731   0.004931   3.393 0.000691 ***
T            0.074189   0.009491   7.817 5.42e-15 ***
Hemlock     -0.588858   0.174980  -3.365 0.000765 ***
Alpine      -0.452199   0.099296  -4.554 5.26e-06 ***
Snow        -1.902610   0.735708  -2.586 0.009707 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(11.3494) family taken to be 1)

    Null deviance: 515.16  on 253  degrees of freedom
Residual deviance: 278.89  on 246  degrees of freedom
AIC: 1300.1

Number of Fisher Scoring iterations: 1


              Theta:  11.35
          Std. Err.:  2.71

 2 x log-likelihood:  -1282.075


#Plotting observed versus predicted values.
pdf(file="ObsVPred.pdf", width=4, height=4, family="Times",
pointsize=11)
par(mar = c(5,5,1,1), pch=1)
plot(data$D, fit$fitted.values, main="",
ylab=expression(italic(D)[predicted]),
xlab=expression(italic(D)[observed]))
abline(a=0,b=1, lty=2)
lines(lowess(data$D, fit$fitted.values))
dev.off()


#This appears to be a decent explanatory model, but as a predictive
model it is systematically biased.  It is biased high at low observed
values of D and biased low at high values observed values of D.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ObsVPred.pdf
Type: application/pdf
Size: 22031 bytes
Desc: not available
Url :
https://stat.ethz.ch/pipermail/r-help/attachments/20070808/c31131a1/attachment.pdf

Matthew and Kim Bowser

2007-Aug-09 15:43 UTC

head link

[R] Systematically biased count data regression model

Dear all,

I am attempting to explain patterns of arthropod family richness
(count data) using a regression model.  It seems to be able to do a
pretty good job as an explanatory model (i.e. demonstrating
relationships between dependent and independent variables), but it has
systematic problems as a predictive model:  It is biased high at low
observed values of family richness and biased low at high observed
values of family richness (see attached pdf).  I have tried diverse
kinds of reasonable regression models mostly as in Zeileis, et al.
(2007), as well as transforming my variables, both with only small
improvements.

Do you have suggestions for making a model that would perform better
as a predictive model?

Thank you for your time.

Sincerely,

Matthew Bowser

STEP student
USFWS Kenai National Wildlife Refuge
Soldotna, Alaska, USA

M.Sc. student
University of Alaska Fairbanks
Fairbankse, Alaska, USA

Reference

Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
count data in R. Technical Report 53, Department of Statistics and
Mathematics, Wirtschaftsuniversit?t Wien, Wien, Austria. URL
http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.

Code

`data` <-
structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28,
0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34,
0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221,
0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232,
0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318,
0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2,
0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38,
0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288,
0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162,
0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306,
0.28, 0.279, 0.208, 0.377, 0.413, 0.489, 0.417, 0.333, 0.208,
0.232, 0.431, 0.283, 0.241, 0.105, 0.18, -0.172, -0.374, 0.25,
0.043, 0.215, 0.204, 0.19, 0.177, -0.106, -0.143, 0.062, 0.462,
0.256, 0.229, 0.314, 0.415, 0.307, 0.238, -0.35, 0.34, 0.275,
0.097, 0.353, 0.214, 0.435, 0.055, -0.289, 0.239, 0.186, 0.135,
0.259, 0.268, 0.258, 0.032, 0.489, 0.389, 0.298, 0.164, 0.325,
0.254, -0.059, 0.524, 0.539, 0.25, 0.175, 0.326, 0.302, -0.047,
-0.301, -0.149, 0.358, 0.495, 0.311, 0.235, 0.558, -0.156, 0,
0.146, 0.329, -0.069, -0.352, -0.356, -0.206, -0.179, 0.467,
-0.325, 0.39, -0.399, -0.165, 0.267, -0.334, -0.17, 0.58, 0.228,
0.234, 0.351, 0.3, -0.018, 0.125, 0.176, 0.322, 0.246, 0.376,
-0.185, 0.342, 0.142, -0.075, 0.186, 0.333, 0.112, 0.272, 0.277,
0.203, 0.37, 0.465, 0.425), VegS = c(14, 11, 18, 21, 31, 20,
11, 17, 10, 15, 27, 8, 17, 13, 16, 9, 10, 16, 10, 15, 11, 9,
14, 11, 11, 10, 24, 18, 12, 6, 25, 21, 25, 8, 14, 18, 11, 16,
20, 16, 10, 16, 18, 14, 13, 11, 15, 23, 11, 28, 12, 17, 12, 18,
10, 15, 7, 15, 9, 16, 18, 16, 20, 18, 12, 19, 16, 18, 20, 15,
24, 9, 15, 9, 16, 14, 17, 14, 7, 9, 9, 12, 13, 15, 14, 11, 17,
8, 14, 15, 12, 8, 10, 12, 8, 16, 15, 22, 16, 21, 10, 15, 20,
14, 27, 21, 19, 22, 21, 11, 13, 10, 13, 14, 9, 22, 22, 20, 12,
16, 20, 19, 26, 14, 13, 23, 14, 22, 19, 15, 28, 16, 20, 25, 10,
19, 0, 10, 8, 11, 17, 13, 17, 23, 37, 32, 19, 26, 12, 11, 24,
11, 21, 25, 8, 15, 21, 31, 17, 0, 12, 6, 23, 19, 29, 14, 9, 0,
18, 23, 20, 15, 15, 17, 27, 17, 2, 24, 17, 16, 26, 11, 23, 24,
10, 26, 21, 12, 20, 12, 29, 22, 20, 16, 41, 19, 27, 28, 10, 35,
28, 23, 14, 5, 23, 15, 17, 12, 11, 24, 11, 14, 7, 7, 27, 17,
15, 10, 16, 2, 11, 21, 18, 15, 8, 17, 10, 18, 15, 18, 31, 9,
14, 11, 19, 22, 8, 19, 17, 18, 25, 11, 17, 32, 25, 18, 21, 19,
35, 14, 29, 9, 28, 14), T = c(13, 15.4, 15.6, 12.3, 12.7, 13.3,
6, 13, 9.2, 17.8, 9.4, 14.2, 8.7, 14, 8.6, 13.9, 9.2, 15.1, 9.4,
16.5, 14, 11.5, 15.5, 13.3, 12.7, 14, 10.5, 14, 10.1, 16.7, 15.2,
11.2, 11.7, 17.9, 13.3, 13.9, 8.7, 16.7, 7.8, 7.9, 10.9, 15.5,
14, 14.1, 14.3, 13.3, 11.6, 16.5, 12.7, 12.6, 8.3, 9, 12.4, 15,
11.8, 14.1, 10.5, 12.4, 10.5, 17.5, 9.2, 16.3, 5.3, 5.9, 11.9,
8.9, 7.7, 15.2, 8.6, 13.2, 9.5, 15.8, 12.5, 13.8, 10.7, 10.5,
7.7, 11, 9.3, 14.6, 12, 15.4, 12.3, 14, 8.3, 19.8, 15.5, 14.3,
9, 7.6, 15.3, 12.8, 14.4, 14, 10.5, 8.9, 13.4, 12.8, 12.9, 11.2,
13.1, 10, 12.4, 15.4, 7.6, 14.9, 13.1, 11.1, 8.6, 13.6, 8.4,
11.5, 12.5, 15.6, 8.3, 9.6, 8.7, 9.7, 10.5, 12.8, 8.6, 12.7,
6.7, 8.5, 9.9, 8.3, 12.8, 9.8, 10.5, 10.7, 9.6, 11.1, 13.7, 9.5,
8.8, 3.4, 10.2, 3.5, 8.4, 11.9, 12.3, 10.2, 13.1, 8.4, 3.1, 7.2,
13.2, 7.6, 11.3, 13.5, 7.8, 5.5, 10.7, 4.8, 7.9, 13.5, 13.5,
2.1, 7.2, 9.7, 12, 9.2, 13.2, 12, 17.1, 7.9, 12.7, 11.8, 16,
6.4, 12.9, 8.1, 12.6, 10.2, 13.3, 8.5, 9.7, 9.9, 18.2, 11, 8.4,
1.5, 7.3, 10.6, 13.6, 8.4, 7.2, 13, 15.5, 9.7, 13.2, 5.9, 9.5,
10.4, 12.9, 2.6, 17.2, 15.4, 10.5, 6.7, 6.6, 7.6, 10.5, 15.6,
10.4, 5.1, 11, 9.7, 4.2, 3.6, 8.5, 11.5, 8.4, 6.9, 11, 10.4,
3.4, 3.2, 5.5, 2.4, 11.2, 2.6, 15.1, 16, 13.7, 10.5, 3.5, 13.4,
11.5, 12.3, 13.9, 14.5, 12.8, 16.8, 16.9, 13.5, 17.2, 12.5, 12.4,
11.8, 12, 10.9, 6.7, 10.9, 2.3, 5.2, 13.1, 12.1, 13.9, 12.9,
7.2, 12.5, 16, 11.7), Hemlock = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0), Alpine = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0,
0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0),
   Snow = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
   1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("D", "Day",
"NDVI",
"VegS", "T", "Hemlock", "Alpine",
"Snow"), row.names = as.integer(c(NA,
254)), class = "data.frame")


#Running a regression.
library(MASS)
fit <- glm.nb(D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, data =
data)
summary(fit, correlation = FALSE)

Call:
glm.nb(formula = D ~ Day + NDVI + VegS + T + Hemlock + Alpine +
   Snow, data = data, init.theta = 11.3494468596771, link = log)

Deviance Residuals:
   Min       1Q   Median       3Q      Max
-3.7451  -0.7196  -0.1958   0.5389   2.7096

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.882684   0.929598  -3.101 0.001929 **
Day          0.020325   0.005540   3.669 0.000244 ***
NDVI         1.353361   0.221471   6.111 9.91e-10 ***
VegS         0.016731   0.004931   3.393 0.000691 ***
T            0.074189   0.009491   7.817 5.42e-15 ***
Hemlock     -0.588858   0.174980  -3.365 0.000765 ***
Alpine      -0.452199   0.099296  -4.554 5.26e-06 ***
Snow        -1.902610   0.735708  -2.586 0.009707 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(11.3494) family taken to be 1)

   Null deviance: 515.16  on 253  degrees of freedom
Residual deviance: 278.89  on 246  degrees of freedom
AIC: 1300.1

Number of Fisher Scoring iterations: 1


             Theta:  11.35
         Std. Err.:  2.71

 2 x log-likelihood:  -1282.075


#Plotting observed versus predicted values.
pdf(file="ObsVPred.pdf", width=4, height=4, family="Times",
pointsize=11)
par(mar = c(5,5,1,1), pch=1)
plot(data$D, fit$fitted.values, main="",
ylab=expression(italic(D)[predicted]),
xlab=expression(italic(D)[observed]))
abline(a=0,b=1, lty=2)
lines(lowess(data$D, fit$fitted.values))
dev.off()


#This appears to be a decent explanatory model, but as a predictive
model it is systematically biased.  It is biased high at low observed
values of D and biased low at high values observed values of D.

paulandpen at optusnet.com.au

2007-Aug-10 01:59 UTC

head link

[R] Systematically biased count data regression model

Matthew,

In response to that post, I am afraid I have to disagree.  I think a poor model
fit (eg 16%) is a reflection of a lot of unmeasured factors and therefore random
error in the model.  This would explain why overall predictive performance is
poor (eg a lot of error in the model)  Your situation is different.  You are
having trouble predicting extreme values, so there is something systematic (eg
your model works well in the middle and worse at the tails) not poorly overall.

As the post does reflect, you are suffering from error in prediction, that is a
fact of life, as others have stated, and most of us who suffer from prediction
error experience it at the more extreme values.

Thanks Paul   


> Matthew and Kim Bowser <matthewnkim@gmail.com> wrote:
> 
> Dear all,
> 
> I received a very helpful response from someone who requested
> anonymity, but to whom I am grateful.
> 
> PLEASE do not quote my name or email (I am trying to stay off spam 
> lists)
> 
> Matthew:  I think this is just a reflection of
> the fact the model does not fit perfectly.  The
> example below is a simple linear regression that
> is highly significant but has R-square of
> 16%.  This model as well is "biased high at low
> observed values of y and biased low at high values observed values of
y"
> 
> set.seed(1)
> n <- 200
> m <- data.frame(x=rnorm(n,mean=10,sd=2))
> m$y <- m$x + rnorm(n,sd=4)    # simulate using intercept 0, slope 1
> f <- lm(y ~ x,data=m)
> print(summary(f))
> #
> # Call:
> # lm(formula = y ~ x, data = m)
> #
> # Residuals:
> #      Min       1Q   Median       3Q      Max
> # -11.7310  -2.1709  -0.1009   2.6733  10.3446
> #
> # Coefficients:
> #             Estimate Std. Error t value Pr(>|t|)
> # (Intercept)   0.6274     1.5830   0.396    0.692
> # x             0.9538     0.1546   6.170 3.77e-09 ***
> # ---
> # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
> #
> # Residual standard error: 4.052 on 198 degrees of freedom
> # Multiple R-Squared: 0.1613,     Adjusted R-squared: 0.157
> # F-statistic: 38.07 on 1 and 198 DF,  p-value: 3.773e-09
> #
>
plot(m$y,f$fitted.values,xlab="Observed",ylab="Predicted")
> lines(lowess(m$y,f$fitted.values),col="red",lty=2)
> abline(c(0,1))
>
legend("topleft",lty=c(2,1),col=c("red","black"),legend=c("Loess","45-deg
> ree"))
> - Show quoted text -
> 
> At 2007-08-09  08:43, Matthew and Kim Bowser wrote:
> >Dear all,
> >
> >I am attempting to explain patterns of arthropod family richness
> >(count data) using a regression model.  It seems to be able to do a
> >pretty good job as an explanatory model (i.e. demonstrating
> >relationships between dependent and independent variables), but it has
> >systematic problems as a predictive model:  It is biased high at low
> >observed values of family richness and biased low at high observed
> >values of family richness (see attached pdf).  I have tried diverse
> >kinds of reasonable regression models mostly as in Zeileis, et al.
> >(2007), as well as transforming my variables, both with only small
> >improvements.
> >
> >Do you have suggestions for making a model that would perform better
> >as a predictive model?
> >
> >Thank you for your time.
> >
> >Sincerely,
> >
> >Matthew Bowser
> >
> >STEP student
> >USFWS Kenai National Wildlife Refuge
> >Soldotna, Alaska, USA
> >
> >M.Sc. student
> >University of Alaska Fairbanks
> >Fairbankse, Alaska, USA
> >
> >Reference
> >
> >Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
> >count data in R. Technical Report 53, Department of Statistics and
> >Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
> >http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.
> >
> >[snip]
> >
> >
> >#This appears to be a decent explanatory model, but as a predictive
> >model it is systematically biased.  It is biased high at low observed
> >values of D and biased low at high values observed values of D.
> >
> - Show quoted text -
> 
> On 8/9/07, Matthew and Kim Bowser <matthewnkim@gmail.com> wrote:
> > Dear all,
> >
> > I am attempting to explain patterns of arthropod family richness
> > (count data) using a regression model.  It seems to be able to do a
> > pretty good job as an explanatory model (i.e. demonstrating
> > relationships between dependent and independent variables), but it has
> > systematic problems as a predictive model:  It is biased high at low
> > observed values of family richness and biased low at high observed
> > values of family richness (see attached pdf).  I have tried diverse
> > kinds of reasonable regression models mostly as in Zeileis, et al.
> > (2007), as well as transforming my variables, both with only small
> > improvements.
> >
> > Do you have suggestions for making a model that would perform better
> > as a predictive model?
> >
> > Thank you for your time.
> >
> > Sincerely,
> >
> > Matthew Bowser
> >
> > STEP student
> > USFWS Kenai National Wildlife Refuge
> > Soldotna, Alaska, USA
> >
> > M.Sc. student
> > University of Alaska Fairbanks
> > Fairbankse, Alaska, USA
> >
> > Reference
> >
> > Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for
> > count data in R. Technical Report 53, Department of Statistics and
> > Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL
> > http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf.
> >
> > Code
> >
> > `data` <-
> > structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4,
> > 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15,
> > 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13,
> > 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12,
> > 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8,
> > 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8,
> > 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8,
> > 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16,
> > 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12,
> > 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6,
> > 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3,
> > 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3,
> > 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15,
> > 2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159,
> > 159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161,
> > 175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161,
> > 161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176,
> > 165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165,
> > 165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167,
> > 175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164,
> > 167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173,
> > 178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162,
> > 173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166,
> > 170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164,
> > 170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172,
> > 162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174,
> > 166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174,
> > 172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170,
> > 171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172,
> > 171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160,
> > 160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176,
> > 176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166,
> > 168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166,
> > 166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253,
> > 0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26,
> > 0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194,
> > 0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354,
> > 0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312,
> > 0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122,
> > 0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28,
> > 0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34,
> > 0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221,
> > 0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232,
> > 0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318,
> > 0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2,
> > 0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38,
> > 0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288,
> > 0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162,
> > 0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306,
> > 0.28, 0.279, 0.208, 0.377, 0.413, 0.489, 0.417, 0.333, 0.208,
> > 0.232, 0.431, 0.283, 0.241, 0.105, 0.18, -0.172, -0.374, 0.25,
> > 0.043, 0.215, 0.204, 0.19, 0.177, -0.106, -0.143, 0.062, 0.462,
> > 0.256, 0.229, 0.314, 0.415, 0.307, 0.238, -0.35, 0.34, 0.275,
> > 0.097, 0.353, 0.214, 0.435, 0.055, -0.289, 0.239, 0.186, 0.135,
> > 0.259, 0.268, 0.258, 0.032, 0.489, 0.389, 0.298, 0.164, 0.325,
> > 0.254, -0.059, 0.524, 0.539, 0.25, 0.175, 0.326, 0.302, -0.047,
> > -0.301, -0.149, 0.358, 0.495, 0.311, 0.235, 0.558, -0.156, 0,
> > 0.146, 0.329, -0.069, -0.352, -0.356, -0.206, -0.179, 0.467,
> > -0.325, 0.39, -0.399, -0.165, 0.267, -0.334, -0.17, 0.58, 0.228,
> > 0.234, 0.351, 0.3, -0.018, 0.125, 0.176, 0.322, 0.246, 0.376,
> > -0.185, 0.342, 0.142, -0.075, 0.186, 0.333, 0.112, 0.272, 0.277,
> > 0.203, 0.37, 0.465, 0.425), VegS = c(14, 11, 18, 21, 31, 20,
> > 11, 17, 10, 15, 27, 8, 17, 13, 16, 9, 10, 16, 10, 15, 11, 9,
> > 14, 11, 11, 10, 24, 18, 12, 6, 25, 21, 25, 8, 14, 18, 11, 16,
> > 20, 16, 10, 16, 18, 14, 13, 11, 15, 23, 11, 28, 12, 17, 12, 18,
> > 10, 15, 7, 15, 9, 16, 18, 16, 20, 18, 12, 19, 16, 18, 20, 15,
> > 24, 9, 15, 9, 16, 14, 17, 14, 7, 9, 9, 12, 13, 15, 14, 11, 17,
> > 8, 14, 15, 12, 8, 10, 12, 8, 16, 15, 22, 16, 21, 10, 15, 20,
> > 14, 27, 21, 19, 22, 21, 11, 13, 10, 13, 14, 9, 22, 22, 20, 12,
> > 16, 20, 19, 26, 14, 13, 23, 14, 22, 19, 15, 28, 16, 20, 25, 10,
> > 19, 0, 10, 8, 11, 17, 13, 17, 23, 37, 32, 19, 26, 12, 11, 24,
> > 11, 21, 25, 8, 15, 21, 31, 17, 0, 12, 6, 23, 19, 29, 14, 9, 0,
> > 18, 23, 20, 15, 15, 17, 27, 17, 2, 24, 17, 16, 26, 11, 23, 24,
> > 10, 26, 21, 12, 20, 12, 29, 22, 20, 16, 41, 19, 27, 28, 10, 35,
> > 28, 23, 14, 5, 23, 15, 17, 12, 11, 24, 11, 14, 7, 7, 27, 17,
> > 15, 10, 16, 2, 11, 21, 18, 15, 8, 17, 10, 18, 15, 18, 31, 9,
> > 14, 11, 19, 22, 8, 19, 17, 18, 25, 11, 17, 32, 25, 18, 21, 19,
> > 35, 14, 29, 9, 28, 14), T = c(13, 15.4, 15.6, 12.3, 12.7, 13.3,
> > 6, 13, 9.2, 17.8, 9.4, 14.2, 8.7, 14, 8.6, 13.9, 9.2, 15.1, 9.4,
> > 16.5, 14, 11.5, 15.5, 13.3, 12.7, 14, 10.5, 14, 10.1, 16.7, 15.2,
> > 11.2, 11.7, 17.9, 13.3, 13.9, 8.7, 16.7, 7.8, 7.9, 10.9, 15.5,
> > 14, 14.1, 14.3, 13.3, 11.6, 16.5, 12.7, 12.6, 8.3, 9, 12.4, 15,
> > 11.8, 14.1, 10.5, 12.4, 10.5, 17.5, 9.2, 16.3, 5.3, 5.9, 11.9,
> > 8.9, 7.7, 15.2, 8.6, 13.2, 9.5, 15.8, 12.5, 13.8, 10.7, 10.5,
> > 7.7, 11, 9.3, 14.6, 12, 15.4, 12.3, 14, 8.3, 19.8, 15.5, 14.3,
> > 9, 7.6, 15.3, 12.8, 14.4, 14, 10.5, 8.9, 13.4, 12.8, 12.9, 11.2,
> > 13.1, 10, 12.4, 15.4, 7.6, 14.9, 13.1, 11.1, 8.6, 13.6, 8.4,
> > 11.5, 12.5, 15.6, 8.3, 9.6, 8.7, 9.7, 10.5, 12.8, 8.6, 12.7,
> > 6.7, 8.5, 9.9, 8.3, 12.8, 9.8, 10.5, 10.7, 9.6, 11.1, 13.7, 9.5,
> > 8.8, 3.4, 10.2, 3.5, 8.4, 11.9, 12.3, 10.2, 13.1, 8.4, 3.1, 7.2,
> > 13.2, 7.6, 11.3, 13.5, 7.8, 5.5, 10.7, 4.8, 7.9, 13.5, 13.5,
> > 2.1, 7.2, 9.7, 12, 9.2, 13.2, 12, 17.1, 7.9, 12.7, 11.8, 16,
> > 6.4, 12.9, 8.1, 12.6, 10.2, 13.3, 8.5, 9.7, 9.9, 18.2, 11, 8.4,
> > 1.5, 7.3, 10.6, 13.6, 8.4, 7.2, 13, 15.5, 9.7, 13.2, 5.9, 9.5,
> > 10.4, 12.9, 2.6, 17.2, 15.4, 10.5, 6.7, 6.6, 7.6, 10.5, 15.6,
> > 10.4, 5.1, 11, 9.7, 4.2, 3.6, 8.5, 11.5, 8.4, 6.9, 11, 10.4,
> > 3.4, 3.2, 5.5, 2.4, 11.2, 2.6, 15.1, 16, 13.7, 10.5, 3.5, 13.4,
> > 11.5, 12.3, 13.9, 14.5, 12.8, 16.8, 16.9, 13.5, 17.2, 12.5, 12.4,
> > 11.8, 12, 10.9, 6.7, 10.9, 2.3, 5.2, 13.1, 12.1, 13.9, 12.9,
> > 7.2, 12.5, 16, 11.7), Hemlock = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,
> > 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0), Alpine = c(0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
> > 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0,
> > 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
> > 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
> > 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0,
> > 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0),
> >    Snow = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
> >    1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> >    0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("D",
"Day", "NDVI",
> > "VegS", "T", "Hemlock",
"Alpine", "Snow"), row.names =
> as.integer(c(NA,
> > 254)), class = "data.frame")
> >
> >
> > #Running a regression.
> > library(MASS)
> > fit <- glm.nb(D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, 
> data = data)
> > summary(fit, correlation = FALSE)
> >
> > Call:
> > glm.nb(formula = D ~ Day + NDVI + VegS + T + Hemlock + Alpine +
> >    Snow, data = data, init.theta = 11.3494468596771, link = log)
> >
> > Deviance Residuals:
> >    Min       1Q   Median       3Q      Max
> > -3.7451  -0.7196  -0.1958   0.5389   2.7096
> >
> > Coefficients:
> >             Estimate Std. Error z value Pr(>|z|)
> > (Intercept) -2.882684   0.929598  -3.101 0.001929 **
> > Day          0.020325   0.005540   3.669 0.000244 ***
> > NDVI         1.353361   0.221471   6.111 9.91e-10 ***
> > VegS         0.016731   0.004931   3.393 0.000691 ***
> > T            0.074189   0.009491   7.817 5.42e-15 ***
> > Hemlock     -0.588858   0.174980  -3.365 0.000765 ***
> > Alpine      -0.452199   0.099296  -4.554 5.26e-06 ***
> > Snow        -1.902610   0.735708  -2.586 0.009707 **
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*'
0.05 '.' 0.1 ' ' 1
> >
> > (Dispersion parameter for Negative Binomial(11.3494) family taken to 
> be 1)
> >
> >    Null deviance: 515.16  on 253  degrees of freedom
> > Residual deviance: 278.89  on 246  degrees of freedom
> > AIC: 1300.1
> >
> > Number of Fisher Scoring iterations: 1
> >
> >
> >              Theta:  11.35
> >          Std. Err.:  2.71
> >
> >  2 x log-likelihood:  -1282.075
> >
> >
> > #Plotting observed versus predicted values.
> > pdf(file="ObsVPred.pdf", width=4, height=4,
family="Times",
> pointsize=11)
> > par(mar = c(5,5,1,1), pch=1)
> > plot(data$D, fit$fitted.values, main="",
> > ylab=expression(italic(D)[predicted]),
> > xlab=expression(italic(D)[observed]))
> > abline(a=0,b=1, lty=2)
> > lines(lowess(data$D, fit$fitted.values))
> > dev.off()
> >
> >
> > #This appears to be a decent explanatory model, but as a predictive
> > model it is systematically biased.  It is biased high at low observed
> > values of D and biased low at high values observed values of D.
> >
> 
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Apparently Analagous Threads

Search for more maybe matching threads

R help - Aug 2007 - Systematically biased count data regression model

[R] Systematically biased count data regression model

[R] Systematically biased count data regression model

[R] Systematically biased count data regression model

Apparently Analagous Threads