Matthew and Kim Bowser
2007-Aug-09 00:30 UTC
[R] Systematically biased count data regression model
Dear all, I am attempting to explain patterns of arthropod family richness (count data) using a regression model. It seems to be able to do a pretty good job as an explanatory model (i.e. demonstrating relationships between dependent and independent variables), but it has systematic problems as a predictive model: It is biased high at low observed values of family richness and biased low at high observed values of family richness (see attached pdf). I have tried diverse kinds of reasonable regression models mostly as in Zeileis, et al. (2007), as well as transforming my variables, both with only small improvements. Do you have suggestions for making a model that would perform better as a predictive model? Thank you for your time. Sincerely, Matthew Bowser STEP student USFWS Kenai National Wildlife Refuge Soldotna, Alaska, USA M.Sc. student University of Alaska Fairbanks Fairbankse, Alaska, USA Reference Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for count data in R. Technical Report 53, Department of Statistics and Mathematics, Wirtschaftsuniversit?t Wien, Wien, Austria. URL http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf. Code `data` <- structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4, 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15, 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13, 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12, 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8, 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8, 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8, 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16, 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12, 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6, 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3, 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3, 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15, 2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159, 159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161, 175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161, 161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176, 165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165, 165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167, 175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164, 167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173, 178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162, 173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166, 170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164, 170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172, 162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174, 166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174, 172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170, 171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172, 171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160, 160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176, 176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166, 168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166, 166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253, 0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26, 0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194, 0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354, 0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312, 0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122, 0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28, 0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34, 0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221, 0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232, 0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318, 0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2, 0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38, 0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288, 0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162, 0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306, 0.28, 0.279, 0.208, 0.377, 0.413, 0.489, 0.417, 0.333, 0.208, 0.232, 0.431, 0.283, 0.241, 0.105, 0.18, -0.172, -0.374, 0.25, 0.043, 0.215, 0.204, 0.19, 0.177, -0.106, -0.143, 0.062, 0.462, 0.256, 0.229, 0.314, 0.415, 0.307, 0.238, -0.35, 0.34, 0.275, 0.097, 0.353, 0.214, 0.435, 0.055, -0.289, 0.239, 0.186, 0.135, 0.259, 0.268, 0.258, 0.032, 0.489, 0.389, 0.298, 0.164, 0.325, 0.254, -0.059, 0.524, 0.539, 0.25, 0.175, 0.326, 0.302, -0.047, -0.301, -0.149, 0.358, 0.495, 0.311, 0.235, 0.558, -0.156, 0, 0.146, 0.329, -0.069, -0.352, -0.356, -0.206, -0.179, 0.467, -0.325, 0.39, -0.399, -0.165, 0.267, -0.334, -0.17, 0.58, 0.228, 0.234, 0.351, 0.3, -0.018, 0.125, 0.176, 0.322, 0.246, 0.376, -0.185, 0.342, 0.142, -0.075, 0.186, 0.333, 0.112, 0.272, 0.277, 0.203, 0.37, 0.465, 0.425), VegS = c(14, 11, 18, 21, 31, 20, 11, 17, 10, 15, 27, 8, 17, 13, 16, 9, 10, 16, 10, 15, 11, 9, 14, 11, 11, 10, 24, 18, 12, 6, 25, 21, 25, 8, 14, 18, 11, 16, 20, 16, 10, 16, 18, 14, 13, 11, 15, 23, 11, 28, 12, 17, 12, 18, 10, 15, 7, 15, 9, 16, 18, 16, 20, 18, 12, 19, 16, 18, 20, 15, 24, 9, 15, 9, 16, 14, 17, 14, 7, 9, 9, 12, 13, 15, 14, 11, 17, 8, 14, 15, 12, 8, 10, 12, 8, 16, 15, 22, 16, 21, 10, 15, 20, 14, 27, 21, 19, 22, 21, 11, 13, 10, 13, 14, 9, 22, 22, 20, 12, 16, 20, 19, 26, 14, 13, 23, 14, 22, 19, 15, 28, 16, 20, 25, 10, 19, 0, 10, 8, 11, 17, 13, 17, 23, 37, 32, 19, 26, 12, 11, 24, 11, 21, 25, 8, 15, 21, 31, 17, 0, 12, 6, 23, 19, 29, 14, 9, 0, 18, 23, 20, 15, 15, 17, 27, 17, 2, 24, 17, 16, 26, 11, 23, 24, 10, 26, 21, 12, 20, 12, 29, 22, 20, 16, 41, 19, 27, 28, 10, 35, 28, 23, 14, 5, 23, 15, 17, 12, 11, 24, 11, 14, 7, 7, 27, 17, 15, 10, 16, 2, 11, 21, 18, 15, 8, 17, 10, 18, 15, 18, 31, 9, 14, 11, 19, 22, 8, 19, 17, 18, 25, 11, 17, 32, 25, 18, 21, 19, 35, 14, 29, 9, 28, 14), T = c(13, 15.4, 15.6, 12.3, 12.7, 13.3, 6, 13, 9.2, 17.8, 9.4, 14.2, 8.7, 14, 8.6, 13.9, 9.2, 15.1, 9.4, 16.5, 14, 11.5, 15.5, 13.3, 12.7, 14, 10.5, 14, 10.1, 16.7, 15.2, 11.2, 11.7, 17.9, 13.3, 13.9, 8.7, 16.7, 7.8, 7.9, 10.9, 15.5, 14, 14.1, 14.3, 13.3, 11.6, 16.5, 12.7, 12.6, 8.3, 9, 12.4, 15, 11.8, 14.1, 10.5, 12.4, 10.5, 17.5, 9.2, 16.3, 5.3, 5.9, 11.9, 8.9, 7.7, 15.2, 8.6, 13.2, 9.5, 15.8, 12.5, 13.8, 10.7, 10.5, 7.7, 11, 9.3, 14.6, 12, 15.4, 12.3, 14, 8.3, 19.8, 15.5, 14.3, 9, 7.6, 15.3, 12.8, 14.4, 14, 10.5, 8.9, 13.4, 12.8, 12.9, 11.2, 13.1, 10, 12.4, 15.4, 7.6, 14.9, 13.1, 11.1, 8.6, 13.6, 8.4, 11.5, 12.5, 15.6, 8.3, 9.6, 8.7, 9.7, 10.5, 12.8, 8.6, 12.7, 6.7, 8.5, 9.9, 8.3, 12.8, 9.8, 10.5, 10.7, 9.6, 11.1, 13.7, 9.5, 8.8, 3.4, 10.2, 3.5, 8.4, 11.9, 12.3, 10.2, 13.1, 8.4, 3.1, 7.2, 13.2, 7.6, 11.3, 13.5, 7.8, 5.5, 10.7, 4.8, 7.9, 13.5, 13.5, 2.1, 7.2, 9.7, 12, 9.2, 13.2, 12, 17.1, 7.9, 12.7, 11.8, 16, 6.4, 12.9, 8.1, 12.6, 10.2, 13.3, 8.5, 9.7, 9.9, 18.2, 11, 8.4, 1.5, 7.3, 10.6, 13.6, 8.4, 7.2, 13, 15.5, 9.7, 13.2, 5.9, 9.5, 10.4, 12.9, 2.6, 17.2, 15.4, 10.5, 6.7, 6.6, 7.6, 10.5, 15.6, 10.4, 5.1, 11, 9.7, 4.2, 3.6, 8.5, 11.5, 8.4, 6.9, 11, 10.4, 3.4, 3.2, 5.5, 2.4, 11.2, 2.6, 15.1, 16, 13.7, 10.5, 3.5, 13.4, 11.5, 12.3, 13.9, 14.5, 12.8, 16.8, 16.9, 13.5, 17.2, 12.5, 12.4, 11.8, 12, 10.9, 6.7, 10.9, 2.3, 5.2, 13.1, 12.1, 13.9, 12.9, 7.2, 12.5, 16, 11.7), Hemlock = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0), Alpine = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0), Snow = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("D", "Day", "NDVI", "VegS", "T", "Hemlock", "Alpine", "Snow"), row.names = as.integer(c(NA, 254)), class = "data.frame") #Running a regression. library(MASS) fit <- glm.nb(D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, data = data) summary(fit, correlation = FALSE) Call: glm.nb(formula = D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, data = data, init.theta = 11.3494468596771, link = log) Deviance Residuals: Min 1Q Median 3Q Max -3.7451 -0.7196 -0.1958 0.5389 2.7096 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -2.882684 0.929598 -3.101 0.001929 ** Day 0.020325 0.005540 3.669 0.000244 *** NDVI 1.353361 0.221471 6.111 9.91e-10 *** VegS 0.016731 0.004931 3.393 0.000691 *** T 0.074189 0.009491 7.817 5.42e-15 *** Hemlock -0.588858 0.174980 -3.365 0.000765 *** Alpine -0.452199 0.099296 -4.554 5.26e-06 *** Snow -1.902610 0.735708 -2.586 0.009707 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(11.3494) family taken to be 1) Null deviance: 515.16 on 253 degrees of freedom Residual deviance: 278.89 on 246 degrees of freedom AIC: 1300.1 Number of Fisher Scoring iterations: 1 Theta: 11.35 Std. Err.: 2.71 2 x log-likelihood: -1282.075 #Plotting observed versus predicted values. pdf(file="ObsVPred.pdf", width=4, height=4, family="Times", pointsize=11) par(mar = c(5,5,1,1), pch=1) plot(data$D, fit$fitted.values, main="", ylab=expression(italic(D)[predicted]), xlab=expression(italic(D)[observed])) abline(a=0,b=1, lty=2) lines(lowess(data$D, fit$fitted.values)) dev.off() #This appears to be a decent explanatory model, but as a predictive model it is systematically biased. It is biased high at low observed values of D and biased low at high values observed values of D. -------------- next part -------------- A non-text attachment was scrubbed... Name: ObsVPred.pdf Type: application/pdf Size: 22031 bytes Desc: not available Url : https://stat.ethz.ch/pipermail/r-help/attachments/20070808/c31131a1/attachment.pdf
Matthew and Kim Bowser
2007-Aug-09 15:43 UTC
[R] Systematically biased count data regression model
Dear all, I am attempting to explain patterns of arthropod family richness (count data) using a regression model. It seems to be able to do a pretty good job as an explanatory model (i.e. demonstrating relationships between dependent and independent variables), but it has systematic problems as a predictive model: It is biased high at low observed values of family richness and biased low at high observed values of family richness (see attached pdf). I have tried diverse kinds of reasonable regression models mostly as in Zeileis, et al. (2007), as well as transforming my variables, both with only small improvements. Do you have suggestions for making a model that would perform better as a predictive model? Thank you for your time. Sincerely, Matthew Bowser STEP student USFWS Kenai National Wildlife Refuge Soldotna, Alaska, USA M.Sc. student University of Alaska Fairbanks Fairbankse, Alaska, USA Reference Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for count data in R. Technical Report 53, Department of Statistics and Mathematics, Wirtschaftsuniversit?t Wien, Wien, Austria. URL http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf. Code `data` <- structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4, 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15, 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13, 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12, 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8, 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8, 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8, 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16, 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12, 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6, 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3, 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3, 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15, 2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159, 159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161, 175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161, 161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176, 165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165, 165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167, 175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164, 167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173, 178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162, 173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166, 170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164, 170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172, 162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174, 166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174, 172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170, 171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172, 171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160, 160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176, 176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166, 168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166, 166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253, 0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26, 0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194, 0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354, 0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312, 0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122, 0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28, 0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34, 0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221, 0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232, 0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318, 0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2, 0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38, 0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288, 0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162, 0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306, 0.28, 0.279, 0.208, 0.377, 0.413, 0.489, 0.417, 0.333, 0.208, 0.232, 0.431, 0.283, 0.241, 0.105, 0.18, -0.172, -0.374, 0.25, 0.043, 0.215, 0.204, 0.19, 0.177, -0.106, -0.143, 0.062, 0.462, 0.256, 0.229, 0.314, 0.415, 0.307, 0.238, -0.35, 0.34, 0.275, 0.097, 0.353, 0.214, 0.435, 0.055, -0.289, 0.239, 0.186, 0.135, 0.259, 0.268, 0.258, 0.032, 0.489, 0.389, 0.298, 0.164, 0.325, 0.254, -0.059, 0.524, 0.539, 0.25, 0.175, 0.326, 0.302, -0.047, -0.301, -0.149, 0.358, 0.495, 0.311, 0.235, 0.558, -0.156, 0, 0.146, 0.329, -0.069, -0.352, -0.356, -0.206, -0.179, 0.467, -0.325, 0.39, -0.399, -0.165, 0.267, -0.334, -0.17, 0.58, 0.228, 0.234, 0.351, 0.3, -0.018, 0.125, 0.176, 0.322, 0.246, 0.376, -0.185, 0.342, 0.142, -0.075, 0.186, 0.333, 0.112, 0.272, 0.277, 0.203, 0.37, 0.465, 0.425), VegS = c(14, 11, 18, 21, 31, 20, 11, 17, 10, 15, 27, 8, 17, 13, 16, 9, 10, 16, 10, 15, 11, 9, 14, 11, 11, 10, 24, 18, 12, 6, 25, 21, 25, 8, 14, 18, 11, 16, 20, 16, 10, 16, 18, 14, 13, 11, 15, 23, 11, 28, 12, 17, 12, 18, 10, 15, 7, 15, 9, 16, 18, 16, 20, 18, 12, 19, 16, 18, 20, 15, 24, 9, 15, 9, 16, 14, 17, 14, 7, 9, 9, 12, 13, 15, 14, 11, 17, 8, 14, 15, 12, 8, 10, 12, 8, 16, 15, 22, 16, 21, 10, 15, 20, 14, 27, 21, 19, 22, 21, 11, 13, 10, 13, 14, 9, 22, 22, 20, 12, 16, 20, 19, 26, 14, 13, 23, 14, 22, 19, 15, 28, 16, 20, 25, 10, 19, 0, 10, 8, 11, 17, 13, 17, 23, 37, 32, 19, 26, 12, 11, 24, 11, 21, 25, 8, 15, 21, 31, 17, 0, 12, 6, 23, 19, 29, 14, 9, 0, 18, 23, 20, 15, 15, 17, 27, 17, 2, 24, 17, 16, 26, 11, 23, 24, 10, 26, 21, 12, 20, 12, 29, 22, 20, 16, 41, 19, 27, 28, 10, 35, 28, 23, 14, 5, 23, 15, 17, 12, 11, 24, 11, 14, 7, 7, 27, 17, 15, 10, 16, 2, 11, 21, 18, 15, 8, 17, 10, 18, 15, 18, 31, 9, 14, 11, 19, 22, 8, 19, 17, 18, 25, 11, 17, 32, 25, 18, 21, 19, 35, 14, 29, 9, 28, 14), T = c(13, 15.4, 15.6, 12.3, 12.7, 13.3, 6, 13, 9.2, 17.8, 9.4, 14.2, 8.7, 14, 8.6, 13.9, 9.2, 15.1, 9.4, 16.5, 14, 11.5, 15.5, 13.3, 12.7, 14, 10.5, 14, 10.1, 16.7, 15.2, 11.2, 11.7, 17.9, 13.3, 13.9, 8.7, 16.7, 7.8, 7.9, 10.9, 15.5, 14, 14.1, 14.3, 13.3, 11.6, 16.5, 12.7, 12.6, 8.3, 9, 12.4, 15, 11.8, 14.1, 10.5, 12.4, 10.5, 17.5, 9.2, 16.3, 5.3, 5.9, 11.9, 8.9, 7.7, 15.2, 8.6, 13.2, 9.5, 15.8, 12.5, 13.8, 10.7, 10.5, 7.7, 11, 9.3, 14.6, 12, 15.4, 12.3, 14, 8.3, 19.8, 15.5, 14.3, 9, 7.6, 15.3, 12.8, 14.4, 14, 10.5, 8.9, 13.4, 12.8, 12.9, 11.2, 13.1, 10, 12.4, 15.4, 7.6, 14.9, 13.1, 11.1, 8.6, 13.6, 8.4, 11.5, 12.5, 15.6, 8.3, 9.6, 8.7, 9.7, 10.5, 12.8, 8.6, 12.7, 6.7, 8.5, 9.9, 8.3, 12.8, 9.8, 10.5, 10.7, 9.6, 11.1, 13.7, 9.5, 8.8, 3.4, 10.2, 3.5, 8.4, 11.9, 12.3, 10.2, 13.1, 8.4, 3.1, 7.2, 13.2, 7.6, 11.3, 13.5, 7.8, 5.5, 10.7, 4.8, 7.9, 13.5, 13.5, 2.1, 7.2, 9.7, 12, 9.2, 13.2, 12, 17.1, 7.9, 12.7, 11.8, 16, 6.4, 12.9, 8.1, 12.6, 10.2, 13.3, 8.5, 9.7, 9.9, 18.2, 11, 8.4, 1.5, 7.3, 10.6, 13.6, 8.4, 7.2, 13, 15.5, 9.7, 13.2, 5.9, 9.5, 10.4, 12.9, 2.6, 17.2, 15.4, 10.5, 6.7, 6.6, 7.6, 10.5, 15.6, 10.4, 5.1, 11, 9.7, 4.2, 3.6, 8.5, 11.5, 8.4, 6.9, 11, 10.4, 3.4, 3.2, 5.5, 2.4, 11.2, 2.6, 15.1, 16, 13.7, 10.5, 3.5, 13.4, 11.5, 12.3, 13.9, 14.5, 12.8, 16.8, 16.9, 13.5, 17.2, 12.5, 12.4, 11.8, 12, 10.9, 6.7, 10.9, 2.3, 5.2, 13.1, 12.1, 13.9, 12.9, 7.2, 12.5, 16, 11.7), Hemlock = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0), Alpine = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0), Snow = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("D", "Day", "NDVI", "VegS", "T", "Hemlock", "Alpine", "Snow"), row.names = as.integer(c(NA, 254)), class = "data.frame") #Running a regression. library(MASS) fit <- glm.nb(D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, data = data) summary(fit, correlation = FALSE) Call: glm.nb(formula = D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, data = data, init.theta = 11.3494468596771, link = log) Deviance Residuals: Min 1Q Median 3Q Max -3.7451 -0.7196 -0.1958 0.5389 2.7096 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -2.882684 0.929598 -3.101 0.001929 ** Day 0.020325 0.005540 3.669 0.000244 *** NDVI 1.353361 0.221471 6.111 9.91e-10 *** VegS 0.016731 0.004931 3.393 0.000691 *** T 0.074189 0.009491 7.817 5.42e-15 *** Hemlock -0.588858 0.174980 -3.365 0.000765 *** Alpine -0.452199 0.099296 -4.554 5.26e-06 *** Snow -1.902610 0.735708 -2.586 0.009707 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(11.3494) family taken to be 1) Null deviance: 515.16 on 253 degrees of freedom Residual deviance: 278.89 on 246 degrees of freedom AIC: 1300.1 Number of Fisher Scoring iterations: 1 Theta: 11.35 Std. Err.: 2.71 2 x log-likelihood: -1282.075 #Plotting observed versus predicted values. pdf(file="ObsVPred.pdf", width=4, height=4, family="Times", pointsize=11) par(mar = c(5,5,1,1), pch=1) plot(data$D, fit$fitted.values, main="", ylab=expression(italic(D)[predicted]), xlab=expression(italic(D)[observed])) abline(a=0,b=1, lty=2) lines(lowess(data$D, fit$fitted.values)) dev.off() #This appears to be a decent explanatory model, but as a predictive model it is systematically biased. It is biased high at low observed values of D and biased low at high values observed values of D.
paulandpen at optusnet.com.au
2007-Aug-10 01:59 UTC
[R] Systematically biased count data regression model
Matthew, In response to that post, I am afraid I have to disagree. I think a poor model fit (eg 16%) is a reflection of a lot of unmeasured factors and therefore random error in the model. This would explain why overall predictive performance is poor (eg a lot of error in the model) Your situation is different. You are having trouble predicting extreme values, so there is something systematic (eg your model works well in the middle and worse at the tails) not poorly overall. As the post does reflect, you are suffering from error in prediction, that is a fact of life, as others have stated, and most of us who suffer from prediction error experience it at the more extreme values. Thanks Paul> Matthew and Kim Bowser <matthewnkim@gmail.com> wrote: > > Dear all, > > I received a very helpful response from someone who requested > anonymity, but to whom I am grateful. > > PLEASE do not quote my name or email (I am trying to stay off spam > lists) > > Matthew: I think this is just a reflection of > the fact the model does not fit perfectly. The > example below is a simple linear regression that > is highly significant but has R-square of > 16%. This model as well is "biased high at low > observed values of y and biased low at high values observed values of y" > > set.seed(1) > n <- 200 > m <- data.frame(x=rnorm(n,mean=10,sd=2)) > m$y <- m$x + rnorm(n,sd=4) # simulate using intercept 0, slope 1 > f <- lm(y ~ x,data=m) > print(summary(f)) > # > # Call: > # lm(formula = y ~ x, data = m) > # > # Residuals: > # Min 1Q Median 3Q Max > # -11.7310 -2.1709 -0.1009 2.6733 10.3446 > # > # Coefficients: > # Estimate Std. Error t value Pr(>|t|) > # (Intercept) 0.6274 1.5830 0.396 0.692 > # x 0.9538 0.1546 6.170 3.77e-09 *** > # --- > # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > # > # Residual standard error: 4.052 on 198 degrees of freedom > # Multiple R-Squared: 0.1613, Adjusted R-squared: 0.157 > # F-statistic: 38.07 on 1 and 198 DF, p-value: 3.773e-09 > # > plot(m$y,f$fitted.values,xlab="Observed",ylab="Predicted") > lines(lowess(m$y,f$fitted.values),col="red",lty=2) > abline(c(0,1)) > legend("topleft",lty=c(2,1),col=c("red","black"),legend=c("Loess","45-deg > ree")) > - Show quoted text - > > At 2007-08-09 08:43, Matthew and Kim Bowser wrote: > >Dear all, > > > >I am attempting to explain patterns of arthropod family richness > >(count data) using a regression model. It seems to be able to do a > >pretty good job as an explanatory model (i.e. demonstrating > >relationships between dependent and independent variables), but it has > >systematic problems as a predictive model: It is biased high at low > >observed values of family richness and biased low at high observed > >values of family richness (see attached pdf). I have tried diverse > >kinds of reasonable regression models mostly as in Zeileis, et al. > >(2007), as well as transforming my variables, both with only small > >improvements. > > > >Do you have suggestions for making a model that would perform better > >as a predictive model? > > > >Thank you for your time. > > > >Sincerely, > > > >Matthew Bowser > > > >STEP student > >USFWS Kenai National Wildlife Refuge > >Soldotna, Alaska, USA > > > >M.Sc. student > >University of Alaska Fairbanks > >Fairbankse, Alaska, USA > > > >Reference > > > >Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for > >count data in R. Technical Report 53, Department of Statistics and > >Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL > >http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf. > > > >[snip] > > > > > >#This appears to be a decent explanatory model, but as a predictive > >model it is systematically biased. It is biased high at low observed > >values of D and biased low at high values observed values of D. > > > - Show quoted text - > > On 8/9/07, Matthew and Kim Bowser <matthewnkim@gmail.com> wrote: > > Dear all, > > > > I am attempting to explain patterns of arthropod family richness > > (count data) using a regression model. It seems to be able to do a > > pretty good job as an explanatory model (i.e. demonstrating > > relationships between dependent and independent variables), but it has > > systematic problems as a predictive model: It is biased high at low > > observed values of family richness and biased low at high observed > > values of family richness (see attached pdf). I have tried diverse > > kinds of reasonable regression models mostly as in Zeileis, et al. > > (2007), as well as transforming my variables, both with only small > > improvements. > > > > Do you have suggestions for making a model that would perform better > > as a predictive model? > > > > Thank you for your time. > > > > Sincerely, > > > > Matthew Bowser > > > > STEP student > > USFWS Kenai National Wildlife Refuge > > Soldotna, Alaska, USA > > > > M.Sc. student > > University of Alaska Fairbanks > > Fairbankse, Alaska, USA > > > > Reference > > > > Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for > > count data in R. Technical Report 53, Department of Statistics and > > Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL > > http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf. > > > > Code > > > > `data` <- > > structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4, > > 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15, > > 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13, > > 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12, > > 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8, > > 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8, > > 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8, > > 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16, > > 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12, > > 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6, > > 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3, > > 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3, > > 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15, > > 2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159, > > 159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161, > > 175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161, > > 161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176, > > 165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165, > > 165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167, > > 175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164, > > 167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173, > > 178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162, > > 173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166, > > 170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164, > > 170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172, > > 162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174, > > 166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174, > > 172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170, > > 171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172, > > 171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160, > > 160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176, > > 176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166, > > 168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166, > > 166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253, > > 0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26, > > 0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194, > > 0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354, > > 0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312, > > 0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122, > > 0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28, > > 0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34, > > 0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221, > > 0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232, > > 0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318, > > 0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2, > > 0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38, > > 0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288, > > 0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162, > > 0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306, > > 0.28, 0.279, 0.208, 0.377, 0.413, 0.489, 0.417, 0.333, 0.208, > > 0.232, 0.431, 0.283, 0.241, 0.105, 0.18, -0.172, -0.374, 0.25, > > 0.043, 0.215, 0.204, 0.19, 0.177, -0.106, -0.143, 0.062, 0.462, > > 0.256, 0.229, 0.314, 0.415, 0.307, 0.238, -0.35, 0.34, 0.275, > > 0.097, 0.353, 0.214, 0.435, 0.055, -0.289, 0.239, 0.186, 0.135, > > 0.259, 0.268, 0.258, 0.032, 0.489, 0.389, 0.298, 0.164, 0.325, > > 0.254, -0.059, 0.524, 0.539, 0.25, 0.175, 0.326, 0.302, -0.047, > > -0.301, -0.149, 0.358, 0.495, 0.311, 0.235, 0.558, -0.156, 0, > > 0.146, 0.329, -0.069, -0.352, -0.356, -0.206, -0.179, 0.467, > > -0.325, 0.39, -0.399, -0.165, 0.267, -0.334, -0.17, 0.58, 0.228, > > 0.234, 0.351, 0.3, -0.018, 0.125, 0.176, 0.322, 0.246, 0.376, > > -0.185, 0.342, 0.142, -0.075, 0.186, 0.333, 0.112, 0.272, 0.277, > > 0.203, 0.37, 0.465, 0.425), VegS = c(14, 11, 18, 21, 31, 20, > > 11, 17, 10, 15, 27, 8, 17, 13, 16, 9, 10, 16, 10, 15, 11, 9, > > 14, 11, 11, 10, 24, 18, 12, 6, 25, 21, 25, 8, 14, 18, 11, 16, > > 20, 16, 10, 16, 18, 14, 13, 11, 15, 23, 11, 28, 12, 17, 12, 18, > > 10, 15, 7, 15, 9, 16, 18, 16, 20, 18, 12, 19, 16, 18, 20, 15, > > 24, 9, 15, 9, 16, 14, 17, 14, 7, 9, 9, 12, 13, 15, 14, 11, 17, > > 8, 14, 15, 12, 8, 10, 12, 8, 16, 15, 22, 16, 21, 10, 15, 20, > > 14, 27, 21, 19, 22, 21, 11, 13, 10, 13, 14, 9, 22, 22, 20, 12, > > 16, 20, 19, 26, 14, 13, 23, 14, 22, 19, 15, 28, 16, 20, 25, 10, > > 19, 0, 10, 8, 11, 17, 13, 17, 23, 37, 32, 19, 26, 12, 11, 24, > > 11, 21, 25, 8, 15, 21, 31, 17, 0, 12, 6, 23, 19, 29, 14, 9, 0, > > 18, 23, 20, 15, 15, 17, 27, 17, 2, 24, 17, 16, 26, 11, 23, 24, > > 10, 26, 21, 12, 20, 12, 29, 22, 20, 16, 41, 19, 27, 28, 10, 35, > > 28, 23, 14, 5, 23, 15, 17, 12, 11, 24, 11, 14, 7, 7, 27, 17, > > 15, 10, 16, 2, 11, 21, 18, 15, 8, 17, 10, 18, 15, 18, 31, 9, > > 14, 11, 19, 22, 8, 19, 17, 18, 25, 11, 17, 32, 25, 18, 21, 19, > > 35, 14, 29, 9, 28, 14), T = c(13, 15.4, 15.6, 12.3, 12.7, 13.3, > > 6, 13, 9.2, 17.8, 9.4, 14.2, 8.7, 14, 8.6, 13.9, 9.2, 15.1, 9.4, > > 16.5, 14, 11.5, 15.5, 13.3, 12.7, 14, 10.5, 14, 10.1, 16.7, 15.2, > > 11.2, 11.7, 17.9, 13.3, 13.9, 8.7, 16.7, 7.8, 7.9, 10.9, 15.5, > > 14, 14.1, 14.3, 13.3, 11.6, 16.5, 12.7, 12.6, 8.3, 9, 12.4, 15, > > 11.8, 14.1, 10.5, 12.4, 10.5, 17.5, 9.2, 16.3, 5.3, 5.9, 11.9, > > 8.9, 7.7, 15.2, 8.6, 13.2, 9.5, 15.8, 12.5, 13.8, 10.7, 10.5, > > 7.7, 11, 9.3, 14.6, 12, 15.4, 12.3, 14, 8.3, 19.8, 15.5, 14.3, > > 9, 7.6, 15.3, 12.8, 14.4, 14, 10.5, 8.9, 13.4, 12.8, 12.9, 11.2, > > 13.1, 10, 12.4, 15.4, 7.6, 14.9, 13.1, 11.1, 8.6, 13.6, 8.4, > > 11.5, 12.5, 15.6, 8.3, 9.6, 8.7, 9.7, 10.5, 12.8, 8.6, 12.7, > > 6.7, 8.5, 9.9, 8.3, 12.8, 9.8, 10.5, 10.7, 9.6, 11.1, 13.7, 9.5, > > 8.8, 3.4, 10.2, 3.5, 8.4, 11.9, 12.3, 10.2, 13.1, 8.4, 3.1, 7.2, > > 13.2, 7.6, 11.3, 13.5, 7.8, 5.5, 10.7, 4.8, 7.9, 13.5, 13.5, > > 2.1, 7.2, 9.7, 12, 9.2, 13.2, 12, 17.1, 7.9, 12.7, 11.8, 16, > > 6.4, 12.9, 8.1, 12.6, 10.2, 13.3, 8.5, 9.7, 9.9, 18.2, 11, 8.4, > > 1.5, 7.3, 10.6, 13.6, 8.4, 7.2, 13, 15.5, 9.7, 13.2, 5.9, 9.5, > > 10.4, 12.9, 2.6, 17.2, 15.4, 10.5, 6.7, 6.6, 7.6, 10.5, 15.6, > > 10.4, 5.1, 11, 9.7, 4.2, 3.6, 8.5, 11.5, 8.4, 6.9, 11, 10.4, > > 3.4, 3.2, 5.5, 2.4, 11.2, 2.6, 15.1, 16, 13.7, 10.5, 3.5, 13.4, > > 11.5, 12.3, 13.9, 14.5, 12.8, 16.8, 16.9, 13.5, 17.2, 12.5, 12.4, > > 11.8, 12, 10.9, 6.7, 10.9, 2.3, 5.2, 13.1, 12.1, 13.9, 12.9, > > 7.2, 12.5, 16, 11.7), Hemlock = c(0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, > > 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0), Alpine = c(0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, > > 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, > > 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, > > 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, > > 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, > > 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0), > > Snow = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, > > 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("D", "Day", "NDVI", > > "VegS", "T", "Hemlock", "Alpine", "Snow"), row.names = > as.integer(c(NA, > > 254)), class = "data.frame") > > > > > > #Running a regression. > > library(MASS) > > fit <- glm.nb(D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, > data = data) > > summary(fit, correlation = FALSE) > > > > Call: > > glm.nb(formula = D ~ Day + NDVI + VegS + T + Hemlock + Alpine + > > Snow, data = data, init.theta = 11.3494468596771, link = log) > > > > Deviance Residuals: > > Min 1Q Median 3Q Max > > -3.7451 -0.7196 -0.1958 0.5389 2.7096 > > > > Coefficients: > > Estimate Std. Error z value Pr(>|z|) > > (Intercept) -2.882684 0.929598 -3.101 0.001929 ** > > Day 0.020325 0.005540 3.669 0.000244 *** > > NDVI 1.353361 0.221471 6.111 9.91e-10 *** > > VegS 0.016731 0.004931 3.393 0.000691 *** > > T 0.074189 0.009491 7.817 5.42e-15 *** > > Hemlock -0.588858 0.174980 -3.365 0.000765 *** > > Alpine -0.452199 0.099296 -4.554 5.26e-06 *** > > Snow -1.902610 0.735708 -2.586 0.009707 ** > > --- > > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > > > (Dispersion parameter for Negative Binomial(11.3494) family taken to > be 1) > > > > Null deviance: 515.16 on 253 degrees of freedom > > Residual deviance: 278.89 on 246 degrees of freedom > > AIC: 1300.1 > > > > Number of Fisher Scoring iterations: 1 > > > > > > Theta: 11.35 > > Std. Err.: 2.71 > > > > 2 x log-likelihood: -1282.075 > > > > > > #Plotting observed versus predicted values. > > pdf(file="ObsVPred.pdf", width=4, height=4, family="Times", > pointsize=11) > > par(mar = c(5,5,1,1), pch=1) > > plot(data$D, fit$fitted.values, main="", > > ylab=expression(italic(D)[predicted]), > > xlab=expression(italic(D)[observed])) > > abline(a=0,b=1, lty=2) > > lines(lowess(data$D, fit$fitted.values)) > > dev.off() > > > > > > #This appears to be a decent explanatory model, but as a predictive > > model it is systematically biased. It is biased high at low observed > > values of D and biased low at high values observed values of D. > > > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.