Camarda, Carlo Giovanni
2006-Feb-27 20:23 UTC
[R] Different deviance residuals in a (similar?!?) glm example
Dear R-users, I would like to show you a simple example that gives an overview of one of my current issue. Although my working setting implies a different parametric model (which cannot be framed in the glm), I guess that what I'll get from the following example it would help for the next steps. Anyway here it is. Firstly I simulated from a series of exposures, a series of deaths (given a model, Gompertz, and a probability distribution, Poisson). Then a multiply both deaths and exposures by a constant. Finally I fitted with glm the simulated data sets and I compared the deviance residuals. They're different by a certain constant, but I could not get a meaningful relationship between the used constants (a scatter plot of them is given at the end). The following example tried to be as completed as possible and I hope that it will be clearer than my previous words. Thanks, Carlo Giovanni Camarda #### simulation procedure age <- 50:100 # time range # parameters alpha.sim <- 0.00005 beta.sim <- 0.1 # simulated hazard from a Gompertz hazard.sim <- alpha.sim*(exp(beta.sim*age)) # first dataset exposures <- c(4748, 4461, 4473, 4326, 4259, 4219, 4049, 3965, 3801, 3818, 3670, 3521, 3537, 3482, 3347, 3180, 3100, 2890, 2755, 2622, 2530, 2502, 2293, 2216, 1986, 1875, 1693, 1550, 1395, 1295, 1104, 952, 792, 755, 726, 608, 523, 419, 338, 246, 205, 148, 112, 75, 52, 32, 23, 15, 7, 6, 3) # just as example deaths.sim <- rpois(length(age), exposures*hazard.sim) # simulating deaths from a poisson distribution (Brillinger, 1986) my.offset <- log(exposures) # offset for the poisson regression # new dataset: decupleing the sample size deaths.sim1 <- deaths.sim * 10 exposures1 <- exposures * 10 my.offset1 <- log(exposures1) # fitting the first dataset fit <- glm(formula = deaths.sim ~ age + offset(my.offset), family = poisson(link = "log")) res <- residuals(fit, type="deviance") # fitting the new dataset fit1 <- glm(formula = deaths.sim1 ~ age + offset(my.offset1), family = poisson(link = "log")) res1 <- residuals(fit1, type="deviance") # estimating hazard hazard.act <- deaths.sim/exposures # == deaths.sim1/exposures1 hazard.fit <- predict(fit, type="response") / exposures hazard.fit1 <- predict(fit1, type="response") / exposures1 # plotting log-hazard plot(age, log(hazard.sim), cex=1.2) lines(age, log(hazard.act), col="red", lwd=2) lines(age, log(hazard.fit), col="blue", lwd=2, lty=2) lines(age, log(hazard.fit1), col="green", lwd=2, lty=3) all(round(hazard.fit,5)==round(hazard.fit1,5)) ## TRUE # plotting residuals plot(age, res, cex=1.2, lwd=2, ylim=range(res, res1)) points(age, res1, col="red", cex=1.2, lwd=2, pch=2) all(round(res,5)==round(res1,5)) ## FALSE # looking at the ratio ratio <- (res/res1)[1] ratio all(round(res,10)==round(res1*ratio,10)) # here the scatterplot resid-ratios vs multiplier-constant const.a <- c(1:16) # constants which I tried to multiply by const.b <- c(1,0.7071068,0.5773503,0.5,0.4472136,0.4082483,0.3779645,0.3535534, # ratio between the deviance residuals 0.3333333,0.3162278,0.3015113,0.2886751,0.2773501,0.2672612,0.2581989,0. 25) plot(const.a, const.b, pch=16) +++++ This mail has been sent through the MPI for Demographic Rese...{{dropped}}
Prof Brian Ripley
2006-Feb-28 13:41 UTC
[R] Different deviance residuals in a (similar?!?) glm example
This is a Poisson regression. You cannot just multiply counts by 10 and have a valid sample from a Poisson distribution with 10x the mean. So the example (and the calculations below) make zero statistical sense. For a poisson() family, $dev.resids function (y, mu, wt) 2 * wt * (y * log(ifelse(y == 0, 1, y/mu)) - (y - mu)) So if you multiply the counts by 10 and change the offset to multiply the mu by 10, (y - log(y/mu) - (y - mu)) is multiplied by 10. Hence the deviance residuals are scaled by sqrt(10), and res1/res is sqrt(10).> plot(const.a, const.b, pch=16) > lines(const.a, 1/sqrt(const.a))indicates a perfect fit. On Mon, 27 Feb 2006, Camarda, Carlo Giovanni wrote:> Dear R-users, > I would like to show you a simple example that gives an overview of one > of my current issue. > Although my working setting implies a different parametric model (which > cannot be framed in the glm), I guess that what I'll get from the > following example it would help for the next steps. > Anyway here it is. > Firstly I simulated from a series of exposures, a series of deaths > (given a model, Gompertz, and a probability distribution, Poisson). Then > a multiply both deaths and exposures by a constant. Finally I fitted > with glm the simulated data sets and I compared the deviance residuals. > They're different by a certain constant, but I could not get a > meaningful relationship between the used constants (a scatter plot of > them is given at the end). > The following example tried to be as completed as possible and I hope > that it will be clearer than my previous words. > Thanks, > Carlo Giovanni Camarda > > #### simulation procedure > age <- 50:100 # time range > # parameters > alpha.sim <- 0.00005 > beta.sim <- 0.1 > # simulated hazard from a Gompertz > hazard.sim <- alpha.sim*(exp(beta.sim*age)) > # first dataset > exposures <- c(4748, 4461, 4473, 4326, 4259, 4219, 4049, 3965, 3801, > 3818, 3670, 3521, 3537, > 3482, 3347, 3180, 3100, 2890, 2755, 2622, 2530, 2502, > 2293, 2216, 1986, 1875, > 1693, 1550, 1395, 1295, 1104, 952, 792, 755, 726, 608, > 523, 419, 338, > 246, 205, 148, 112, 75, 52, 32, 23, 15, 7, 6, 3) # just > as example > deaths.sim <- rpois(length(age), exposures*hazard.sim) # simulating > deaths from a poisson distribution (Brillinger, 1986) > my.offset <- log(exposures) # offset for the poisson regression > # new dataset: decupleing the sample size > deaths.sim1 <- deaths.sim * 10 > exposures1 <- exposures * 10 > my.offset1 <- log(exposures1) > # fitting the first dataset > fit <- glm(formula = deaths.sim ~ age + offset(my.offset), > family = poisson(link = "log")) > res <- residuals(fit, type="deviance") > # fitting the new dataset > fit1 <- glm(formula = deaths.sim1 ~ age + offset(my.offset1), > family = poisson(link = "log")) > res1 <- residuals(fit1, type="deviance") > # estimating hazard > hazard.act <- deaths.sim/exposures # == deaths.sim1/exposures1 > hazard.fit <- predict(fit, type="response") / exposures > hazard.fit1 <- predict(fit1, type="response") / exposures1 > # plotting log-hazard > plot(age, log(hazard.sim), cex=1.2) > lines(age, log(hazard.act), col="red", lwd=2) > lines(age, log(hazard.fit), col="blue", lwd=2, lty=2) > lines(age, log(hazard.fit1), col="green", lwd=2, lty=3) > all(round(hazard.fit,5)==round(hazard.fit1,5)) ## TRUE > # plotting residuals > plot(age, res, cex=1.2, lwd=2, ylim=range(res, res1)) > points(age, res1, col="red", cex=1.2, lwd=2, pch=2) > all(round(res,5)==round(res1,5)) ## FALSE > # looking at the ratio > ratio <- (res/res1)[1] > ratio > all(round(res,10)==round(res1*ratio,10)) > # here the scatterplot resid-ratios vs multiplier-constant > const.a <- c(1:16) # constants which I tried to multiply by > const.b <- > c(1,0.7071068,0.5773503,0.5,0.4472136,0.4082483,0.3779645,0.3535534, # > ratio between the deviance residuals > > 0.3333333,0.3162278,0.3015113,0.2886751,0.2773501,0.2672612,0.2581989,0. > 25) > plot(const.a, const.b, pch=16)-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595