Camarda, Carlo Giovanni
2006-Feb-27 20:23 UTC
[R] Different deviance residuals in a (similar?!?) glm example
Dear R-users,
I would like to show you a simple example that gives an overview of one
of my current issue.
Although my working setting implies a different parametric model (which
cannot be framed in the glm), I guess that what I'll get from the
following example it would help for the next steps.
Anyway here it is.
Firstly I simulated from a series of exposures, a series of deaths
(given a model, Gompertz, and a probability distribution, Poisson). Then
a multiply both deaths and exposures by a constant. Finally I fitted
with glm the simulated data sets and I compared the deviance residuals.
They're different by a certain constant, but I could not get a
meaningful relationship between the used constants (a scatter plot of
them is given at the end).
The following example tried to be as completed as possible and I hope
that it will be clearer than my previous words.
Thanks,
Carlo Giovanni Camarda
#### simulation procedure
age <- 50:100 # time range
# parameters
alpha.sim <- 0.00005
beta.sim <- 0.1
# simulated hazard from a Gompertz
hazard.sim <- alpha.sim*(exp(beta.sim*age))
# first dataset
exposures <- c(4748, 4461, 4473, 4326, 4259, 4219, 4049, 3965, 3801,
3818, 3670, 3521, 3537,
3482, 3347, 3180, 3100, 2890, 2755, 2622, 2530, 2502,
2293, 2216, 1986, 1875,
1693, 1550, 1395, 1295, 1104, 952, 792, 755, 726, 608,
523, 419, 338,
246, 205, 148, 112, 75, 52, 32, 23, 15, 7, 6, 3) # just
as example
deaths.sim <- rpois(length(age), exposures*hazard.sim) # simulating
deaths from a poisson distribution (Brillinger, 1986)
my.offset <- log(exposures) # offset for the poisson regression
# new dataset: decupleing the sample size
deaths.sim1 <- deaths.sim * 10
exposures1 <- exposures * 10
my.offset1 <- log(exposures1)
# fitting the first dataset
fit <- glm(formula = deaths.sim ~ age + offset(my.offset),
family = poisson(link = "log"))
res <- residuals(fit, type="deviance")
# fitting the new dataset
fit1 <- glm(formula = deaths.sim1 ~ age + offset(my.offset1),
family = poisson(link = "log"))
res1 <- residuals(fit1, type="deviance")
# estimating hazard
hazard.act <- deaths.sim/exposures # == deaths.sim1/exposures1
hazard.fit <- predict(fit, type="response") / exposures
hazard.fit1 <- predict(fit1, type="response") / exposures1
# plotting log-hazard
plot(age, log(hazard.sim), cex=1.2)
lines(age, log(hazard.act), col="red", lwd=2)
lines(age, log(hazard.fit), col="blue", lwd=2, lty=2)
lines(age, log(hazard.fit1), col="green", lwd=2, lty=3)
all(round(hazard.fit,5)==round(hazard.fit1,5)) ## TRUE
# plotting residuals
plot(age, res, cex=1.2, lwd=2, ylim=range(res, res1))
points(age, res1, col="red", cex=1.2, lwd=2, pch=2)
all(round(res,5)==round(res1,5)) ## FALSE
# looking at the ratio
ratio <- (res/res1)[1]
ratio
all(round(res,10)==round(res1*ratio,10))
# here the scatterplot resid-ratios vs multiplier-constant
const.a <- c(1:16) # constants which I tried to multiply by
const.b <-
c(1,0.7071068,0.5773503,0.5,0.4472136,0.4082483,0.3779645,0.3535534, #
ratio between the deviance residuals
0.3333333,0.3162278,0.3015113,0.2886751,0.2773501,0.2672612,0.2581989,0.
25)
plot(const.a, const.b, pch=16)
+++++
This mail has been sent through the MPI for Demographic Rese...{{dropped}}
Prof Brian Ripley
2006-Feb-28 13:41 UTC
[R] Different deviance residuals in a (similar?!?) glm example
This is a Poisson regression. You cannot just multiply counts by 10 and have a valid sample from a Poisson distribution with 10x the mean. So the example (and the calculations below) make zero statistical sense. For a poisson() family, $dev.resids function (y, mu, wt) 2 * wt * (y * log(ifelse(y == 0, 1, y/mu)) - (y - mu)) So if you multiply the counts by 10 and change the offset to multiply the mu by 10, (y - log(y/mu) - (y - mu)) is multiplied by 10. Hence the deviance residuals are scaled by sqrt(10), and res1/res is sqrt(10).> plot(const.a, const.b, pch=16) > lines(const.a, 1/sqrt(const.a))indicates a perfect fit. On Mon, 27 Feb 2006, Camarda, Carlo Giovanni wrote:> Dear R-users, > I would like to show you a simple example that gives an overview of one > of my current issue. > Although my working setting implies a different parametric model (which > cannot be framed in the glm), I guess that what I'll get from the > following example it would help for the next steps. > Anyway here it is. > Firstly I simulated from a series of exposures, a series of deaths > (given a model, Gompertz, and a probability distribution, Poisson). Then > a multiply both deaths and exposures by a constant. Finally I fitted > with glm the simulated data sets and I compared the deviance residuals. > They're different by a certain constant, but I could not get a > meaningful relationship between the used constants (a scatter plot of > them is given at the end). > The following example tried to be as completed as possible and I hope > that it will be clearer than my previous words. > Thanks, > Carlo Giovanni Camarda > > #### simulation procedure > age <- 50:100 # time range > # parameters > alpha.sim <- 0.00005 > beta.sim <- 0.1 > # simulated hazard from a Gompertz > hazard.sim <- alpha.sim*(exp(beta.sim*age)) > # first dataset > exposures <- c(4748, 4461, 4473, 4326, 4259, 4219, 4049, 3965, 3801, > 3818, 3670, 3521, 3537, > 3482, 3347, 3180, 3100, 2890, 2755, 2622, 2530, 2502, > 2293, 2216, 1986, 1875, > 1693, 1550, 1395, 1295, 1104, 952, 792, 755, 726, 608, > 523, 419, 338, > 246, 205, 148, 112, 75, 52, 32, 23, 15, 7, 6, 3) # just > as example > deaths.sim <- rpois(length(age), exposures*hazard.sim) # simulating > deaths from a poisson distribution (Brillinger, 1986) > my.offset <- log(exposures) # offset for the poisson regression > # new dataset: decupleing the sample size > deaths.sim1 <- deaths.sim * 10 > exposures1 <- exposures * 10 > my.offset1 <- log(exposures1) > # fitting the first dataset > fit <- glm(formula = deaths.sim ~ age + offset(my.offset), > family = poisson(link = "log")) > res <- residuals(fit, type="deviance") > # fitting the new dataset > fit1 <- glm(formula = deaths.sim1 ~ age + offset(my.offset1), > family = poisson(link = "log")) > res1 <- residuals(fit1, type="deviance") > # estimating hazard > hazard.act <- deaths.sim/exposures # == deaths.sim1/exposures1 > hazard.fit <- predict(fit, type="response") / exposures > hazard.fit1 <- predict(fit1, type="response") / exposures1 > # plotting log-hazard > plot(age, log(hazard.sim), cex=1.2) > lines(age, log(hazard.act), col="red", lwd=2) > lines(age, log(hazard.fit), col="blue", lwd=2, lty=2) > lines(age, log(hazard.fit1), col="green", lwd=2, lty=3) > all(round(hazard.fit,5)==round(hazard.fit1,5)) ## TRUE > # plotting residuals > plot(age, res, cex=1.2, lwd=2, ylim=range(res, res1)) > points(age, res1, col="red", cex=1.2, lwd=2, pch=2) > all(round(res,5)==round(res1,5)) ## FALSE > # looking at the ratio > ratio <- (res/res1)[1] > ratio > all(round(res,10)==round(res1*ratio,10)) > # here the scatterplot resid-ratios vs multiplier-constant > const.a <- c(1:16) # constants which I tried to multiply by > const.b <- > c(1,0.7071068,0.5773503,0.5,0.4472136,0.4082483,0.3779645,0.3535534, # > ratio between the deviance residuals > > 0.3333333,0.3162278,0.3015113,0.2886751,0.2773501,0.2672612,0.2581989,0. > 25) > plot(const.a, const.b, pch=16)-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595