Full_Name: Carmen Fernandez Version: 1.3.1 OS: Submission from: (NULL) (138.251.202.115) I think there is a problem when computing Pearson residuals, in that they seem to be computed at the raw residuals divided by the square root of the corresponding diagonal element of the weight matrix W evaluated at the last step of the iterative model fitting procedure (IWLS), instead of dividing by the square root of the model variance. Here's an example of the weird effects it produces:> me <- c(1:100)/2 > y <- rpois(100,me) > yglm <- glm(y~me-1, family=poisson(link="identity")) > plot(fitted(yglm), residuals(yglm,type="response"))plots the raw residuals... but> plot(fitted(yglm), residuals(yglm,type="pearson"))plots the raw residuals MULTIPLIED by sqrt(fitted(yglm))... because it computes the Pearson residuals dividing by the square root of the diagonal element of the weight matrix W (at the end of the iterative fitting process) rather than dividing by the sqrt of variance. -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
carmen@mcs.st-and.ac.uk writes:> I think there is a problem when computing Pearson residuals, in that they seem > to be computed at the raw residuals divided by the square root of the > corresponding diagonal element of the weight matrix W evaluated at the last step > of the iterative model fitting procedure (IWLS), instead of dividing by the > square root of the model variance. Here's an example of the weird effects it > produces: > > > me <- c(1:100)/2 > > y <- rpois(100,me) > > yglm <- glm(y~me-1, family=poisson(link="identity")) > > plot(fitted(yglm), residuals(yglm,type="response")) > > plots the raw residuals... but > > > plot(fitted(yglm), residuals(yglm,type="pearson")) > > plots the raw residuals MULTIPLIED by sqrt(fitted(yglm))... because it > computes the Pearson residuals dividing by the square root of the diagonal > element of the weight matrix W (at the end of the iterative fitting process) > rather than dividing by the sqrt of variance.Um, right... At the very least, there's a "/" that should have been "*". But do we also divide by the dispersion (when it is estimated)? I forget what the definition says. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
[PD blundered when replying to bug report...] "Carmen Fernandez" <carmen@mcs.st-and.ac.uk> writes:> No Peter, the iterative weighting matrix is diagonal with elements equal to > the INVERSE of > > V[Y_i] * (g'(mu_i))^2 > > which is why you get into trouble. McCullagh and Nelder (but also any other > book on GLMs) is a good reference for this. I'm still pretty sure that > earlier versions of R were fine.We can easily agree that it's not right... The culprit is this revision 1.54 date: 2001/06/07 17:43:06; author: tlumley; state: Exp; lines: +1 -1 wrong sign of pearson residual for inverse link (PR#862) diff -u -r1.53 -r1.54 --- src/library/base/R/glm.R 1 Jun 2001 11:48:34 -0000 1.53 +++ src/library/base/R/glm.R 7 Jun 2001 17:43:06 -0000 1.54 @@ -651,7 +651,7 @@ d.res <- sqrt(pmax((object$family$dev.resids)(y, mu, wts), 0)) ifelse(y > mu, d.res, -d.res) } else rep(0, length(mu)), - pearson = r * sqrt(object$weights), + pearson = (y-mu)/sqrt(object$weights), working = r, response = y - mu, Which is listed as Thomas' doing, but I suspect it was discussed internally, and thus a collective piece of absentmindedness. What we have in glm.fit is w <- sqrt((weights[good] * mu.eta.val[good]^2)/variance(mu)[good]) which later gets squared before it is returned, so the weights are essentially mu.eta.val^2/variance(mu) which is what you are saying since the link derivatives are the "backwards" d(mu)/d(eta). The thing that is confusing is that for a loglinear Poisson model this comes out as mu^2/mu = mu. AFAIR, similar cancellation happens in all canonical GLIMs. For the identity link, it comes out as 1/mu, which is why it seemingly worked to switch from divide to multiply.... (And of course in ordinary weighted regression, the weights *are* usually the reciprocal variances.) Cc'ed to r-bugs, hoping that someone with a clear mind will do the right thing... -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._