Heather.Turner@warwick.ac.uk
2006-Jan-10 13:30 UTC
[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)
This bug is not quite fixed - the example from my original report now works using R-2.2.1, but

    plot(Uniform, 6)

does not. The bug is due to

    if (show[6]) {
        ymx <- max(cook, na.rm = TRUE) * 1.025
        g <- hatval/(1 - hatval)  # Potential division by zero here
        plot(g, cook, xlim = c(0, max(g)), ylim = c(0, ymx),
             main = main, xlab = "Leverage", ylab = "Cook's distance",
             xaxt = "n", type = "n", ...)
        ...

All other values of 'which' seem to work fine. Sorry not to have checked this in the beta version,

Heather

>>> Prof Brian Ripley <ripley at stats.ox.ac.uk> 12/06/05 04:10pm >>>
Curiously, I was just looking at that, since I believe the answer should be NaN, and some optimizing compilers/fast BLASes are not giving that. (There's an example in reg-test-3.R.) So I think we need to return NaN when hat is within rounding error of 1.

My take is that plot.lm should handle this: you will see most but not all cases have na.rm = TRUE in calculating ylim, but as Inf is theoretically impossible it has not been considered. Note that plot.lm does not use rstandard and so needs a separate fix.

Thanks for the report

On Tue, 6 Dec 2005 Heather.Turner at warwick.ac.uk wrote:

> Full_Name: Heather Turner
> Version: 2.2.0
> OS: Windows XP
> Submission from: (NULL) (137.205.240.44)
>
> Standardized residuals as calculated by rstandard.lm, rstandard.glm and
> plot.lm are Inf/NaN rather than zero when the un-standardized residuals
> are zero. This causes plot.lm to break when calculating 'ylim' for any
> of the plots of standardized residuals. Example:
>
> "occupationalStatus" <-
>     structure(as.integer(c(50, 16, 12, 11, 2, 12, 0, 0, 19, 40, 35,
>                            20, 8, 28, 6, 3, 26, 34, 65, 58, 12, 102, 19, 14, 8,
>                            18, 66, 110, 23, 162, 40, 32, 7, 11, 35, 40, 25, 90,
>                            21, 15, 11, 20, 88, 183, 46, 554, 158, 126, 6, 8, 23,
>                            64, 28, 230, 143, 91, 2, 3, 21, 32, 12, 177, 71, 106)),
>               .Dim = as.integer(c(8, 8)),
>               .Dimnames = structure(list(
>                   origin = c("1", "2", "3", "4", "5", "6", "7", "8"),
>                   destination = c("1", "2", "3", "4", "5", "6", "7", "8")),
>                   .Names = c("origin", "destination")),
>               class = "table")
> Diag <- as.factor(diag(1:8))
> Rscore <- scale(as.numeric(row(occupationalStatus)), scale = FALSE)
> Cscore <- scale(as.numeric(col(occupationalStatus)), scale = FALSE)
> Uniform <- glm(Freq ~ origin + destination + Diag + Rscore:Cscore,
>                family = poisson, data = occupationalStatus)
> residuals(Uniform)[as.logical(diag(8))]  # zero/near-zero
> rstandard(Uniform)[as.logical(diag(8))]  # mostly Inf/NaN
> plot(Uniform)  # breaks on qqnorm plot (or any 'which' > 1)
>
> This could be fixed by replacing standardized residuals with zero where
> the hat value is one, e.g.
>
> rstandard.glm <- function(model,
>                           infl = lm.influence(model, do.coef = FALSE),
>                           ...) {
>     res <- infl$wt.res
>     hat <- infl$hat
>     ifelse(hat == 1, 0, res/sqrt(summary(model)$dispersion * (1 - infl$hat)))
> }
> etc.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
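For reference, a small check of where the non-finite value arises, written as a sketch that assumes the 'Uniform' fit from the report above is in the current session: the cells picked out by the Diag factor are fitted (essentially) exactly, so their hat values are numerically 1 and the leverage ratio used by the which = 6 plot blows up.

    ## Sketch, assuming 'Uniform' from Heather's example exists in the workspace.
    h <- lm.influence(Uniform, do.coef = FALSE)$hat
    range(h[as.logical(diag(8))])   # essentially 1 for the diagonal (Diag) cells
    g <- h / (1 - h)                # leverage ratio used by plot(, which = 6)
    max(g)                          # Inf for the exactly fitted cells, so
                                    # xlim = c(0, max(g)) is not finite and plot() fails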
maechler@stat.math.ethz.ch
2006-Jan-10 14:58 UTC
[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)
>>>>> "Heather" == Heather Turner <Heather.Turner at warwick.ac.uk> >>>>> on Tue, 10 Jan 2006 14:30:23 +0100 (CET) writes:Heather> This bug is not quite fixed - the example from my Heather> original report now = works using R-2.2.1, but Heather> plot(Uniform, 6) Heather> does not. The bug is due ......... g <- hatval/(1 - hatval) # Potential division by zero here plot(g, cook, xlim = c(0, max(g)), ylim = c(0, ymx), .......... Heather> All other values of 'which' seem to work Heather> fine. Sorry not to have checked this in the beta Heather> version, (indeed; that would have been useful) Hmm, it's not clear what *should* be drawn in such a case. Leaving away all the observations with h_ii = 1 seems a particularly bad idea, since these are the ones that you'd definitely should remark. OTOH, for h_ii = 1, the cook distance is 'NaN' (or should that be changed; to "very large" instead ???) and plot number 6 doesn't seem to make any sense to me When 'which = 6' was proposed [ on R-devel as well, last April, http://tolstoy.newcastle.edu.au/R/devel/05/04/0595.html ah, I see, by David Firth, from your place, so, Heather, can you make sure he sees this e-mail ? ] I actually had wondered a bit about it's general usefulness,.. --- But the example is useful anyway; I think that also plot number 5, (R_i vs h_ii) does the wrong thing currently by just leaving away all points for which h_ii = 1. These really should be shown one or the other way! Martin
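One way to act on Martin's point that the h_ii = 1 points should be shown "one way or the other" might look like the sketch below. This is only an illustration, not the plot.lm code: 'hatval' and 'rs' stand in for plot.lm's internal hat values and standardized residuals, and the zero-residual convention follows Heather's suggested fix.

    ## Rough sketch of a 'residuals vs leverage' (which = 5) panel that keeps
    ## the numerically-exact-fit points visible instead of dropping them.
    ## 'hatval' and 'rs' are placeholders for plot.lm's internal objects.
    eps <- 10 * .Machine$double.eps
    extreme <- hatval >= 1 - eps        # cases that are numerically h_ii = 1
    rs[extreme] <- 0                    # Heather's convention: zero residual there
    plot(hatval, rs, xlab = "Leverage", ylab = "Standardized residuals")
    if (any(extreme))                   # flag such points rather than hide them
        points(hatval[extreme], rs[extreme], pch = 4, col = 2)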
d.firth@warwick.ac.uk
2006-Jan-11 11:56 UTC
[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)
On 10 Jan 2006, at 14:58, maechler at stat.math.ethz.ch wrote:

>>>>>> "Heather" == Heather Turner <Heather.Turner at warwick.ac.uk>
>>>>>>     on Tue, 10 Jan 2006 14:30:23 +0100 (CET) writes:
>
> Heather> This bug is not quite fixed - the example from my
> Heather> original report now works using R-2.2.1, but
>
> Heather> plot(Uniform, 6)
>
> Heather> does not. The bug is due
>
> .........
>     g <- hatval/(1 - hatval)  # Potential division by zero here
>     plot(g, cook, xlim = c(0, max(g)), ylim = c(0, ymx),
> ..........
>
> Heather> All other values of 'which' seem to work
> Heather> fine. Sorry not to have checked this in the beta
> Heather> version,
>
> (indeed; that would have been useful)
>
> Hmm, it's not clear what *should* be drawn in such a case. Leaving out
> all the observations with h_ii = 1 seems a particularly bad idea, since
> these are exactly the ones that should definitely be flagged.
> OTOH, for h_ii = 1 the Cook's distance is NaN
> (or should that be changed, to "very large" instead???)
> and plot number 6 doesn't seem to make any sense to me.
>
> When 'which = 6' was proposed
> [ on R-devel as well, last April,
>   http://tolstoy.newcastle.edu.au/R/devel/05/04/0595.html
>   ah, I see, by David Firth, from your place, so, Heather, can you
>   make sure he sees this e-mail? ]
> I actually had wondered a bit about its general usefulness, ...

Yes, I remember that there was some discussion of this last year, and my recollection is that the reaction was lukewarm at best with regard to including this plot.

The "h_ii = 1" problem can of course be taken care of by leaving out such points if they can be reliably detected, but I share Martin's unease about this. We should also worry about, for example, h_ii = (1 - epsilon) with epsilon small, as plotting such a point would effectively make the rest of the graph useless.

Maybe it would be safest to remove the which = 6 option?

David
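A quick numeric illustration of David's epsilon point, using made-up hat values (not taken from the example fit): a single hat value just below 1 stretches the leverage axis of the which = 6 plot so far that every other point is squashed against the origin.

    ## Hypothetical hat values, one close to (but below) 1.
    h <- c(0.2, 0.4, 0.6, 0.999)
    h / (1 - h)   # 0.25 0.67 1.50 999 -- xlim would be c(0, 999)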