Heather.Turner@warwick.ac.uk
2006-Jan-10 13:30 UTC
[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)
This bug is not quite fixed - the example from my original report now works
using R-2.2.1, but
plot(Uniform, 6)
does not. The bug is due to
if (show[6]) {
    ymx <- max(cook, na.rm = TRUE) * 1.025
    g <- hatval/(1 - hatval)  # Potential division by zero here #
    plot(g, cook, xlim = c(0, max(g)), ylim = c(0, ymx),
         main = main, xlab = "Leverage", ylab = "Cook's distance",
         xaxt = "n", type = "n", ...)
    ...
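(For illustration only, here is a minimal sketch of the kind of guard that would avoid the non-finite axis limits, computing the 'which = 6' quantities by hand for the Uniform model from the example quoted further down; the variable names and the use of hatvalues()/cooks.distance() are illustrative choices, not the fix actually applied to plot.lm:)

hatval <- hatvalues(Uniform)
cook   <- cooks.distance(Uniform)
g   <- hatval/(1 - hatval)                  # Inf where hatval == 1
ymx <- max(cook[is.finite(cook)]) * 1.025   # ignore NaN/Inf when setting limits
xmx <- max(g[is.finite(g)])
plot(g, cook, xlim = c(0, xmx), ylim = c(0, ymx),
     xlab = "Leverage", ylab = "Cook's distance",
     main = "Cook's dist vs Leverage")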
All other values of 'which' seem to work fine. Sorry not to have checked
this in the beta version,
Heather
>>> Prof Brian Ripley <ripley at stats.ox.ac.uk> 12/06/05 04:10pm >>>
Curiously, I was just looking at that, since I believe the answer should
be NaN, and some optimizing compilers/fast BLASes are not giving that.
(There's an example in reg-test-3.R.) So I think we need to return NaN
when hat is within rounding error of 1.
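(A sketch of the kind of check meant here, applied to the Uniform example; the tolerance and the explicit NaN are illustrative assumptions, not the change that went into R:)

infl <- lm.influence(Uniform, do.coef = FALSE)
hat  <- infl$hat
hat[hat > 1 - 10 * .Machine$double.eps] <- 1   # snap values within rounding error of 1
res  <- infl$wt.res
disp <- summary(Uniform)$dispersion
rstd <- ifelse(hat == 1, NaN, res/sqrt(disp * (1 - hat)))
rstd[as.logical(diag(8))]                      # NaN on the exactly-fitted diagonal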
My take is that plot.lm should handle this: you will see most but not all
cases have na.rm=TRUE in calculating ylim, but as Inf is theoretically
impossible it has not been considered.
Note that plot.lm does not use rstandard and so needs a separate fix.
Thanks for the report
On Tue, 6 Dec 2005 Heather.Turner at warwick.ac.uk wrote:
> Full_Name: Heather Turner
> Version: 2.2.0
> OS: Windows XP
> Submission from: (NULL) (137.205.240.44)
>
>
> Standardized residuals as calculated by rstandard.lm, rstandard.glm and plot.lm
> are Inf/NaN rather than zero when the un-standardized residuals are zero. This
> causes plot.lm to break when calculating 'ylim' for any of the plots of
> standardized residuals. Example:
>
> "occupationalStatus" <-
> structure(as.integer(c(50, 16, 12, 11, 2, 12, 0, 0, 19, 40, 35,
> 20, 8, 28, 6, 3, 26, 34, 65, 58, 12, 102, 19, 14,
8,
> 18, 66, 110, 23, 162, 40, 32, 7, 11, 35, 40, 25,
90,
> 21, 15, 11, 20, 88, 183, 46, 554, 158, 126, 6, 8,
> 23,
> 64, 28, 230, 143, 91, 2, 3, 21, 32, 12, 177, 71,
> 106)
> ), .Dim =3D as.integer(c(8, 8)), .Dimnames =3D
> structure(list(origin =3D c("1", "2",
"3", "4", "5", "6", "7",
> "8"),
> destination =3D c("1", "2",
"3", "4", "5", "6", "7",
> "8")), .Names =3D
c("origin", "destination")),
> class =3D "table")
> Diag <- as.factor(diag(1:8))
> Rscore <- scale(as.numeric(row(occupationalStatus)), scale = FALSE)
> Cscore <- scale(as.numeric(col(occupationalStatus)), scale = FALSE)
> Uniform <- glm(Freq ~ origin + destination + Diag + Rscore:Cscore,
>                family = poisson, data = occupationalStatus)
> residuals(Uniform)[as.logical(diag(8))] #zero/near-zero
> rstandard(Uniform)[as.logical(diag(8))] #mostly Inf/NaN
> plot(Uniform) #breaks on qqnorm plot (or any 'which' > 1)
>
> This could be fixed by replacing standardized residuals with zero where the
> hat value is one, e.g.
> rstandard.glm <- function (model,
>                            infl = lm.influence(model, do.coef = FALSE),
>                            ...) {
>     res <- infl$wt.res
>     hat <- infl$hat
>     ifelse(hat == 1, 0,
>            res/sqrt(summary(model)$dispersion * (1 - infl$hat)))
> }
> etc.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,     Tel: +44 1865 272861 (self)
1 South Parks Road,            +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax: +44 1865 272595
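(The "etc." in the report above refers to making the same change in rstandard.lm. A sketch of what that could look like, mirroring the glm version with the residual standard error in place of the dispersion; treat the signature and the hat == 1 guard as an illustration, not the patch that was actually applied:)

rstandard.lm <- function (model,
                          infl = lm.influence(model, do.coef = FALSE),
                          sd = sqrt(deviance(model)/df.residual(model)),
                          ...) {
    hat <- infl$hat
    ifelse(hat == 1, 0, infl$wt.res/(sd * sqrt(1 - hat)))
}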
maechler@stat.math.ethz.ch
2006-Jan-10 14:58 UTC
[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)
>>>>> "Heather" == Heather Turner <Heather.Turner at warwick.ac.uk> >>>>> on Tue, 10 Jan 2006 14:30:23 +0100 (CET) writes:Heather> This bug is not quite fixed - the example from my Heather> original report now = works using R-2.2.1, but Heather> plot(Uniform, 6) Heather> does not. The bug is due ......... g <- hatval/(1 - hatval) # Potential division by zero here plot(g, cook, xlim = c(0, max(g)), ylim = c(0, ymx), .......... Heather> All other values of 'which' seem to work Heather> fine. Sorry not to have checked this in the beta Heather> version, (indeed; that would have been useful) Hmm, it's not clear what *should* be drawn in such a case. Leaving away all the observations with h_ii = 1 seems a particularly bad idea, since these are the ones that you'd definitely should remark. OTOH, for h_ii = 1, the cook distance is 'NaN' (or should that be changed; to "very large" instead ???) and plot number 6 doesn't seem to make any sense to me When 'which = 6' was proposed [ on R-devel as well, last April, http://tolstoy.newcastle.edu.au/R/devel/05/04/0595.html ah, I see, by David Firth, from your place, so, Heather, can you make sure he sees this e-mail ? ] I actually had wondered a bit about it's general usefulness,.. --- But the example is useful anyway; I think that also plot number 5, (R_i vs h_ii) does the wrong thing currently by just leaving away all points for which h_ii = 1. These really should be shown one or the other way! Martin
d.firth@warwick.ac.uk
2006-Jan-11 11:56 UTC
[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)
On 10 Jan, 2006, at 14:58, maechler at stat.math.ethz.ch wrote:

> Hmm, it's not clear what *should* be drawn in such a case. Leaving out
> all the observations with h_ii = 1 seems a particularly bad idea, since
> these are exactly the ones you would want to remark on.
> [...]
> When 'which = 6' was proposed [...] I had actually wondered a bit about
> its general usefulness.

Yes, I remember that there was some discussion of this last year, and my
recollection is that it was mostly luke-warm at best in regard to including
this plot.

The "h_ii = 1" problem can of course be taken care of by leaving out such
points if they can be reliably detected, but I share Martin's unease about
this. We should also worry about, for example, h_ii = (1 - epsilon) with
epsilon small, as plotting such a point would effectively make the rest of
the graph useless.

Maybe it would be safest to remove the which=6 option?

David