Heather.Turner@warwick.ac.uk
2006-Jan-10 13:30 UTC
[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)
This bug is not quite fixed - the example from my original report now works using R-2.2.1, but

    plot(Uniform, 6)

does not. The bug is due to

    if (show[6]) {
        ymx <- max(cook, na.rm = TRUE) * 1.025
        g <- hatval/(1 - hatval)  # Potential division by zero here
        plot(g, cook, xlim = c(0, max(g)), ylim = c(0, ymx),
             main = main, xlab = "Leverage", ylab = "Cook's distance",
             xaxt = "n", type = "n", ...)
        ...

All other values of 'which' seem to work fine. Sorry not to have checked this in the beta version,

Heather

>>> Prof Brian Ripley <ripley at stats.ox.ac.uk> 12/06/05 04:10pm >>>
Curiously, I was just looking at that, since I believe the answer should be NaN, and some optimizing compilers/fast BLASes are not giving that. (There's an example in reg-test-3.R.) So I think we need to return NaN when hat is within rounding error of 1.

My take is that plot.lm should handle this: you will see most but not all cases have na.rm = TRUE in calculating ylim, but as Inf is theoretically impossible it has not been considered. Note that plot.lm does not use rstandard and so needs a separate fix.

Thanks for the report

On Tue, 6 Dec 2005 Heather.Turner at warwick.ac.uk wrote:

> Full_Name: Heather Turner
> Version: 2.2.0
> OS: Windows XP
> Submission from: (NULL) (137.205.240.44)
>
> Standardized residuals as calculated by rstandard.lm, rstandard.glm and
> plot.lm are Inf/NaN rather than zero when the un-standardized residuals
> are zero. This causes plot.lm to break when calculating 'ylim' for any
> of the plots of standardized residuals. Example:
>
> "occupationalStatus" <-
>     structure(as.integer(c(50, 16, 12, 11, 2, 12, 0, 0, 19, 40, 35,
>                            20, 8, 28, 6, 3, 26, 34, 65, 58, 12, 102, 19, 14, 8,
>                            18, 66, 110, 23, 162, 40, 32, 7, 11, 35, 40, 25, 90,
>                            21, 15, 11, 20, 88, 183, 46, 554, 158, 126, 6, 8, 23,
>                            64, 28, 230, 143, 91, 2, 3, 21, 32, 12, 177, 71, 106)),
>               .Dim = as.integer(c(8, 8)),
>               .Dimnames = structure(list(
>                   origin = c("1", "2", "3", "4", "5", "6", "7", "8"),
>                   destination = c("1", "2", "3", "4", "5", "6", "7", "8")),
>                   .Names = c("origin", "destination")),
>               class = "table")
> Diag <- as.factor(diag(1:8))
> Rscore <- scale(as.numeric(row(occupationalStatus)), scale = FALSE)
> Cscore <- scale(as.numeric(col(occupationalStatus)), scale = FALSE)
> Uniform <- glm(Freq ~ origin + destination + Diag + Rscore:Cscore,
>                family = poisson, data = occupationalStatus)
> residuals(Uniform)[as.logical(diag(8))]  # zero/near-zero
> rstandard(Uniform)[as.logical(diag(8))]  # mostly Inf/NaN
> plot(Uniform)  # breaks on qqnorm plot (or any 'which' > 1)
>
> This could be fixed by replacing standardized residuals with zero where
> the hat value is one, e.g.
>
> rstandard.glm <- function(model,
>                           infl = lm.influence(model, do.coef = FALSE),
>                           ...) {
>     res <- infl$wt.res
>     hat <- infl$hat
>     ifelse(hat == 1, 0, res/sqrt(summary(model)$dispersion * (1 - infl$hat)))
> }
> etc.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
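For reference, a small check of where the non-finite value arises, written as a sketch that assumes the 'Uniform' fit from the report above is in the current session: the cells picked out by the Diag factor are fitted (essentially) exactly, so their hat values are numerically 1 and the leverage ratio used by the which = 6 plot blows up.

    ## Sketch, assuming 'Uniform' from Heather's example exists in the workspace.
    h <- lm.influence(Uniform, do.coef = FALSE)$hat
    range(h[as.logical(diag(8))])   # essentially 1 for the diagonal (Diag) cells
    g <- h / (1 - h)                # leverage ratio used by plot(, which = 6)
    max(g)                          # Inf for the exactly fitted cells, so
                                    # xlim = c(0, max(g)) is not finite and plot() fails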
maechler@stat.math.ethz.ch
2006-Jan-10 14:58 UTC
[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)
>>>>> "Heather" == Heather Turner <Heather.Turner at warwick.ac.uk> >>>>> on Tue, 10 Jan 2006 14:30:23 +0100 (CET) writes:Heather> This bug is not quite fixed - the example from my Heather> original report now = works using R-2.2.1, but Heather> plot(Uniform, 6) Heather> does not. The bug is due ......... g <- hatval/(1 - hatval) # Potential division by zero here plot(g, cook, xlim = c(0, max(g)), ylim = c(0, ymx), .......... Heather> All other values of 'which' seem to work Heather> fine. Sorry not to have checked this in the beta Heather> version, (indeed; that would have been useful) Hmm, it's not clear what *should* be drawn in such a case. Leaving away all the observations with h_ii = 1 seems a particularly bad idea, since these are the ones that you'd definitely should remark. OTOH, for h_ii = 1, the cook distance is 'NaN' (or should that be changed; to "very large" instead ???) and plot number 6 doesn't seem to make any sense to me When 'which = 6' was proposed [ on R-devel as well, last April, http://tolstoy.newcastle.edu.au/R/devel/05/04/0595.html ah, I see, by David Firth, from your place, so, Heather, can you make sure he sees this e-mail ? ] I actually had wondered a bit about it's general usefulness,.. --- But the example is useful anyway; I think that also plot number 5, (R_i vs h_ii) does the wrong thing currently by just leaving away all points for which h_ii = 1. These really should be shown one or the other way! Martin
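One way to act on Martin's point that the h_ii = 1 points should be shown "one way or the other" might look like the sketch below. This is only an illustration, not the plot.lm code: 'hatval' and 'rs' stand in for plot.lm's internal hat values and standardized residuals, and the zero-residual convention follows Heather's suggested fix.

    ## Rough sketch of a 'residuals vs leverage' (which = 5) panel that keeps
    ## the numerically-exact-fit points visible instead of dropping them.
    ## 'hatval' and 'rs' are placeholders for plot.lm's internal objects.
    eps <- 10 * .Machine$double.eps
    extreme <- hatval >= 1 - eps        # cases that are numerically h_ii = 1
    rs[extreme] <- 0                    # Heather's convention: zero residual there
    plot(hatval, rs, xlab = "Leverage", ylab = "Standardized residuals")
    if (any(extreme))                   # flag such points rather than hide them
        points(hatval[extreme], rs[extreme], pch = 4, col = 2)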
d.firth@warwick.ac.uk
2006-Jan-11 11:56 UTC
[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)
On 10 Jan 2006, at 14:58, maechler at stat.math.ethz.ch wrote:

>>>>>> "Heather" == Heather Turner <Heather.Turner at warwick.ac.uk>
>>>>>>     on Tue, 10 Jan 2006 14:30:23 +0100 (CET) writes:
>
> Heather> This bug is not quite fixed - the example from my
> Heather> original report now works using R-2.2.1, but
>
> Heather> plot(Uniform, 6)
>
> Heather> does not. The bug is due
>
> .........
>     g <- hatval/(1 - hatval)  # Potential division by zero here
>     plot(g, cook, xlim = c(0, max(g)), ylim = c(0, ymx),
> ..........
>
> Heather> All other values of 'which' seem to work
> Heather> fine. Sorry not to have checked this in the beta
> Heather> version,
>
> (indeed; that would have been useful)
>
> Hmm, it's not clear what *should* be drawn in such a case. Leaving out
> all the observations with h_ii = 1 seems a particularly bad idea, since
> these are exactly the ones that should definitely be flagged.
> OTOH, for h_ii = 1 the Cook's distance is NaN
> (or should that be changed, to "very large" instead???)
> and plot number 6 doesn't seem to make any sense to me.
>
> When 'which = 6' was proposed
> [ on R-devel as well, last April,
>   http://tolstoy.newcastle.edu.au/R/devel/05/04/0595.html
>   ah, I see, by David Firth, from your place, so, Heather, can you
>   make sure he sees this e-mail? ]
> I actually had wondered a bit about its general usefulness, ...

Yes, I remember that there was some discussion of this last year, and my recollection is that the reaction was lukewarm at best with regard to including this plot.

The "h_ii = 1" problem can of course be taken care of by leaving out such points if they can be reliably detected, but I share Martin's unease about this. We should also worry about, for example, h_ii = (1 - epsilon) with epsilon small, as plotting such a point would effectively make the rest of the graph useless.

Maybe it would be safest to remove the which = 6 option?

David
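A quick numeric illustration of David's epsilon point, using made-up hat values (not taken from the example fit): a single hat value just below 1 stretches the leverage axis of the which = 6 plot so far that every other point is squashed against the origin.

    ## Hypothetical hat values, one close to (but below) 1.
    h <- c(0.2, 0.4, 0.6, 0.999)
    h / (1 - h)   # 0.25 0.67 1.50 999 -- xlim would be c(0, 999)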