thr3ads.net - R devel - [Rd] Inaccuracy in DFBETA calculation for GLMs [Oct 2025]

Martin Maechler

2025-Oct-09 08:55 UTC

[Rd] Inaccuracy in DFBETA calculation for GLMs

>>>>> Ravi Varadhan via R-devel 
>>>>>     on Sat, 4 Oct 2025 13:34:48 +0000 writes:
    > Hi,
    > I have been calculating sensitivity diagnostics in GLMs.  I am noticing
    > that the dfbeta() and influence() functions in base R are inaccurate for
    > non-Gaussian GLMs.  Even though the help says that the DFBETAs can be
    > inaccurate for GLMs, the accuracy can be substantially improved.

    > I was thinking of writing this up along with a proper fix to R Journal
    > but then started wondering whether this is a well-known issue and it has
    > been addressed in other packages.

    > Has the inaccuracy of DFBETA been addressed already?

    > Thank you,
    > Ravi

As nobody has replied till now:  No, I have not yet heard of
such properties, and even less that (or how) they can be
substantially improved (I assume you have "searched the net" for
that).
I agree that this would probably be a nice R Journal paper when
accompanied by both math and code.

A subjective remark: Being statistically educated at ETH
Zurich and similar places (UW Seattle, Bellcore), I've been
convinced that such "leave-one-out" diagnostics are not
providing "true robustness" (against violation of error
distribution assumptions etc.), but one should rather use M- (and
MM-)estimation approaches providing a guaranteed breakdown point
above 2/n (or so, which I think is what you get with such
L.o.o. diagnostics: just look at the effect of one huge outlier
masking a large one).
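
The masking effect mentioned above can be illustrated with a small sketch in base R. Everything here is an assumption made for illustration: the simulated data, the positions of the two planted outliers, and the helper slope(). It compares how much the slope moves when the "large" outlier is dropped while the "huge" one is still present against its influence once the huge one is gone.

```r
## Illustrative simulated data: a clean line y = 1 + 2x plus two planted outliers.
set.seed(42)
n <- 30
x <- c(rnorm(n), 6, 6.5)
y <- c(1 + 2 * x[1:n] + rnorm(n, sd = 0.5),
       -10,   # "large" outlier (case n + 1)
       -40)   # "huge" outlier  (case n + 2)
dat <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = dat)

## Hypothetical helper: slope of the least-squares fit on a subset of the data.
slope <- function(d) unname(coef(lm(y ~ x, data = d))["x"])

s_full       <- slope(dat)
s_drop_large <- slope(dat[-(n + 1), ])
s_drop_huge  <- slope(dat[-(n + 2), ])
s_drop_both  <- slope(dat[-c(n + 1, n + 2), ])

## Leave-one-out view of the large outlier while the huge one is still present.
loo_large <- dfbeta(fit)[n + 1, "x"]

## Influence of the large outlier once the huge one has been removed.
infl_large_alone <- s_drop_huge - s_drop_both

print(c(full = s_full, drop_large = s_drop_large,
        drop_huge = s_drop_huge, drop_both = s_drop_both))
print(c(loo_large = loo_large, infl_large_alone = infl_large_alone))
```

In this setup, dropping only the large outlier leaves the fit dominated by the huge one, so a single-case deletion diagnostic understates how much the large outlier matters.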

For that reason, I would not want to substantially blow up our
base R code underlying DFBETA (which would then have to be maintained
into "all" future),  but then I'm only speaking for myself and
not all of R core (and even less all R-using statisticians).

Martin

Ravi Varadhan

2025-Oct-15 13:33 UTC

[Rd] Inaccuracy in DFBETA calculation for GLMs

Thank you, Martin.

I agree with the subjective remark.  But that's a different conversation!

The fix is quite easy. The difference mainly stems from the fact that R uses
"deviance" residuals instead of "working" residuals.
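
A minimal sketch of how this discrepancy could be checked, assuming a simulated Poisson regression. The data, the variable names, and the one-step formula (X'WX)^{-1} x_i sqrt(w_i) r_i / (1 - h_i) with Pearson residuals r_i (which equal sqrt(w_i) times the working residuals) are illustrative assumptions, not necessarily the fix referred to in this message. The sketch compares dfbeta() and the one-step approximation with coefficient changes obtained by actually refitting without each case.

```r
## Illustrative simulated Poisson regression (all names and values are assumptions).
set.seed(1)
n <- 40
dat <- data.frame(x = rnorm(n))
dat$y <- rpois(n, lambda = exp(0.5 + 0.8 * dat$x))
fit <- glm(y ~ x, family = poisson, data = dat)

## (a) dfbeta() as provided by the stats package.
db_base <- dfbeta(fit)

## (b) Case deletion by brute force: refit the model without observation i.
db_exact <- t(sapply(seq_len(n), function(i) {
  coef(fit) - coef(glm(y ~ x, family = poisson, data = dat[-i, ]))
}))

## (c) One-step approximation built from Pearson residuals.
X  <- model.matrix(fit)
w  <- fit$weights                         # working weights at convergence
h  <- hatvalues(fit)                      # leverages
rP <- residuals(fit, type = "pearson")    # sqrt(w) * working residuals
XtWXinv <- summary(fit)$cov.unscaled      # (X'WX)^{-1}
db_onestep <- (X * (sqrt(w) * rP / (1 - h))) %*% XtWXinv

## Largest absolute discrepancy of each approximation from the refits.
err <- c(base    = max(abs(db_base    - db_exact)),
         onestep = max(abs(db_onestep - db_exact)))
print(err)
```

How large the gap between the two approximations is will depend on the family, the link, and the data, so the printed numbers are only indicative for this one simulated example.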

I will proceed as per your advice.

Thanks & Best regards,
Ravi

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel