thr3ads.net - R help - [R] Potential Issue with lm.influence [Apr 2019]

Eric Bridgeford

2019-Apr-02 20:21 UTC

[R] Fwd: Potential Issue with lm.influence

How can I add attachments? The following two files were attached in the
initial message

On Tue, Apr 2, 2019 at 3:34 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> Nothing was attached. The r-help server strips most attachments. Include
> your code inline.
>
> Also note that
>
> > 0/0
> [1] NaN
>
> so maybe something like that occurs in the course of your calculations.
> But that's just a guess, so feel free to disregard.
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Tue, Apr 2, 2019 at 11:32 AM Eric Bridgeford <ericwb95 at gmail.com>
> wrote:
>
>> Hi R core team,
>>
>> I experienced the following issue with the attached data/code snippet,
>> where the studentized residual for a single observation appears to be NaN
>> given finite predictors/responses, which appears to be driven by the
>> glm.influence method in the stats package. I am curious as to whether this
>> is a consequence of the specific implementation used for computing the
>> influence, which appears to be the driving force behind the NaN influence
>> for the point. I was ultimately able to trace it back through the
>> lm.influence method to this specific line
>> <https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/library/stats/R/lm.influence.R#L67>,
>> which calls C code which calls lminfl.f
>> <https://github.com/SurajGupta/r-source/blob/master/src/library/stats/src/lminfl.f>
>> (I don't know Fortran, so I can't debug further). My understanding is that
>> the specific issue has to do with the leave-one-out variance estimate
>> associated with this particular point, which, based on my understanding,
>> should be finite given finite predictors/responses. Let me know. Thanks!
>>
>> Sincerely,
>>
>> --
>> Eric Bridgeford
>> ericwb.me
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
-- 
Eric Bridgeford
ericwb.me

Bert Gunter

2019-Apr-02 20:29 UTC

[R] Fwd: Potential Issue with lm.influence

I told you already: **Include code inline **

See ?dput for how to include a text version of objects, such as data
frames, inline.

Otherwise, I believe .txt text files are not stripped if you insist on
*attaching* data or code. Others may have better advice.
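
A minimal sketch of the dput approach, assuming a made-up data frame (`df` below is a stand-in, not the poster's data): dput() prints a plain-text expression that can be pasted into a message, and a reader evaluates that text to recreate the object.

```r
# Toy data frame standing in for the object to be shared (hypothetical).
df <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5))

# dput() writes a text representation that can be pasted inline in an email.
dput(df)

# A reader recreates the object by evaluating the pasted text; here the
# text is captured with deparse() to show that the round trip works.
txt <- paste(deparse(df), collapse = "\n")
df_copy <- eval(parse(text = txt))
all.equal(df, df_copy)
```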


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Bert Gunter

2019-Apr-02 20:38 UTC

[R] Fwd: Potential Issue with lm.influence

Also, I suggest you read ?influence, which may explain the source of your
NaNs.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Fox, John

2019-Apr-02 23:53 UTC

[R] Fwd: Potential Issue with lm.influence

Dear Eric,

Have you looked at your data? -- for example:

	plot(log(Moons) ~ Volume, data = moon_data)
	text(log(Moons) ~ Volume, data = moon_data, labels = Name, adj = 1,
	     subset = Volume > 400)

The negative-binomial model doesn't look reasonable, does it?

After you eliminate Jupiter there's one very high leverage point left,
Saturn. Computing studentized residuals entails an approximation to deleting
that as well from the model, so try fitting

	fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn ")))
	summary(fit3)

which runs into numeric difficulties.

Then look at:

	plot(log(Moons) ~ Volume, data = moon_data, subset = Volume < 400)

Finally, try

	plot(log(Moons) ~ log(Volume), data = moon_data)
	fit4 <- update(fit2, . ~ log(Volume))
	rstudent(fit4)
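
The leverage point mentioned above can be checked directly with hatvalues(). A minimal sketch on made-up data (not moon_data; the vectors `x` and `y` and the name `toy_fit` are hypothetical): one x value far from the rest gets a hat value close to 1, which is the situation in which deletion diagnostics become fragile.

```r
# Made-up data: ten ordinary x values plus one far away (hypothetical).
x <- c(1:10, 1000)
y <- c(2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 60)

toy_fit <- lm(y ~ x)

# Hat values (leverages) sum to the number of estimated coefficients.
h <- hatvalues(toy_fit)
round(h, 4)
sum(h)

# Index of the observation with the largest leverage (the isolated point).
which.max(h)
```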

I hope this helps,
 John

-----------------------------------------------------------------
John Fox
Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: https://socialsciences.mcmaster.ca/jfox/



> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Eric
> Bridgeford
> Sent: Tuesday, April 2, 2019 5:01 PM
> To: Bert Gunter <bgunter.4567 at gmail.com>
> Cc: R-help <r-help at r-project.org>
> Subject: Re: [R] Fwd: Potential Issue with lm.influence
> 
> I agree the influence documentation suggests NaNs may result; however, these
> values can be computed manually and are, indeed, finite (i.e., by fitting n
> models for n points to obtain n leave-one-out influence measures), so I
> don't see how the function SHOULD return NaN. Given that it does return NaN,
> that suggests to me that there should be either:
>
> a) an alternative (possibly slower) method that returns the correct results
>    in the event that lm.influence does not return a good approximation
>    (e.g., an argument type="approx" for the approximation strategy employed
>    currently, and an alternative type="direct" or something like that which
>    computes them manually), or
> b) a heuristic that suggests why NaNs might result from one's particular
>    inputs and what can be done to fix it (if the approximation strategy is
>    the source of the problem), or what the issue is with the data that will
>    cause NaNs.
>
> Hence I was looking to start a discussion around the specific strategy
> employed to compute the elements.
> 
> Below is the code:
> library(MASS)  # provides glm.nb()
>
> moon_data <- structure(list(
>   Name = structure(c(8L, 13L, 2L, 7L, 1L, 5L, 11L, 12L, 9L, 10L, 4L, 6L, 3L),
>     .Label = c("Ceres ", "Earth", "Eris ", "Haumea ", "Jupiter ",
>                "Makemake ", "Mars ", "Mercury ", "Neptune ", "Pluto ",
>                "Saturn ", "Uranus ", "Venus "), class = "factor"),
>   Distance = c(0.39, 0.72, 1, 1.52, 2.75, 5.2, 9.54, 19.22, 30.06, 39.5,
>                43.35, 45.8, 67.7),
>   Diameter = c(0.382, 0.949, 1, 0.532, 0.08, 11.209, 9.449, 4.007, 3.883,
>                0.18, 0.15, 0.12, 0.19),
>   Mass = c(0.06, 0.82, 1, 0.11, 2e-04, 317.8, 95.2, 14.6, 17.2, 0.0022,
>            7e-04, 7e-04, 0.0025),
>   Moons = c(0L, 0L, 1L, 2L, 0L, 64L, 62L, 27L, 13L, 4L, 2L, 0L, 1L),
>   Volume = c(0.0291869497930152, 0.447504348276571, 0.523598775598299,
>              0.0788376225681443, 0.000268082573106329, 737.393372232996,
>              441.729261571372, 33.6865588825666, 30.6549628355953,
>              0.00305362805928928, 0.00176714586764426,
>              0.00090477868423386, 0.00359136400182873)),
>   row.names = c(NA, -13L), class = "data.frame")
>
> fit <- glm.nb(Moons ~ Volume, data = moon_data)
> rstudent(fit)
>
> fit2 <- update(fit, subset = Name != "Jupiter ")
> rstudent(fit2)
>
> influence(fit2)$sigma
>
> #        1        2        3        4        5        7
> # 1.077945 1.077813 1.165025 1.181685 1.077954      NaN
> #        8        9       10       11       12       13
> # 1.044454 1.152110 1.187586 1.181696 1.077954 1.165147
> 
> Sincerely,
> Eric
> 

Eric Bridgeford

2019-Apr-03 01:08 UTC

[R] Fwd: Potential Issue with lm.influence

Hey John,

I am aware they are high leverage points, and that the model is not the
best for them. The purpose of this dataset was to explore high leverage
points, and diagnostic statistics through which one would identify them.

What I am saying is that the current behavior of the function seems a
little non-specific to me; the influence for this problem is
finite/computable manually by fitting n models to n-1 points (manually
holding out each point individually to obtain the loo-variance, and
computing the influence in the non-approximate way).
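
For an ordinary linear model the comparison between the shortcut and the n refits can be made concrete. The sketch below is only an illustration under stated assumptions: it uses the built-in `cars` data rather than moon_data, and the object names are made up. The shortcut is s_(i)^2 = (sum_j e_j^2 - e_i^2 / (1 - h_i)) / (n - p - 1), which divides by 1 - h_i and so becomes unstable as the leverage h_i approaches 1. For a GLM the shortcut is, as far as I understand, applied to the converged weighted fit, so it is an approximation there and need not match the refits.

```r
# Built-in data, used here only as a stand-in for illustration.
fit_lm <- lm(dist ~ speed, data = cars)

e <- residuals(fit_lm)
h <- hatvalues(fit_lm)
n <- nrow(cars)
p <- length(coef(fit_lm))

# Shortcut: leave-one-out residual standard deviation without refitting.
sigma_shortcut <- sqrt((sum(e^2) - e^2 / (1 - h)) / (n - p - 1))

# Brute force: refit the model n times, each time holding out one row.
sigma_refit <- sapply(seq_len(n), function(i) {
  summary(lm(dist ~ speed, data = cars[-i, ]))$sigma
})

# For a linear model both should match what influence() reports.
all.equal(unname(sigma_shortcut), sigma_refit)
all.equal(unname(influence(fit_lm)$sigma), sigma_refit)
```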

I am just suggesting that the function could be improved by, say, throwing
specific warnings when NaNs may arise, e.g., "You have points that are very
high leverage. The approximation technique is not numerically stable for
these points and the results should be used with caution." (I am sure there
are also other pre-hoc approaches to diagnose other ways in which this
function could fail.) It seems peculiar that the approximation technique,
when it does not behave well for ultra-high-leverage points, returns NaN
with no other recommendations, advice, or specific warnings, especially
since the influence is frequently used to diagnose this specific issue.

Alternatively, one could offer an optional argument type="manual" that
computes the held-out variance manually rather than in the approximate
fashion, and add a note to the help page recommending it when you have high
leverage points (this is what I ended up doing to obtain the true influence
and the externally studentized residual).

I just think some more specificity could be of use for future users, to
make the R stats community even better :) Does that make sense?

Sincerely,
Eric


-- 
Eric Bridgeford
ericwb.me


peter dalgaard

2019-Apr-03 09:36 UTC

[R] Fwd: Potential Issue with lm.influence

Yes, also notice that

> predict(fit3, new=moon_data, type="resp")
           1            2            3            4            5            6
1.060694e+00 1.102008e+00 1.109695e+00 1.065515e+00 1.057896e+00 1.892312e+29
           7            8            9           10           11           12
3.531271e+17 2.295015e+01 1.739889e+01 1.058165e+00 1.058041e+00 1.057957e+00
          13
1.058217e+00


so the model of fit3 predicts that Jupiter and Saturn should have several
bazillions of moons each!
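
A quick check of why these predictions explode, using only the numbers printed above and the Volume values from moon_data (the object names are illustrative): under the log link, which is the default for glm.nb, the log of the predicted mean is linear in Volume, so the slope implied by any two bodies should agree.

```r
# Predicted means copied from the output above (rows 1, 6 and 7).
pred <- c(Mercury = 1.060694e+00, Jupiter = 1.892312e+29,
          Saturn = 3.531271e+17)

# Volume values for the same rows, copied from moon_data.
vol <- c(Mercury = 0.0291869497930152, Jupiter = 737.393372232996,
         Saturn = 441.729261571372)

# On the log scale the fit is a straight line, so these slopes should agree.
slope_jupiter <- unname((log(pred["Jupiter"]) - log(pred["Mercury"])) /
                          (vol["Jupiter"] - vol["Mercury"]))
slope_saturn <- unname((log(pred["Saturn"]) - log(pred["Mercury"])) /
                         (vol["Saturn"] - vol["Mercury"]))

c(slope_jupiter, slope_saturn)

# Extrapolating to Volume = 400 multiplies the predicted mean by this factor.
exp(slope_jupiter * 400)
```

A slope of roughly 0.09 per unit of Volume is harmless over the range of the smaller bodies (Volume below about 35), but it compounds exponentially when extrapolated to Saturn's and Jupiter's volumes.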

-pd


> On 3 Apr 2019, at 01:53 , Fox, John <jfox at mcmaster.ca> wrote:
> 
> Dear Eric,
> 
> Have you looked at your data? -- for example:
> 
> 	plot(log(Moons) ~ Volume, data = moon_data)
> 	text(log(Moons) ~ Volume, data = moon_data, labels=Name, adj=1, subset =
Volume > 400)
> 
> The negative-binomial model doesn't look reasonable, does it?
> 
> After you eliminate Jupiter there's one very high leverage point left,
Saturn. Computing studentized residuals entails an approximation to deleting
that as well from the model, so try fitting
> 
> 	fit3 <- update(fit, subset = !(Name %in% c("Jupiter ",
"Saturn ")))
> 	summary(fit3)
> 
> which runs into numeric difficulties.
> 
> Then look at:
> 
> 	plot(log(Moons) ~ Volume, data = moon_data, subset = Volume < 400)
> 
> Finally, try
> 
> 	plot(log(Moons) ~ log(Volume), data = moon_data)
> 	fit4 <- update(fit2, . ~ log(Volume))
> 	rstudent(fit4)
> 
> I hope this helps,
> John
> 
> -----------------------------------------------------------------
> John Fox
> Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> Web: https://socialsciences.mcmaster.ca/jfox/
> 
> 
> 
> 
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Eric
>> Bridgeford
>> Sent: Tuesday, April 2, 2019 5:01 PM
>> To: Bert Gunter <bgunter.4567 at gmail.com>
>> Cc: R-help <r-help at r-project.org>
>> Subject: Re: [R] Fwd: Potential Issue with lm.influence
>> 
>> I agree the influence documentation suggests NaNs may result; however,
as
>> these can be manually computed and are, indeed, finite/existing (ie,
>> computing the held-out influence by manually training n models for n
points
>> to obtain n leave one out influence measures), I don't possibly see
how the
>> function SHOULD return NaN, and given that it is returning NaN, that
>> suggests to me that there should be either a) Providing an alternative
>> method to compute them that (may be slower) that returns the correct
>> results in the even that lm.influence does not return a good
approximation
>> (ie, a command line argument for type="approx" that does the
>> approximation strategy employed currently, or an alternative
type="direct"
>> or something like that that computes them manually), or b) a heuristic
to
>> suggest why NaNs might result from one's particular inputs/what can
be
>> done to fix it (if the approximation strategy is the source of the
problem) or
>> what the issue is with the data that will cause NaNs. Hence I was
looking to
>> start a discussion around the specific strategy employed to compute the
>> elements.
>> 
>> Below is the code:
>> moon_data <- structure(list(
>>   Name = structure(c(8L, 13L, 2L, 7L, 1L, 5L, 11L, 12L, 9L, 10L, 4L, 6L, 3L),
>>     .Label = c("Ceres ", "Earth", "Eris ", "Haumea ", "Jupiter ",
>>                "Makemake ", "Mars ", "Mercury ", "Neptune ", "Pluto ",
>>                "Saturn ", "Uranus ", "Venus "), class = "factor"),
>>   Distance = c(0.39, 0.72, 1, 1.52, 2.75, 5.2, 9.54, 19.22, 30.06, 39.5,
>>                43.35, 45.8, 67.7),
>>   Diameter = c(0.382, 0.949, 1, 0.532, 0.08, 11.209, 9.449, 4.007, 3.883,
>>                0.18, 0.15, 0.12, 0.19),
>>   Mass = c(0.06, 0.82, 1, 0.11, 2e-04, 317.8, 95.2, 14.6, 17.2, 0.0022,
>>            7e-04, 7e-04, 0.0025),
>>   Moons = c(0L, 0L, 1L, 2L, 0L, 64L, 62L, 27L, 13L, 4L, 2L, 0L, 1L),
>>   Volume = c(0.0291869497930152, 0.447504348276571, 0.523598775598299,
>>              0.0788376225681443, 0.000268082573106329, 737.393372232996,
>>              441.729261571372, 33.6865588825666, 30.6549628355953,
>>              0.00305362805928928, 0.00176714586764426, 0.00090477868423386,
>>              0.00359136400182873)),
>>   row.names = c(NA, -13L), class = "data.frame")
>> 
>> library(MASS)  # provides glm.nb()
>> fit <- glm.nb(Moons ~ Volume, data = moon_data)
>> rstudent(fit)
>> 
>> fit2 <- update(fit, subset = Name != "Jupiter ")
>> rstudent(fit2)
>> 
>> influence(fit2)$sigma
>> 
>> #        1        2        3        4        5        7        8        9
>> # 1.077945 1.077813 1.165025 1.181685 1.077954      NaN 1.044454 1.152110
>> #       10       11       12       13
>> # 1.187586 1.181696 1.077954 1.165147
>> 
>> Sincerely,
>> Eric
>> 
>> On Tue, Apr 2, 2019 at 4:38 PM Bert Gunter <bgunter.4567 at gmail.com>
>> wrote:
>> 
>>> Also, I suggest you read ?influence which may explain the source of
>>> your NaN's.
>>> 
>>> Bert Gunter
>>> 
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>> 
>>> 
>>> On Tue, Apr 2, 2019 at 1:29 PM Bert Gunter <bgunter.4567 at gmail.com>
>>> wrote:
>>> 
>>>> I told you already: **Include code inline **
>>>> 
>>>> See ?dput for how to include a text version of objects, such as data
>>>> frames, inline.
>>>> 
>>>> Otherwise, I believe .txt text files are not stripped if you insist
>>>> on *attaching* data or code. Others may have better advice.
>>>> 
>>>> 
>>>> Bert Gunter
>>>> 
>>>> "The trouble with having an open mind is that people keep coming
>>>> along and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Fox, John

2019-Apr-03 14:18 UTC

[R] Potential Issue with lm.influence

Dear Eric,

I'm afraid that your argument doesn't make sense to me. As you saw when
you tried

	fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn ")))

glm.nb() effectively wasn't able to estimate the theta parameter of the
negative binomial model. So why would it be better to base deletion diagnostics
on actually refitting the model?

The lesson to me here is that if you fit a sufficiently unreasonable model to
data, the computations may break down. Other than drawing attention to the NaN
with an explicit warning, I don't see what more could usefully be done.

Best,
 John
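
Editorial sketch (not part of the original message): the trade-off John describes can be probed by refitting the model once per deleted case and recording whether glm.nb() still manages to estimate theta. This is only an illustration under stated assumptions: MASS is installed, `loo_refit` is a hypothetical helper name, and the data are the Moons and Volume columns posted earlier in the thread with Jupiter removed, as in fit2. No output is shown because the results may depend on the R and MASS versions.

```r
library(MASS)  # provides glm.nb()

# Moons and Volume as posted in the thread (row 6 is Jupiter, row 7 is Saturn)
moons  <- c(0, 0, 1, 2, 0, 64, 62, 27, 13, 4, 2, 0, 1)
volume <- c(0.0291869497930152, 0.447504348276571, 0.523598775598299,
            0.0788376225681443, 0.000268082573106329, 737.393372232996,
            441.729261571372, 33.6865588825666, 30.6549628355953,
            0.00305362805928928, 0.00176714586764426, 0.00090477868423386,
            0.00359136400182873)
dat <- data.frame(Moons = moons, Volume = volume)[-6, ]  # drop Jupiter, as in fit2

# Hypothetical helper: refit without row i, collecting warnings and errors
# instead of letting them stop the loop.
loo_refit <- function(i, data) {
  msgs <- character(0)
  fit <- withCallingHandlers(
    tryCatch(glm.nb(Moons ~ Volume, data = data[-i, ]),
             error = function(e) {
               msgs <<- c(msgs, conditionMessage(e))
               NULL
             }),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    })
  list(theta     = if (is.null(fit)) NA_real_ else fit$theta,
       converged = !is.null(fit) && isTRUE(fit$converged),
       messages  = msgs)
}

res <- lapply(seq_len(nrow(dat)), loo_refit, data = dat)

# One theta estimate per deleted case; NA where the refit failed outright
theta_loo <- vapply(res, function(r) r$theta, numeric(1))
```

Inspecting `res[[6]]` (the refit without Saturn) shows any warnings raised for that case; a refit that only "succeeds" with warnings is not obviously a better basis for deletion diagnostics than the one-step approximation.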
> On Apr 2, 2019, at 9:08 PM, Eric Bridgeford <ericwb95 at gmail.com> wrote:
> 
> Hey John,
> 
> I am aware they are high leverage points, and that the model is not the
> best for them. The purpose of this dataset was to explore high leverage
> points, and diagnostic statistics through which one would identify them.
> 
> What I am saying is that the current behavior of the function seems a
> little non-specific to me; the influence for this problem is
> finite/computable manually by fitting n models to n-1 points (manually
> holding out each point individually to obtain the loo-variance, and
> computing the influence in the non-approximate way).
> 
> I am just suggesting that it seems the function could be improved by, say,
> throwing specific warnings when NaNs may arise (ie, "You have points that
> are very high leverage. The approximation technique is not numerically
> stable for these points and the results should be used with caution"
> etc...; I am sure there are also other pre-hoc approaches to diagnose other
> ways in which this function could fail). It just seems peculiar that the
> approximation technique, when it does not behave well for points that are
> ultra high leverage, would return an NaN with no other
> recommendations/advice/specific warnings, especially since the influence is
> frequently used to diagnose this specific issue.
> 
> Alternatively, one could afford an optional argument type="manual" that
> computes the held-out variance manually rather than in the approximate
> fashion, and add a comment to use this in the help page when you have high
> leverage points (this is what I ended up doing to obtain the true influence
> and the externally studentized residual).
> 
> I just think some more specificity could be of use for future users, to
> make the R:stats community even better :) Does that make sense?
> 
> Sincerely,
> Eric
> 
> On Tue, Apr 2, 2019 at 7:53 PM Fox, John <jfox at mcmaster.ca> wrote:
> 
>> Dear Eric,
>> 
>> Have you looked at your data? -- for example:
>> 
>>        plot(log(Moons) ~ Volume, data = moon_data)
>>        text(log(Moons) ~ Volume, data = moon_data, labels=Name, adj=1,
>>             subset = Volume > 400)
>> 
>> The negative-binomial model doesn't look reasonable, does it?
>> 
>> After you eliminate Jupiter there's one very high leverage point left,
>> Saturn. Computing studentized residuals entails an approximation to
>> deleting that as well from the model, so try fitting
>> 
>>        fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn ")))
>>        summary(fit3)
>> 
>> which runs into numeric difficulties.
>> 
>> Then look at:
>> 
>>        plot(log(Moons) ~ Volume, data = moon_data, subset = Volume < 400)
>> 
>> Finally, try
>> 
>>        plot(log(Moons) ~ log(Volume), data = moon_data)
>>        fit4 <- update(fit2, . ~ log(Volume))
>>        rstudent(fit4)
>> 
>> I hope this helps,
>> John

Eric Bridgeford

2019-Apr-03 14:20 UTC

[R] Potential Issue with lm.influence

Hey John,

Seems fair, and, I agree a more explicit or clear (ie, giving users
indications as to why/when the lm.influence is going to misfit the data)
warning makes sense in context.

Sincerely,
Eric
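
Editorial sketch (not part of the original message): the "more explicit warning" agreed on here could look roughly like the wrapper below. `influence_checked` and its `hat_tol` argument are hypothetical names, not part of the stats package; the wrapper only calls the existing influence() generic and inspects its `sigma` and `hat` components. The threshold of 0.99 is an arbitrary assumption for illustration.

```r
# Hypothetical wrapper around stats::influence(); it is NOT part of base R.
influence_checked <- function(model, hat_tol = 0.99, ...) {
  infl <- influence(model, ...)

  # Cases whose leave-one-out sigma came back as NaN, NA or Inf
  bad_sigma <- which(!is.finite(infl$sigma))
  # Cases with hat values close to 1 (very high leverage)
  high_hat <- which(infl$hat > hat_tol)

  if (length(bad_sigma) > 0) {
    warning("non-finite leave-one-out sigma for case(s): ",
            paste(names(infl$sigma)[bad_sigma], collapse = ", "),
            "; deletion diagnostics for these cases are unreliable",
            call. = FALSE)
  }
  if (length(high_hat) > 0) {
    warning("hat value above ", hat_tol, " for case(s): ",
            paste(names(infl$hat)[high_hat], collapse = ", "),
            call. = FALSE)
  }
  invisible(infl)
}

# Usage on a well-behaved model from the built-in 'cars' data
fit_cars <- lm(dist ~ speed, data = cars)
infl <- influence_checked(fit_cars)
```

Applied to a fit such as fit2 from the thread, a wrapper like this would flag the case with the NaN sigma by name rather than leaving the user to spot it in the output.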

-- 
Eric Bridgeford
ericwb.me


Richard M. Heiberger

2019-Apr-03 16:34 UTC

[R] Potential Issue with lm.influence

fortune nomination.


The lesson to me here is that if you fit a sufficiently unreasonable
model to data, the computations may break down.

On Wed, Apr 3, 2019 at 10:18 AM Fox, John <jfox at mcmaster.ca> wrote:
>
> Dear Eric,
>
> I'm afraid that your argument doesn't make sense to me. As you saw when
> you tried
>
>         fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn ")))
>
> glm.nb() effectively wasn't able to estimate the theta parameter of the
> negative binomial model. So why would it be better to base deletion
> diagnostics on actually refitting the model?
>
> The lesson to me here is that if you fit a sufficiently unreasonable model
> to data, the computations may break down. Other than drawing attention to
> the NaN with an explicit warning, I don't see what more could usefully be
> done.
>
> Best,
>  John
>
> > On Apr 2, 2019, at 9:08 PM, Eric Bridgeford <ericwb95 at gmail.com> wrote:
> >
> > Hey John,
> >
> > I am aware they are high leverage points, and that the model is not the
> > best for them. The purpose of this dataset was to explore high leverage
> > points, and diagnostic statistics through which one would identify them.
> >
> > What I am saying is that the current behavior of the function seems a
> > little non-specific to me; the influence for this problem is
> > finite/computable manually by fitting n models to n-1 points (manually
> > holding out each point individually to obtain the loo-variance, and
> > computing the influence in the non-approximate way).
> >
> > I am just suggesting that it seems the function could be improved by, say,
> > throwing specific warnings when NaNs may arise, e.g., "You have points that
> > are very high leverage. The approximation technique is not numerically
> > stable for these points and the results should be used with caution"
> > etc. (I am sure there are also other pre-hoc approaches to diagnose other
> > ways in which this function could fail.) It just seems peculiar that, when
> > the approximation technique does not behave well for points that are ultra
> > high leverage, it would return an NaN with no other
> > recommendations/advice/specific warnings, especially since the influence
> > is frequently used to diagnose this specific issue.
> >
> > Alternatively, one could afford an optional argument type="manual" that
> > computes the held-out variance manually rather than in the approximate
> > fashion, and add a comment in the help page to use this when you have high
> > leverage points (this is what I ended up doing to obtain the true influence
> > and the externally studentized residual).
> >
> > I just think some more specificity could be of use for future users, to
> > make the R:stats community even better :) Does that make sense?
> >
> > Sincerely,
> > Eric
> >
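The "n models on n-1 points" computation described above can be sketched as a plain refit loop. To keep the sketch self-contained it uses lm() on the built-in cars data rather than the glm.nb() fit from this thread; that substitution is an assumption made purely for illustration, and for a linear model the shortcut used by influence() happens to be exact, so the two results should agree.

```r
# Full-data fit on a built-in data set (stand-in for the model in the thread)
fit <- lm(dist ~ speed, data = cars)

# Refit n times, leaving out one row each time, and record the
# residual standard deviation of each reduced fit
loo_sigma <- vapply(seq_len(nrow(cars)), function(i) {
  summary(lm(dist ~ speed, data = cars[-i, ]))$sigma
}, numeric(1))

# Compare with the values influence() derives from the single full fit
max_diff <- max(abs(loo_sigma - unname(influence(fit)$sigma)))
print(max_diff)
```

For a GLM such as the negative binomial model discussed here, the same loop would call the GLM fitting function inside the vapply() and extract whichever quantity is of interest. Individual refits can warn or fail to converge, as the fit3 example in this thread does, so wrapping each refit in tryCatch() is probably wise.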
> > On Tue, Apr 2, 2019 at 7:53 PM Fox, John <jfox at mcmaster.ca> wrote:
> >
> >> Dear Eric,
> >>
> >> Have you looked at your data? -- for example:
> >>
> >>        plot(log(Moons) ~ Volume, data = moon_data)
> >>        text(log(Moons) ~ Volume, data = moon_data, labels=Name, adj=1,
> >>             subset = Volume > 400)
> >>
> >> The negative-binomial model doesn't look reasonable, does it?
> >>
> >> After you eliminate Jupiter there's one very high leverage point left,
> >> Saturn. Computing studentized residuals entails an approximation to
> >> deleting that as well from the model, so try fitting
> >>
> >>        fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn ")))
> >>        summary(fit3)
> >>
> >> which runs into numeric difficulties.
> >>
> >> Then look at:
> >>
> >>        plot(log(Moons) ~ Volume, data = moon_data, subset = Volume < 400)
> >>
> >> Finally, try
> >>
> >>        plot(log(Moons) ~ log(Volume), data = moon_data)
> >>        fit4 <- update(fit2, . ~ log(Volume))
> >>        rstudent(fit4)
> >>
> >> I hope this helps,
> >> John
> >>
> >> -----------------------------------------------------------------
> >> John Fox
> >> Professor Emeritus
> >> McMaster University
> >> Hamilton, Ontario, Canada
> >> Web: https://socialsciences.mcmaster.ca/jfox/
> >>
> >>
> >>
> >>
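A note on why the approximation to deleting a case, mentioned above, can yield NaN. As far as I can tell from lminfl.f, the dropped-case sigma is computed from the full fit as sqrt((sum(e^2) - e_i^2 / (1 - h_i)) / (n - p - 1)), where e are the (weighted) residuals and h the hat values; treat that reading as an assumption rather than a statement about the exact implementation. For a linear model the expression is exact and the quantity under the square root cannot be negative, but with GLM residuals and a leverage close to 1 nothing guarantees that. The sketch below checks the formula on lm() with the built-in cars data, then uses made-up toy numbers (not the moon data) to show how a negative value under the root becomes NaN.

```r
# Linear model on a built-in data set, where the deletion formula is exact
fit <- lm(dist ~ speed, data = cars)
e <- residuals(fit)
h <- hatvalues(fit)
n <- length(e)
p <- length(coef(fit))

# Closed-form leave-one-out residual standard deviation
sigma_formula <- sqrt((sum(e^2) - e^2 / (1 - h)) / (n - p - 1))
agrees <- isTRUE(all.equal(unname(sigma_formula),
                           unname(influence(fit)$sigma)))

# Toy numbers, invented for illustration: the last case has a large
# residual and a leverage close to 1
e_toy <- c(0.5, -0.4, 0.3, 2.0)
h_toy <- c(0.20, 0.25, 0.30, 0.99)
under_root <- sum(e_toy^2) - e_toy^2 / (1 - h_toy)

# sqrt() of a negative number gives NaN (with a warning, silenced here)
sigma_toy <- suppressWarnings(sqrt(under_root))
print(agrees)
print(sigma_toy)
```

The toy vectors are not residuals or hat values from any fitted model; they only show the arithmetic that turns one high-leverage case into a NaN while the other cases stay finite.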
> >>> -----Original Message-----
> >>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Eric
> >>> Bridgeford
> >>> Sent: Tuesday, April 2, 2019 5:01 PM
> >>> To: Bert Gunter <bgunter.4567 at gmail.com>
> >>> Cc: R-help <r-help at r-project.org>
> >>> Subject: Re: [R] Fwd: Potential Issue with lm.influence
> >>>
> >>> I agree the influence documentation suggests NaNs may result; however, as
> >>> these can be manually computed and are, indeed, finite/existing (i.e.,
> >>> computing the held-out influence by manually training n models for n
> >>> points to obtain n leave-one-out influence measures), I don't see how the
> >>> function SHOULD return NaN. Given that it is returning NaN, that
> >>> suggests to me that there should be either a) an alternative
> >>> method to compute them (which may be slower) that returns the correct
> >>> results in the event that lm.influence does not return a good
> >>> approximation (i.e., an argument type="approx" that does the
> >>> approximation strategy employed currently, and an alternative
> >>> type="direct" or something like that that computes them manually), or
> >>> b) a heuristic to suggest why NaNs might result from one's particular
> >>> inputs/what can be done to fix it (if the approximation strategy is the
> >>> source of the problem) or what the issue is with the data that will cause
> >>> NaNs. Hence I was looking to start a discussion around the specific
> >>> strategy employed to compute the elements.
> >>>
> >>> Below is the code:
> >>> library(MASS)  # provides glm.nb()
> >>>
> >>> moon_data <- structure(list(
> >>>   Name = structure(c(8L, 13L, 2L, 7L, 1L, 5L, 11L, 12L, 9L, 10L, 4L, 6L, 3L),
> >>>     .Label = c("Ceres ", "Earth", "Eris ", "Haumea ", "Jupiter ",
> >>>                "Makemake ", "Mars ", "Mercury ", "Neptune ", "Pluto ",
> >>>                "Saturn ", "Uranus ", "Venus "), class = "factor"),
> >>>   Distance = c(0.39, 0.72, 1, 1.52, 2.75, 5.2, 9.54, 19.22, 30.06, 39.5,
> >>>                43.35, 45.8, 67.7),
> >>>   Diameter = c(0.382, 0.949, 1, 0.532, 0.08, 11.209, 9.449, 4.007, 3.883,
> >>>                0.18, 0.15, 0.12, 0.19),
> >>>   Mass = c(0.06, 0.82, 1, 0.11, 2e-04, 317.8, 95.2, 14.6, 17.2, 0.0022,
> >>>            7e-04, 7e-04, 0.0025),
> >>>   Moons = c(0L, 0L, 1L, 2L, 0L, 64L, 62L, 27L, 13L, 4L, 2L, 0L, 1L),
> >>>   Volume = c(0.0291869497930152, 0.447504348276571, 0.523598775598299,
> >>>              0.0788376225681443, 0.000268082573106329, 737.393372232996,
> >>>              441.729261571372, 33.6865588825666, 30.6549628355953,
> >>>              0.00305362805928928, 0.00176714586764426,
> >>>              0.00090477868423386, 0.00359136400182873)
> >>> ), row.names = c(NA, -13L), class = "data.frame")
> >>>
> >>> # Negative binomial fit on all 13 bodies
> >>> fit <- glm.nb(Moons ~ Volume, data = moon_data)
> >>> rstudent(fit)
> >>>
> >>> # Same model with Jupiter (row 6) removed
> >>> fit2 <- update(fit, subset = Name != "Jupiter ")
> >>> rstudent(fit2)
> >>>
> >>> # Leave-one-out sigma: NaN for observation 7 (Saturn)
> >>> influence(fit2)$sigma
> >>>
> >>> #        1        2        3        4        5        7        8        9       10       11       12       13
> >>> # 1.077945 1.077813 1.165025 1.181685 1.077954      NaN 1.044454 1.152110 1.187586 1.181696 1.077954 1.165147
> >>>
> >>> Sincerely,
> >>> Eric
> >>>
> >>> On Tue, Apr 2, 2019 at 4:38 PM Bert Gunter <bgunter.4567 at gmail.com>
> >>> wrote:
> >>>
> >>>> Also, I suggest you read ?influence which may explain the source of
> >>>> your NaN's .
> >>>>
> >>>> Bert Gunter
> >>>>
> >>>> "The trouble with having an open mind is that people keep coming along
> >>>> and sticking things into it."
> >>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>>
> >>>>
> >>>> On Tue, Apr 2, 2019 at 1:29 PM Bert Gunter <bgunter.4567 at gmail.com>
> >>>> wrote:
> >>>>
> >>>>> I told you already: **Include code inline **
> >>>>>
> >>>>> See ?dput for how to include a text version of objects, such as data
> >>>>> frames, inline.
> >>>>>
> >>>>> Otherwise, I believe .txt text files are not stripped if you insist
> >>>>> on *attaching* data or code. Others may have better advice.
> >>>>>
> >>>>>
> >>>>> Bert Gunter
> >>>>>
> >>>>> "The trouble with having an open mind is that people keep coming
> >>>>> along and sticking things into it."
> >>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>>>
> >>>>>
