thr3ads.net - R help - [R] Fwd: Potential Issue with lm.influence [Apr 2019]

If this information is useful, please help other people find it:
Share via:

Bert Gunter

2019-Apr-02 20:38 UTC

[R] Fwd: Potential Issue with lm.influence

Also, I suggest you read ?influence which may explain the source of your
NaN's .

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Apr 2, 2019 at 1:29 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> I told you already: **Include code inline **
>
> See ?dput for how to include a text version of objects, such as data
> frames, inline.
>
> Otherwise, I believe .txt text files are not stripped if you insist on
> *attaching* data or code. Others may have better advice.
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip
)
>
>
> On Tue, Apr 2, 2019 at 1:21 PM Eric Bridgeford <ericwb95 at
gmail.com> wrote:
>
>> How can I add attachments? The following two files were attached in the
>> initial message
>>
>> On Tue, Apr 2, 2019 at 3:34 PM Bert Gunter <bgunter.4567 at
gmail.com>
>> wrote:
>>
>>> Nothing was attached. The r-help server strips most attachments.
Include
>>> your code inline.
>>>
>>> Also note that
>>>
>>> > 0/0
>>> [1] NaN
>>>
>>> so maybe something like that occurs in the course of your
calculations.
>>> But that's just a guess, so feel free to disregard.
>>>
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep
coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
>>>
>>>
>>> On Tue, Apr 2, 2019 at 11:32 AM Eric Bridgeford <ericwb95 at
gmail.com>
>>> wrote:
>>>
>>>> Hi R core team,
>>>>
>>>> I experienced the following issue with the attached data/code
snippet,
>>>> where the studentized residual for a single observation appears
to be
>>>> NaN
>>>> given finite predictors/responses, which appears to be driven
by the
>>>> glm.influence method in the stats package. I am curious to
whether this
>>>> is
>>>> a consequence of the specific implementation used for computing
the
>>>> influence, which it would appear is the driving force for the
NaN
>>>> influence
>>>> for the point, that I was ultimately able to trace back through
the
>>>> lm.influence method to this specific line
>>>> <
>>>>
https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/library/stats/R/lm.influence.R#L67
>>>> >
>>>> which
>>>> calls C code which calls iminfl.f
>>>> <
>>>>
https://github.com/SurajGupta/r-source/blob/master/src/library/stats/src/lminfl.f
>>>> >
>>>> (I
>>>> don't know fortran so I can't debug further). My
understanding is that
>>>> the
>>>> specific issue would have to do with the leave-one-out variance
estimate
>>>> associated with this particular point, which it seems based on
my
>>>> understanding should be finite given finite
predictors/responses. Let me
>>>> know. Thanks!
>>>>
>>>> Sincerely,
>>>>
>>>> --
>>>> Eric Bridgeford
>>>> ericwb.me
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>>>>
>>>
>>
>> --
>> Eric Bridgeford
>> ericwb.me
>>
>
	[[alternative HTML version deleted]]

Eric Bridgeford

2019-Apr-02 21:00 UTC

head link

[R] Fwd: Potential Issue with lm.influence

I agree the influence documentation suggests NaNs may result; however, as
these can be manually computed and are, indeed, finite/existing (ie,
computing the held-out influence by manually training n models for n points
to obtain n leave one out influence measures), I don't possibly see how the
function SHOULD return NaN, and given that it is returning NaN, that
suggests to me that there should be either a) Providing an alternative
method to compute them that (may be slower) that returns the correct
results in the even that lm.influence does not return a good approximation
(ie, a command line argument for type="approx" that does the
approximation
strategy employed currently, or an alternative type="direct" or
something
like that that computes them manually), or b) a heuristic to suggest why
NaNs might result from one's particular inputs/what can be done to fix it
(if the approximation strategy is the source of the problem) or what the
issue is with the data that will cause NaNs. Hence I was looking to start a
discussion around the specific strategy employed to compute the elements.

Below is the code:
moon_data <- structure(list(Name = structure(c(8L, 13L, 2L, 7L, 1L, 5L,
11L,
                                               12L, 9L, 10L, 4L, 6L, 3L),
.Label = c("Ceres ", "Earth", "Eris ",

         "Haumea ", "Jupiter ", "Makemake ",
"Mars ", "Mercury ", "Neptune
",

         "Pluto ", "Saturn ", "Uranus ",
"Venus "), class = "factor"),
                            Distance = c(0.39, 0.72, 1, 1.52, 2.75, 5.2,
9.54, 19.22,
                                         30.06, 39.5, 43.35, 45.8, 67.7),
Diameter = c(0.382, 0.949,

           1, 0.532, 0.08, 11.209, 9.449, 4.007, 3.883, 0.18, 0.15,

           0.12, 0.19), Mass = c(0.06, 0.82, 1, 0.11, 2e-04, 317.8,

                                 95.2, 14.6, 17.2, 0.0022, 7e-04, 7e-04,
0.0025), Moons = c(0L,


                0L, 1L, 2L, 0L, 64L, 62L, 27L, 13L, 4L, 2L, 0L, 1L), Volume
= c(0.0291869497930152,



    0.447504348276571, 0.523598775598299, 0.0788376225681443,



    0.000268082573106329, 737.393372232996, 441.729261571372,



    33.6865588825666, 30.6549628355953, 0.00305362805928928,



    0.00176714586764426, 0.00090477868423386, 0.00359136400182873


                )), row.names = c(NA, -13L), class = "data.frame")

fit <- glm.nb(Moons ~ Volume, data = moon_data)
rstudent(fit)

fit2 <- update(fit, subset = Name != "Jupiter ")
rstudent(fit2)

influence(fit2)$sigma

#        1        2        3        4        5        7        8        9
     10       11       12       13
# 1.077945 1.077813 1.165025 1.181685 1.077954      NaN 1.044454 1.152110
1.187586 1.181696 1.077954 1.165147

Sincerely,
Eric

On Tue, Apr 2, 2019 at 4:38 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> Also, I suggest you read ?influence which may explain the source of your
> NaN's .
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip
)
>
>
> On Tue, Apr 2, 2019 at 1:29 PM Bert Gunter <bgunter.4567 at
gmail.com> wrote:
>
>> I told you already: **Include code inline **
>>
>> See ?dput for how to include a text version of objects, such as data
>> frames, inline.
>>
>> Otherwise, I believe .txt text files are not stripped if you insist on
>> *attaching* data or code. Others may have better advice.
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming
along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic
strip )
>>
>>
>> On Tue, Apr 2, 2019 at 1:21 PM Eric Bridgeford <ericwb95 at
gmail.com>
>> wrote:
>>
>>> How can I add attachments? The following two files were attached in
the
>>> initial message
>>>
>>> On Tue, Apr 2, 2019 at 3:34 PM Bert Gunter <bgunter.4567 at
gmail.com>
>>> wrote:
>>>
>>>> Nothing was attached. The r-help server strips most
attachments.
>>>> Include your code inline.
>>>>
>>>> Also note that
>>>>
>>>> > 0/0
>>>> [1] NaN
>>>>
>>>> so maybe something like that occurs in the course of your
calculations.
>>>> But that's just a guess, so feel free to disregard.
>>>>
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep
coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
>>>>
>>>>
>>>> On Tue, Apr 2, 2019 at 11:32 AM Eric Bridgeford <ericwb95 at
gmail.com>
>>>> wrote:
>>>>
>>>>> Hi R core team,
>>>>>
>>>>> I experienced the following issue with the attached
data/code snippet,
>>>>> where the studentized residual for a single observation
appears to be
>>>>> NaN
>>>>> given finite predictors/responses, which appears to be
driven by the
>>>>> glm.influence method in the stats package. I am curious to
whether
>>>>> this is
>>>>> a consequence of the specific implementation used for
computing the
>>>>> influence, which it would appear is the driving force for
the NaN
>>>>> influence
>>>>> for the point, that I was ultimately able to trace back
through the
>>>>> lm.influence method to this specific line
>>>>> <
>>>>>
https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/library/stats/R/lm.influence.R#L67
>>>>> >
>>>>> which
>>>>> calls C code which calls iminfl.f
>>>>> <
>>>>>
https://github.com/SurajGupta/r-source/blob/master/src/library/stats/src/lminfl.f
>>>>> >
>>>>> (I
>>>>> don't know fortran so I can't debug further). My
understanding is that
>>>>> the
>>>>> specific issue would have to do with the leave-one-out
variance
>>>>> estimate
>>>>> associated with this particular point, which it seems based
on my
>>>>> understanding should be finite given finite
predictors/responses. Let
>>>>> me
>>>>> know. Thanks!
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> --
>>>>> Eric Bridgeford
>>>>> ericwb.me
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>>>>
>>>>
>>>
>>> --
>>> Eric Bridgeford
>>> ericwb.me
>>>
>>
-- 
Eric Bridgeford
ericwb.me

	[[alternative HTML version deleted]]

Jim Lemon

2019-Apr-02 22:36 UTC

head link

[R] Fwd: Potential Issue with lm.influence

Hi Eric,
When I run your code (using the MASS library) I find that
rstudent(fit2) also returns NaN in the seventh position. Perhaps the
problem is occurring there and not in the "influence" function.

Jim

On Wed, Apr 3, 2019 at 9:12 AM Eric Bridgeford <ericwb95 at gmail.com>
wrote:>
> I agree the influence documentation suggests NaNs may result; however, as
> these can be manually computed and are, indeed, finite/existing (ie,
> computing the held-out influence by manually training n models for n points
> to obtain n leave one out influence measures), I don't possibly see how
the
> function SHOULD return NaN, and given that it is returning NaN, that
> suggests to me that there should be either a) Providing an alternative
> method to compute them that (may be slower) that returns the correct
> results in the even that lm.influence does not return a good approximation
> (ie, a command line argument for type="approx" that does the
approximation
> strategy employed currently, or an alternative type="direct" or
something
> like that that computes them manually), or b) a heuristic to suggest why
> NaNs might result from one's particular inputs/what can be done to fix
it
> (if the approximation strategy is the source of the problem) or what the
> issue is with the data that will cause NaNs. Hence I was looking to start a
> discussion around the specific strategy employed to compute the elements.
>
> Below is the code:
> moon_data <- structure(list(Name = structure(c(8L, 13L, 2L, 7L, 1L, 5L,
> 11L,
>                                                12L, 9L, 10L, 4L, 6L, 3L),
> .Label = c("Ceres ", "Earth", "Eris ",
>
>          "Haumea ", "Jupiter ", "Makemake ",
"Mars ", "Mercury ", "Neptune
> ",
>
>          "Pluto ", "Saturn ", "Uranus ",
"Venus "), class = "factor"),
>                             Distance = c(0.39, 0.72, 1, 1.52, 2.75, 5.2,
> 9.54, 19.22,
>                                          30.06, 39.5, 43.35, 45.8, 67.7),
> Diameter = c(0.382, 0.949,
>
>            1, 0.532, 0.08, 11.209, 9.449, 4.007, 3.883, 0.18, 0.15,
>
>            0.12, 0.19), Mass = c(0.06, 0.82, 1, 0.11, 2e-04, 317.8,
>
>                                  95.2, 14.6, 17.2, 0.0022, 7e-04, 7e-04,
> 0.0025), Moons = c(0L,
>
>
>                 0L, 1L, 2L, 0L, 64L, 62L, 27L, 13L, 4L, 2L, 0L, 1L), Volume
> = c(0.0291869497930152,
>
>
>
>     0.447504348276571, 0.523598775598299, 0.0788376225681443,
>
>
>
>     0.000268082573106329, 737.393372232996, 441.729261571372,
>
>
>
>     33.6865588825666, 30.6549628355953, 0.00305362805928928,
>
>
>
>     0.00176714586764426, 0.00090477868423386, 0.00359136400182873
>
>
>                 )), row.names = c(NA, -13L), class =
"data.frame")
>
> fit <- glm.nb(Moons ~ Volume, data = moon_data)
> rstudent(fit)
>
> fit2 <- update(fit, subset = Name != "Jupiter ")
> rstudent(fit2)
>
> influence(fit2)$sigma
>
> #        1        2        3        4        5        7        8        9
>      10       11       12       13
> # 1.077945 1.077813 1.165025 1.181685 1.077954      NaN 1.044454 1.152110
> 1.187586 1.181696 1.077954 1.165147
>
> Sincerely,
> Eric
>

R help - Apr 2019 - Fwd: Potential Issue with lm.influence

[R] Fwd: Potential Issue with lm.influence

[R] Fwd: Potential Issue with lm.influence

[R] Fwd: Potential Issue with lm.influence