thr3ads.net - R help - [R] A Tip: lm, glm, and retained cases [Aug 2008]

(Ted Harding)

2008-Aug-26 23:45 UTC

[R] A Tip: lm, glm, and retained cases

Hi Folks,
This tip is probably lurking somewhere already, but I've just
discovered it the hard way, so it is probably worth passing
on for the benefit of those who might otherwise hack their
way along the same path.

Say (for example) you want to do a logistic regression of a
binary response Y on variables X1, X2, X3, X4:

  GLM <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial)

Say there are 1000 cases in the data. Because of missing values
(NAs) in the variables, the number of complete cases retained
for the regression is, say, 600. glm() drops the incomplete cases
automatically (under the default na.action, na.omit).

QUESTION: Which cases are they?

You can of course find out "by hand" on the lines of

  ix <- which( (!is.na(Y))&(!is.na(X1))&...&(!is.na(X4)) )

but one feels that GLM already knows -- so how to get it to talk?
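
A more compact way to write the by-hand check is complete.cases(), which flags the cases with no NA in any of the vectors passed to it. The sketch below assumes the variables are plain vectors in the workspace, as in the glm() call above; the simulated data are purely illustrative and not from the original post.

```r
# Simulated stand-in data (hypothetical, for illustration only)
set.seed(1)
n  <- 1000
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n); X4 <- rnorm(n)
Y  <- rbinom(n, 1, plogis(0.5 * X1 - 0.5 * X2))

# Knock out some values at random in a few of the variables
X1[sample(n, 100)] <- NA
X3[sample(n, 100)] <- NA
Y[sample(n, 100)]  <- NA

# TRUE for every case with no NA in any of the listed variables
ok <- complete.cases(Y, X1, X2, X3, X4)

# Integer positions of the complete cases
ix_by_hand <- which(ok)
length(ix_by_hand)
```

This still means listing the variables yourself, so it has to be kept in step with the model formula by hand.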

ANSWER: (e.g.)

  ix <- as.integer(names(GLM$fit))

Reason: When glm(Y~X1+...) picks up the data passed to it, it
assigns[*] to each element of Y a name which is its integer
position in the variable, expressed as a character string
("1", "2", "3", ... ).
[*] Assuming (as is usually the case) that the elements didn't
have names in the first place. Otherwise these names are used;
modify the above approach accordingly.
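
For the named case mentioned in the footnote, one possible modification is to look the retained names up in the original names with match(). The sketch below assumes the data sit in a data frame with character row names; the data frame dat, its row names and the use of lm() rather than glm() are all illustrative choices, not part of the original post.

```r
# Small illustrative data frame with non-numeric row names (hypothetical)
set.seed(7)
dat <- data.frame(X1 = rnorm(8), Y = rnorm(8))
rownames(dat) <- paste0("case_", letters[1:8])

# Make the 2nd and 5th cases incomplete
dat$X1[c(2, 5)] <- NA

fit <- lm(Y ~ X1, data = dat)

# Names of the retained cases, e.g. "case_a", "case_c", ...
kept_names <- names(fitted(fit))

# as.integer(kept_names) would only give NAs here, so look the
# names up in the original row names instead
ix <- match(kept_names, rownames(dat))
ix   # 1 3 4 6 7 8
```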

These names are kept throughout the computation, and when incomplete
cases are dropped the remaining complete cases keep their original
names. Thus, any per-case series of computed values (such as $fit)
carries the names of the retained cases the values correspond to. These
can be discovered by

  names(GLM$fit)

but you don't want them as character strings, so convert them
to integers:

  as.integer(names(GLM$fit))
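
As a check on the tip, the sketch below fits a logistic regression to simulated data with NAs scattered through it and compares the indices recovered from the fit with those found by hand. The data frame dat, the seed and the missingness pattern are assumptions made for illustration. fitted(GLM) is used in place of GLM$fit, which works only through partial matching of the component name fitted.values.

```r
# Simulated data with default row names 1..n (hypothetical)
set.seed(42)
n   <- 1000
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
dat$Y <- rbinom(n, 1, plogis(0.5 * dat$X1 - 0.5 * dat$X2))

# Introduce missing values in some of the variables
dat$X1[sample(n, 150)] <- NA
dat$X4[sample(n, 150)] <- NA
dat$Y[sample(n, 150)]  <- NA

GLM <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial, data = dat)

# Retained cases, recovered from the names of the fitted values
ix <- as.integer(names(fitted(GLM)))

# The same thing worked out by hand
ix_by_hand <- which(complete.cases(dat[, c("Y", "X1", "X2", "X3", "X4")]))

identical(ix, unname(ix_by_hand))   # the two should agree
length(ix) == nobs(GLM)             # one index per case used in the fit
```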

Done! I hope this helps some people.
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Aug-08                                       Time: 00:45:47
------------------------------ XFMail ------------------------------

hadley wickham

2008-Aug-26 23:49 UTC

On Tue, Aug 26, 2008 at 6:45 PM, Ted Harding
<Ted.Harding at manchester.ac.uk> wrote:
> Hi Folks,
> This tip is probably lurking somewhere already, but I've just
> discovered it the hard way, so it is probably worth passing
> on for the benefit of those who might otherwise hack their
> way along the same path.
>
> Say (for example) you want to do a logistic regression of a
> binary response Y on variables X1, X2, X3, X4:
>
>  GLM <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial)
>
> Say there are 1000 cases in the data. Because of missing values
> (NAs) in the variables, the number of complete cases retained
> for the regression is, say, 600. glm() does this automatically.
>
> QUESTION: Which cases are they?
>
> You can of course find out "by hand" on the lines of
>
>  ix <- which( (!is.na(Y))&(!is.na(X1))&...&(!is.na(X4)) )
>
> but one feels that GLM already knows -- so how to get it to talk?
>
> ANSWER: (e.g.)
>
>  ix <- as.integer(names(GLM$fit))

Alternatively, you can use the following, which gives the cases that
were dropped rather than those retained:

attr(GLM$model, "na.action")
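
To illustrate how this relates to the names-based approach, here is a minimal sketch on simulated data. The data frame dat and the positions set to NA are assumptions for illustration. The "na.action" attribute holds the positions of the omitted cases, so the retained ones are whatever is left over; if nothing was dropped the attribute is NULL.

```r
# Simulated data with NAs at known positions (hypothetical)
set.seed(3)
n   <- 200
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$Y <- rbinom(n, 1, plogis(dat$X1 - dat$X2))
dat$X1[c(4, 9)] <- NA
dat$Y[15]       <- NA

GLM <- glm(Y ~ X1 + X2, family = binomial, data = dat)

# Omitted cases, as stored on the model frame
dropped <- attr(GLM$model, "na.action")
as.integer(dropped)                  # positions 4, 9 and 15 were set to NA above

# The na.action() extractor returns the same object
identical(dropped, na.action(GLM))

# Retained cases are the complement of the omitted ones
retained <- setdiff(seq_len(nrow(dat)), as.integer(dropped))
identical(retained, as.integer(names(fitted(GLM))))
```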

Hadley

-- 
http://had.co.nz/
