Hello,
I am experiencing odd behavior with the subset parameter for glm. It appears
that the parameter uses non-standard evaluation, but only in some cases. Below
is a reproducible example.
library(survey) # for example dataset
data(api)
stype <- "E"
(a <- glm(api00~ell+meals+mobility, data = apistrat,
subset = apistrat$stype == stype))
(b <- glm(api00~ell+meals+mobility, data = apistrat,
subset = apistrat$stype == "E"))
# should be equal since stype = "E" but they aren't
coef(a)==coef(b)
# for some reason works as expected here
i = 4
(c <- glm(mpg ~ wt, data = mtcars, subset = mtcars$cyl==i))
(d <- glm(mpg ~ wt, data = mtcars, subset = mtcars$cyl==4))
coef(c)==coef(d)
I can't really explain what is happening so I would appreciate help.
Kind Regards,
Carl Ganz
When you use the data= argument in glm(), the function looks for each variable
in the data.frame first. You have created two versions of stype, one in the
data.frame and one outside it. So your first glm() selects all the cases in
apistrat, since apistrat$stype always equals apistrat$stype. To see that a
bare stype refers to the column, note that
(b <- glm(api00~ell+meals+mobility, data = apistrat,
subset = stype == "E"))
gives the same results as
(a <- glm(api00~ell+meals+mobility, data = apistrat,
subset = apistrat$stype == "E"))
If you want to use a variable outside the data frame, give it another name,
e.g.:
styp <- "E"
(a <- glm(api00~ell+meals+mobility, data = apistrat,
subset = stype == styp))
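
For anyone who wants to try this without installing survey, here is a minimal
sketch of the same name clash using the built-in mtcars data. Using mtcars in
place of apistrat, and the names cyl_wanted, masked and fixed, are assumptions
made only for illustration.

```r
# An outside variable that shares its name with a column of mtcars
cyl <- 4

# Inside subset=, the bare name cyl resolves to the column mtcars$cyl,
# so the condition compares the column with itself and keeps every row.
masked <- glm(mpg ~ wt, data = mtcars, subset = mtcars$cyl == cyl)

# Giving the outside variable a name that is not a column avoids the clash.
cyl_wanted <- 4
fixed <- glm(mpg ~ wt, data = mtcars, subset = cyl == cyl_wanted)

nobs(masked)  # all 32 rows of mtcars were used
nobs(fixed)   # only the 11 four-cylinder cars were used
```

Checking nobs() is a quick way to confirm how many rows a subset actually
kept, which is often easier to read than comparing coefficients.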
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ganz, Carl
Sent: Tuesday, March 21, 2017 10:50 AM
To: r-help at r-project.org
Subject: [R] Issue with subset in glm
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
The subset argument is evaluated in "data" first, then in the environment of
the formula (here, the caller's environment), etc. So:

1) In your first example, stype is found in apistrat, where it is a *vector*
(the column), so the subset expression is identically TRUE. The call is
therefore equivalent to making it without the subset argument.

2) The second call fits the subset with stype == "E", hence is different.

3) "i" is not found in mtcars, hence is looked for in the caller, where it has
the value 4, giving the same subset and result as in the next call.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
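
The lookup order can be reproduced by hand with eval(), which takes a data
frame to search first and an enclosing environment to fall back on. This is
only a rough sketch of the idea, not the actual code inside glm() or
model.frame(); mtcars is used so that it runs without extra packages, and the
names sel_i and sel_cyl are made up for the example.

```r
i   <- 4  # not a column of mtcars
cyl <- 4  # same name as a column of mtcars

# Evaluate each expression in mtcars first, then in the current environment.
# i is not found in mtcars, so the outside value 4 is used.
sel_i <- eval(quote(mtcars$cyl == i), mtcars, environment())

# cyl is found in mtcars, so the column is compared with itself.
sel_cyl <- eval(quote(mtcars$cyl == cyl), mtcars, environment())

sum(sel_i)    # 11 rows selected
sum(sel_cyl)  # 32 rows selected, i.e. every row
```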