Hello,

I am experiencing odd behavior with the subset parameter for glm. It appears that the parameter uses non-standard evaluation, but only in some cases. Below is a reproducible example.

library(survey) # for example dataset

data(api)
stype <- "E"
(a <- glm(api00~ell+meals+mobility, data = apistrat,
          subset = apistrat$stype == stype))
(b <- glm(api00~ell+meals+mobility, data = apistrat,
          subset = apistrat$stype == "E"))
# should be equal since stype = "E" but they aren't
coef(a)==coef(b)

# for some reason works as expected here
i = 4
(c <- glm(mpg ~ wt, data = mtcars, subset = mtcars$cyl==i))
(d <- glm(mpg ~ wt, data = mtcars, subset = mtcars$cyl==4))
coef(c)==coef(d)

I can't really explain what is happening so I would appreciate help.

Kind Regards,
Carl Ganz
When you use the data= argument in glm(), the function looks for each variable in the data frame first. You have two versions of stype, one in the data frame and one outside it. So your first glm() selects all the cases in apistrat, since apistrat$stype always equals apistrat$stype. You can see that a bare stype resolves to the column, because

(b <- glm(api00~ell+meals+mobility, data = apistrat, subset = stype == "E"))

gives the same results as

(a <- glm(api00~ell+meals+mobility, data = apistrat, subset = apistrat$stype == "E"))

If you want to use a variable outside the data frame, give it another name, e.g.:

styp <- "E"
(a <- glm(api00~ell+meals+mobility, data = apistrat, subset = stype == styp))

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ganz, Carl
Sent: Tuesday, March 21, 2017 10:50 AM
To: r-help at r-project.org
Subject: [R] Issue with subset in glm

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
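For anyone without the survey package installed, the name clash can be reproduced with a built-in dataset. This is only a sketch: mtcars stands in for apistrat, and the variables cyl and cyl_wanted are made up for the illustration.

```r
# Assumption: mtcars stands in for apistrat so no extra package is needed.
cyl <- 4  # global variable with the same name as a column of mtcars

# 'cyl' on the right-hand side resolves to the column mtcars$cyl,
# so the condition is TRUE for every row and nothing is dropped.
masked <- glm(mpg ~ wt, data = mtcars, subset = mtcars$cyl == cyl)

# Renaming the outside variable avoids the clash (hypothetical name).
cyl_wanted <- 4
renamed <- glm(mpg ~ wt, data = mtcars, subset = cyl == cyl_wanted)

nobs(masked)   # rows used when the global 'cyl' is masked by the column
nobs(renamed)  # rows used once the outside variable has its own name
```

Comparing the two nobs() values is a quick way to check whether subset= actually dropped any rows.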
The subset argument is evaluated in "data" first, then in the caller's environment, etc. So:

1) In your first example, stype is found in apistrat, where it is a *vector* (the column apistrat$stype), so the subset expression is identically TRUE and the call is equivalent to one without the subset argument.

2) The second call fits the subset with stype == "E", hence is different.

3) "i" is not found in mtcars, hence is looked for in the caller, where it has the value 4, giving the same subset and result as in the next call.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
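The lookup order in the three points above can be imitated directly with eval(), which, as far as I understand it, is roughly what the model-frame code does with the subset expression. The sketch below is an illustration under that assumption and not the actual glm() internals; mtcars again stands in for apistrat, and cyl, i and keep are illustrative names.

```r
# Assumption: mtcars stands in for apistrat; 'cyl' and 'i' are illustrative globals.
cyl <- 4  # clashes with the column mtcars$cyl
i <- 4    # mtcars has no column of this name

# eval() with a data frame looks names up in the data frame first and
# falls back to the calling environment only for names it cannot find there.
clash    <- eval(quote(mtcars$cyl == cyl), mtcars)  # column compared with itself
no_clash <- eval(quote(mtcars$cyl == i), mtcars)    # column compared with 4

all(clash)     # is every row selected?
sum(no_clash)  # how many rows are selected?

# Computing the condition up front uses ordinary evaluation, so 'cyl'
# here is the global value; 'keep' is not a column, so glm() finds it
# in the calling environment.
keep <- mtcars$cyl == cyl
fit <- glm(mpg ~ wt, data = mtcars, subset = keep)
nobs(fit)
```

Precomputing the logical vector is an alternative to renaming the outside variable when the clash cannot easily be avoided.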