Jens Heumann
2019-Apr-30 15:24 UTC
[R] Passing formula as parameter to `lm` within `sapply` causes error [BUG?]
Hi,

`lm` won't take formula as a parameter when it is within a `sapply`; see example below. Please, could anyone either point me to a syntax error or confirm that this might be a bug?

Best,
Jens

[Disclaimer: This is my first post here, following advice of how to proceed with possible bugs from here: https://www.r-project.org/bugs.html]


SUMMARY

While `lm` alone accepts formula parameter `FO` well, the same within a `sapply` causes an error. When putting everything as parameter but formula `FO`, it's still working, though. All parameters work fine within a similar `for` loop.


MCVE (see data / R-version at bottom)

> summary(lm(y ~ x, df1, df1[["z"]] == 1, df1[["w"]]))$coef[1, ]
  Estimate Std. Error    t value   Pr(>|t|)
 1.6269038  0.9042738  1.7991275  0.3229600

> summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]
  Estimate Std. Error    t value   Pr(>|t|)
 1.6269038  0.9042738  1.7991275  0.3229600

> sapply(unique(df1$z), function(s)
+   summary(lm(y ~ x, df1, df1[["z"]] == s, df1[[ws]]))$coef[1, ])
                [,1]       [,2]         [,3]
Estimate   1.6269038 -0.1404174 -0.010338774
Std. Error 0.9042738  0.4577001  1.858138516
t value    1.7991275 -0.3067890 -0.005564049
Pr(>|t|)   0.3229600  0.8104951  0.996457853

> sapply(unique(data[[st]]), function(s)
+   summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ])  # !!!
Error in eval(substitute(subset), data, env) : object 's' not found

> sapply(unique(data[[st]]), function(s)
+   summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1, ])
                [,1]       [,2]         [,3]
Estimate   1.6269038 -0.1404174 -0.010338774
Std. Error 0.9042738  0.4577001  1.858138516
t value    1.7991275 -0.3067890 -0.005564049
Pr(>|t|)   0.3229600  0.8104951  0.996457853

> m <- matrix(NA, 4, length(unique(data[[st]])))
> for (s in unique(data[[st]])) {
+   m[, s] <- summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]
+ }
> m
          [,1]       [,2]         [,3]
[1,] 1.6269038 -0.1404174 -0.010338774
[2,] 0.9042738  0.4577001  1.858138516
[3,] 1.7991275 -0.3067890 -0.005564049
[4,] 0.3229600  0.8104951  0.996457853

# DATA #################################################################

df1 <- structure(list(x = c(1.37095844714667, -0.564698171396089,
0.363128411337339, 0.63286260496104, 0.404268323140999, -0.106124516091484,
1.51152199743894, -0.0946590384130976, 2.01842371387704),
y = c(1.30824434809425, 0.740171482827397, 2.64977380403845,
-0.755998096151299, 0.125479556323628, -0.239445852485142,
2.14747239550901, -0.37891195982917, -0.638031707027734),
z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), w = c(0.7, 0.8,
1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)), class = "data.frame", row.names = c(NA,
-9L))

FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"; st1 <- 1

########################################################################

> R.version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          6.0
year           2019
month          04
day            26
svn rev        76424
language       R
version.string R version 3.6.0 (2019-04-26)
nickname       Planting of a Tree

#########################################################################

NOTE: Question on SO two days ago (https://stackoverflow.com/questions/55893189/passing-formula-as-parameter-to-lm-within-sapply-causes-error-bug-confirmation) brought many views but neither answer nor bug confirmation.
David Winsemius
2019-Apr-30 21:03 UTC
[R] Passing formula as parameter to `lm` within `sapply` causes error [BUG?]
Try using do.call

David

> On Apr 30, 2019, at 9:24 AM, Jens Heumann <jens.heumann at students.unibe.ch> wrote:
>
> `lm` won't take formula as a parameter when it is within a `sapply`; see example below. Please, could anyone either point me to a syntax error or confirm that this might be a bug?
Duncan Murdoch
2019-Apr-30 22:32 UTC
[R] Passing formula as parameter to `lm` within `sapply` causes error [BUG?]
On 30/04/2019 11:24 a.m., Jens Heumann wrote:
> Hi,
>
> `lm` won't take formula as a parameter when it is within a `sapply`; see
> example below. Please, could anyone either point me to a syntax error or
> confirm that this might be a bug?
>

I haven't looked carefully at your example. From a quick glance, however, I'd suspect that the issue is with the formula. Formulas have attached environments, and variables that aren't in the data argument to lm() are looked up there. In your code it's not obvious to me what environment would be attached, but I suspect it's the caller of sapply, not the environment that sapply creates for a particular value of its argument.

I think this because of a rule that is supposed to be followed in R: formulas get the environment where they were created attached to them. That would be your global environment. R is flexible, so functions don't have to follow this rule, but it causes lots of confusion when they don't.

Duncan Murdoch
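A minimal sketch of the point about formula environments, using the data from the original post. It assumes a fresh session with no global object named `s`; the helper name `fit_group` and the local copy `f` are made up for illustration. Re-pointing the copy's environment at the function's own frame is one possible workaround, not necessarily the recommended one.

```r
# Data and settings copied from the original post.
df1 <- data.frame(
  x = c(1.37095844714667, -0.564698171396089, 0.363128411337339,
        0.63286260496104, 0.404268323140999, -0.106124516091484,
        1.51152199743894, -0.0946590384130976, 2.01842371387704),
  y = c(1.30824434809425, 0.740171482827397, 2.64977380403845,
        -0.755998096151299, 0.125479556323628, -0.239445852485142,
        2.14747239550901, -0.37891195982917, -0.638031707027734),
  z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  w = c(0.7, 0.8, 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)
)
FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"

# FO was created at top level, so its attached environment is the one this
# script runs in, not the frame of any function called later.
print(environment(FO))

# Reproduce the failure: the subset expression is evaluated in the data and
# then in environment(FO), where `s` does not exist (assuming no global `s`).
err <- tryCatch(
  sapply(unique(data[[st]]), function(s)
    summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]),
  error = function(e) e
)
print(conditionMessage(err))

# Hypothetical helper: copy the formula and attach this call's frame to the
# copy, so that `s` is visible when subset and weights are evaluated.
fit_group <- function(s) {
  f <- FO
  environment(f) <- environment()
  summary(lm(f, data, data[[st]] == s, data[[ws]]))$coef[1, ]
}

res <- sapply(unique(data[[st]]), fit_group)
print(res)
```

The `for` loop in the original post works for the same reason the failing call does not: there `s` is a global variable, so it is found in the environment attached to `FO`.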
Jens Heumann
2019-May-01 05:14 UTC
[R] Passing formula as parameter to `lm` within `sapply` causes error [BUG?]
Thanks a lot for your hint, David. It finally worked doing:

> sapply(unique(data[[st]]), function(s)
+   summary(do.call("lm", list(FO, data, data[[st]] == s,
+                              data[[ws]])))$coef[1, ])
                [,1]       [,2]         [,3]
Estimate   1.6269038 -0.1404174 -0.010338774
Std. Error 0.9042738  0.4577001  1.858138516
t value    1.7991275 -0.3067890 -0.005564049
Pr(>|t|)   0.3229600  0.8104951  0.996457853

Best,
Jens

On 30.04.2019 23:03, David Winsemius wrote:
> Try using do.call
>
> David
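A short sketch of why the `do.call` version avoids the error, under the same assumptions as the original post's data. The elements of the argument list are evaluated inside the anonymous function before `lm` is called, so `lm` receives a finished logical vector for `subset` and never has to look up `s`. The helper name `fit_group` and the use of named arguments are illustrative choices, not part of the thread.

```r
# Data and settings copied from the original post.
df1 <- data.frame(
  x = c(1.37095844714667, -0.564698171396089, 0.363128411337339,
        0.63286260496104, 0.404268323140999, -0.106124516091484,
        1.51152199743894, -0.0946590384130976, 2.01842371387704),
  y = c(1.30824434809425, 0.740171482827397, 2.64977380403845,
        -0.755998096151299, 0.125479556323628, -0.239445852485142,
        2.14747239550901, -0.37891195982917, -0.638031707027734),
  z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  w = c(0.7, 0.8, 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)
)
FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"

# Hypothetical helper: build the argument list first, then call lm().
fit_group <- function(s) {
  # Everything in list() is evaluated here, in this function's frame,
  # so `subset` is already a logical vector when lm() sees it.
  args <- list(formula = FO, data = data,
               subset = data[[st]] == s, weights = data[[ws]])
  summary(do.call("lm", args))$coef[1, ]
}

res <- sapply(unique(data[[st]]), fit_group)
print(res)
```

One side effect worth knowing: because the arguments are passed as values, the call stored in each fitted model contains the data and vectors themselves rather than their names, which makes printed model objects much longer.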