A pragmatic solution could be to create a simple linear regression example with variables in the global environment and then another example with a data.frame.

The latter might be somewhat more complex, e.g., with several regressors and/or mixed categorical and numeric covariates to illustrate how regression and analysis of (co-)variance can be combined. I like to use MASS's whiteside data for this:

  data("whiteside", package = "MASS")
  m1 <- lm(Gas ~ Temp, data = whiteside)
  m2 <- lm(Gas ~ Insul + Temp, data = whiteside)
  m3 <- lm(Gas ~ Insul * Temp, data = whiteside)
  anova(m1, m2, m3)

Moreover, some binary response data.frame with a few covariates might be a useful addition to "datasets". For example, a more granular version of the "Titanic" data (in addition to the 4-way table, see ?Titanic). Another relatively straightforward data set, popular in econometrics and the social sciences, is the "Mroz" data; see, e.g., help("PSID1976", package = "AER").

I would be happy to help with these if such additions were considered for datasets/stats.

On Sat, 15 Dec 2018, David Hugh-Jones wrote:

> I would argue examples should encourage good practice. Beginners ought to
> learn to keep data in data frames and not to overuse attach(). Experts can
> do otherwise at their own risk, but they have less need of explicit
> examples.
>
> On Fri, 14 Dec 2018 at 14:51, S Ellison <S.Ellison at lgcgroup.com> wrote:
>
>> FWIW, before all the examples are changed to data frame variants, I think
>> there's fairly good reason to have at least _one_ example that does _not_
>> place variables in a data frame.
>>
>> The data argument in lm() is optional. And there is more than one way to
>> manage data in a project. I personally don't much like lots of stray
>> variables lurking about, but if those are the only variables out there and
>> we can be sure they aren't affected by other code, it's hardly essential to
>> create a data frame to hold something you already have.
>> Also, attach() is still part of R, for those folk who have a data frame
>> but want to reference the contents across a wider range of functions
>> without using with() a lot. lm() can reasonably omit the data argument
>> there, too.
>>
>> So while there are good reasons to use data frames, there are also good
>> reasons to provide examples that don't.
>>
>> Steve Ellison
>>
>>> -----Original Message-----
>>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Ben
>>> Bolker
>>> Sent: 13 December 2018 20:36
>>> To: r-devel at r-project.org
>>> Subject: Re: [Rd] Documentation examples for lm and glm
>>>
>>> Agree. Or just create the data frame with those variables in it
>>> directly ...
>>>
>>> On 2018-12-13 3:26 p.m., Thomas Yee wrote:
>>>> Hello,
>>>>
>>>> something that has been on my mind for a decade or two has
>>>> been the examples for lm() and glm(). They encourage poor style
>>>> because of mismanagement of data frames. Also, having the
>>>> variables in a data frame means that predict()
>>>> is more likely to work properly.
>>>>
>>>> For lm(), the variables should be put into a data frame.
>>>> As 2 vectors are assigned first in the general workspace they
>>>> should be deleted afterwards.
>>>>
>>>> For the glm(), the data frame d.AD is constructed but not used. Also,
>>>> its 3 components were assigned first in the general workspace, so they
>>>> float around dangerously afterwards like in the lm() example.
>>>>
>>>> Rather than attaching improved .Rd files here, they are put at
>>>> www.stat.auckland.ac.nz/~yee/Rdfiles
>>>> You are welcome to use them!
>>>>
>>>> Best,
>>>>
>>>> Thomas
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
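The two styles discussed in this thread can be put side by side in a few lines. The following is only a rough sketch, not the proposed text for the help page: the data frame `toy` and its values are made up for illustration, and `fit_df` and `fit_global` are hypothetical names.

```r
# Illustrative data only (not taken from ?lm or from the thread).
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(1.1, 1.9, 3.2, 3.9, 5.1, 6.2)
)

# Style 1: variables are created directly in a data frame and passed via data=.
fit_df <- lm(y ~ x, data = toy)

# Style 2: variables live in the global environment and data= is omitted.
x <- toy$x
y <- toy$y
fit_global <- lm(y ~ x)
rm(x, y)  # tidy up the stray vectors once the fit exists

# Both styles give the same coefficient estimates.
all.equal(coef(fit_df), coef(fit_global))

# With the data-frame style, new observations are supplied in the same shape.
predict(fit_df, newdata = data.frame(x = c(7, 8)))
```

Keeping one example of each style, as suggested above, would show that data= is optional while still demonstrating the data-frame workflow that predict() is designed around.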
frederik at ofb.net
2018-Dec-15 14:24 UTC
[Rd] Documentation examples for lm and glm
I agree with Steve and Achim that we should keep some examples with no data frame. That's Objectively Simpler, whether or not it leads to clutter in the wrong hands. As Steve points out, we have attach() which is an excellent language feature - not to mention with().

I would go even further and say that the examples that are in lm() now should stay at the top, because people may be used to referring to them, and also because Historical Order is generally a good order in which to learn things. However, if there is an important function argument ("data=") not in the examples, then we should add examples which use it. Likewise if there is a popular programming style (putting things in a data frame). So let's do something along the lines of what Thomas is requesting, but put it after the existing documentation? Please?

On a bit of a tangent, I would like to see an example in lm() which plots my data with a fitted line through it. I'm probably betraying my ignorance here, but I was asked how to do this when showing R to a friend and I thought it should be in lm(); after all, it seems a bit more basic than displaying a Normal Q-Q plot (whatever that is! gasp...). Similarly for glm(). Perhaps all this can be accomplished by merely doubling the size of the existing examples.

Thanks.

Frederick
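For the plotting question, one possible sketch using only base R and the built-in `cars` and `mtcars` data sets is shown below. The choice of data sets and the names `fit`, `gfit` and `grid` are assumptions made for illustration; nothing here is taken from the current ?lm or ?glm examples.

```r
# Linear model: scatterplot of the data with the fitted line drawn through it.
fit <- lm(dist ~ speed, data = cars)
plot(dist ~ speed, data = cars, main = "Stopping distance vs. speed")
abline(fit, col = "red")  # abline() reads intercept and slope from the fit

# Logistic regression: the fitted curve is not a straight line on the
# response scale, so predict on a grid of regressor values and use lines().
gfit <- glm(am ~ wt, family = binomial, data = mtcars)
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100))
grid$prob <- predict(gfit, newdata = grid, type = "response")

plot(am ~ wt, data = mtcars, main = "Transmission type vs. weight")
lines(prob ~ wt, data = grid, col = "red")
```

abline() only covers the single-regressor lm() case; the predict()-on-a-grid approach used for glm() also works for lm() and carries over to models with transformed terms.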
On Sat, 15 Dec 2018, frederik at ofb.net wrote:

> I agree with Steve and Achim that we should keep some examples with no
> data frame. That's Objectively Simpler, whether or not it leads to
> clutter in the wrong hands. As Steve points out, we have attach()
> which is an excellent language feature - not to mention with().

Just for the record: Personally, I wouldn't recommend using lm() with attach() or with() but would always encourage using data= instead. In my previous e-mail I just wanted to point out that a pragmatic step for the man page could be to keep one example without the data= argument when adding examples with data=.
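To make the comparison between data= and with() concrete, here is a small sketch on the built-in `cars` data (an assumption for illustration; the names `fit_data` and `fit_with` are hypothetical). It only shows that the estimates agree and that the two fits record different calls; it is not meant to settle which style the help page should prefer.

```r
# Same model, fitted once via data= and once via with().
fit_data <- lm(dist ~ speed, data = cars)
fit_with <- with(cars, lm(dist ~ speed))

# The coefficient estimates are the same either way.
all.equal(coef(fit_data), coef(fit_with))

# The stored calls differ: only the data= version records which
# data frame the variables came from.
fit_data$call
fit_with$call
```

Because the data= fit names its data frame in the call, printed output and summaries document where the variables came from, which is one reason to prefer it in examples aimed at beginners.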