thr3ads.net - R devel - [Rd] Documentation examples for lm and glm [Dec 2018]

S Ellison

2018-Dec-14 13:50 UTC

[Rd] Documentation examples for lm and glm

FWIW, before all the examples are changed to data frame variants, I think
there's fairly good reason to have at least _one_ example that does _not_
place variables in a data frame.

The data argument in lm() is optional. And there is more than one way to manage
data in a project. I personally don't much like lots of stray variables
lurking about, but if those are the only variables out there and we can be sure
they aren't affected by other code, it's hardly essential to create a
data frame to hold something you already have.
Also, attach() is still part of R, for those folk who have a data frame but want
to reference the contents across a wider range of functions without using with()
a lot. lm() can reasonably omit the data argument there, too.

So while there are good reasons to use data frames, there are also good reasons
to provide examples that don't.
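
To make the options above concrete, here is a minimal sketch in R. The vectors x and y and the data frame dat are made up for illustration; they are not the data used in ?lm. All three calls should fit the same model, so the choice is about data management rather than results.

```r
# Illustrative data only (not the data from the ?lm examples)
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)

# 1. Variables live in the workspace; the data argument is omitted
fit_ws <- lm(y ~ x)

# 2. Variables live in a data frame, passed via the data argument
dat <- data.frame(x = x, y = y)
fit_df <- lm(y ~ x, data = dat)

# 3. Data frame contents made visible with with(); data argument omitted
fit_with <- with(dat, lm(y ~ x))

# The coefficients agree across the three approaches
all.equal(coef(fit_ws), coef(fit_df))
all.equal(coef(fit_df), coef(fit_with))
```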

Steve Ellison

> -----Original Message-----
> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Ben
> Bolker
> Sent: 13 December 2018 20:36
> To: r-devel at r-project.org
> Subject: Re: [Rd] Documentation examples for lm and glm
> 
> 
>   Agree.  Or just create the data frame with those variables in it
> directly ...
> 
> On 2018-12-13 3:26 p.m., Thomas Yee wrote:
> > Hello,
> >
> > something that has been on my mind for a decade or two has
> > been the examples for lm() and glm(). They encourage poor style
> > because of mismanagement of data frames. Also, having the
> > variables in a data frame means that predict()
> > is more likely to work properly.
> >
> > For lm(), the variables should be put into a data frame.
> > As 2 vectors are assigned first in the general workspace they
> > should be deleted afterwards.
> >
> > For glm(), the data frame d.AD is constructed but not used. Also,
> > its 3 components were assigned first in the general workspace, so they
> > float around dangerously afterwards like in the lm() example.
> >
> > Rather than attaching improved .Rd files here, I have put them at
> > www.stat.auckland.ac.nz/~yee/Rdfiles
> > You are welcome to use them!
> >
> > Best,
> >
> > Thomas
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> 

David Hugh-Jones

2018-Dec-15 07:47 UTC

[Rd] Documentation examples for lm and glm

I would argue examples should encourage good practice. Beginners ought to
learn to keep data in data frames and not to overuse attach(). Experts can
do otherwise at their own risk, but they have less need of explicit
examples.
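
One concrete hazard behind this advice, related to the predict() remark quoted earlier in the thread, can be sketched as follows. The data frame dat and the new values are made up for illustration. When the formula refers to columns as dat$x instead of using the data argument, predict() cannot match newdata to the model terms and falls back to the original values (with a warning, suppressed here).

```r
# Illustrative data only
set.seed(2)
dat <- data.frame(x = 1:20)
dat$y <- 1 + 2 * dat$x + rnorm(20)
new <- data.frame(x = c(21, 22, 23))

# Model fitted with the data argument: newdata is used as intended
fit_good <- lm(y ~ x, data = dat)
p_good <- predict(fit_good, newdata = new)
length(p_good)   # one prediction per row of 'new'

# Model fitted with dat$... in the formula: the term is 'dat$x', which is
# looked up in the workspace again, so 'new' is effectively ignored
fit_bad <- lm(dat$y ~ dat$x)
p_bad <- suppressWarnings(predict(fit_bad, newdata = new))
length(p_bad)    # one prediction per row of 'dat', not of 'new'
```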

Achim Zeileis

2018-Dec-15 13:15 UTC

[Rd] Documentation examples for lm and glm

A pragmatic solution could be to create a simple linear regression example 
with variables in the global environment and then another example with a 
data.frame.

The latter might be somewhat more complex, e.g., with several regressors 
and/or mixed categorical and numeric covariates to illustrate how 
regression and analysis of (co-)variance can be combined. I like to use 
MASS's whiteside data for this:

data("whiteside", package = "MASS")
m1 <- lm(Gas ~ Temp, data = whiteside)
m2 <- lm(Gas ~ Insul + Temp, data = whiteside)
m3 <- lm(Gas ~ Insul * Temp, data = whiteside)
anova(m1, m2, m3)

Moreover, some binary response data.frame with a few covariates might be a
useful addition to "datasets". For example, a more granular version of the
"Titanic" data (in addition to the 4-way table ?Titanic). Another
relatively straightforward data set, popular in econometrics and the social
sciences, is the "Mroz" data; see e.g. help("PSID1976", package = "AER").

I would be happy to help with these if such additions were considered for 
datasets/stats.


Martin Maechler

2018-Dec-17 08:05 UTC

[Rd] Documentation examples for lm and glm

>>>>> David Hugh-Jones 
>>>>>     on Sat, 15 Dec 2018 08:47:28 +0100 writes:
    > I would argue examples should encourage good
    > practice. Beginners ought to learn to keep data in data
    > frames and not to overuse attach(). 

Note there's no attach() there in any of these examples!

    > Experts can do otherwise at their own risk, but they have
    > less need of explicit examples.

The glm examples are nice insofar as they show both uses.

I agree the lm() example(s) are  "didactically misleading" by
not using data frames at all.

I disagree that only data frame examples should be shown.
If  lm()  is one of the first R functions a beginneR must use --
because they are in a basic stats class, say --  it may be
*better* didactically to focus on lm()  in the very first
example, and use data frames in a next one ...
.... and instead of a next one, we have the pretty clear comment
     
  ### less simple examples in "See Also" above

I'm not convinced (but you can try more) we should change those
examples or add more there.
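
For reference, the suggestion made earlier in the thread, creating the data frame with the variables in it directly, could look like the following sketch for the glm() case. The values are meant to mirror the Dobson example in ?glm but should be treated as illustrative, and the layout is an assumption rather than a proposed patch.

```r
# Build the data frame directly, so that no separate copies of
# counts, outcome and treatment are left in the workspace
d.AD <- data.frame(
  treatment = gl(3, 3),
  outcome   = gl(3, 1, 9),
  counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12)
)

# Poisson regression using the data argument
glm.D93 <- glm(counts ~ outcome + treatment,
               family = poisson(), data = d.AD)
anova(glm.D93)
summary(glm.D93)
```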

Martin

Fox, John

2018-Dec-17 14:21 UTC

[Rd] Documentation examples for lm and glm

Dear Martin,

I think that everyone agrees that it's generally preferable to use the data
argument to lm() and I have nothing significant to add to the substance of the
discussion, but I think that it's a mistake not to add to the current examples,
for the following reasons:

(1) Relegating examples using the data argument to "see also" doesn't suggest
that using the argument is a best practice. Most users won't bother to click the
links.

(2) In my opinion, a new initial example using the data argument would more
clearly suggest that this is normally the best option.

(3) I think that it would also be desirable to add a remark to the explanation
of the data argument, something like, "Although the argument is optional,
it's generally preferable to specify it explicitly." And similarly on the
help page for glm().

My two (or three) cents.

John

  -------------------------------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox

Heinz Tuechler

2018-Dec-17 15:19 UTC

[Rd] Documentation examples for lm and glm

Dear All,

do you think that use of a data argument is best practice in the example 
below?

regards,

Heinz

### trivial example
plotwithline <- function(x, y) {
     plot(x, y)
     abline(lm(y~x)) ## data argument?
}

set.seed(25)
df0 <- data.frame(x=rnorm(20), y=rnorm(20))

plotwithline(df0[['x']], df0[['y']])
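
For comparison, a sketch of how the same helper might look if it did use a data argument. The name plotwithline2 and its formula interface are assumptions made for illustration, not part of the example above.

```r
# Hypothetical variant of plotwithline() taking a formula and a data frame
plotwithline2 <- function(formula, data) {
  fit <- lm(formula, data = data)   # model fitted with the data argument
  plot(formula, data = data)        # scatterplot via the formula method
  abline(fit)                       # add the fitted regression line
  invisible(fit)                    # return the fit for further use
}

set.seed(25)
df0 <- data.frame(x = rnorm(20), y = rnorm(20))

fit <- plotwithline2(y ~ x, data = df0)
coef(fit)
```

Whether this is an improvement for such a small helper is exactly the question being asked; the vector interface above is shorter to call, while the formula interface keeps the variable names in the fitted object.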




Fox, John

2018-Dec-17 15:23 UTC

[Rd] Documentation examples for lm and glm

Dear Heinz,

> On Dec 17, 2018, at 10:19 AM, Heinz Tuechler <tuechler at gmx.at> wrote:
>
> Dear All,
>
> do you think that use of a data argument is best practice in the example
> below?

No, but it is *normally* or *usually* the best option, in my opinion.

Best,
 John
> 
> regards,
> 
> Heinz
> 
> ### trivial example
> plotwithline <- function(x, y) {
>    plot(x, y)
>    abline(lm(y~x)) ## data argument?
> }
> 
> set.seed(25)
> df0 <- data.frame(x=rnorm(20), y=rnorm(20))
> 
> plotwithline(df0[['x']], df0[['y']])

Heinz Tuechler

2018-Dec-17 16:36 UTC

[Rd] Documentation examples for lm and glm

Dear John,

fully agreed! In the global environment I always keep my
"data-variables" in a data.frame. However, when I look in help I like
examples that start with the particular aspects of a function. It is
important to know whether a function offers a data argument, but I
don't need an example of the use of a data argument in the first line
each time I look in help.

best,
Heinz
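
The trivial example quoted below takes two vectors, so lm(y ~ x) finds x and y in the function's own environment and a data argument has nothing to point at. For comparison, here is a minimal sketch of a variant that takes a formula and a data frame instead. The name plotwithline_df is hypothetical and does not come from the thread; the data are the same simulated values as in the quoted example.

```r
# Hypothetical variant of plotwithline() that uses the data argument.
plotwithline_df <- function(formula, data) {
  fit <- lm(formula, data = data)   # variables are looked up in 'data' first
  plot(formula, data = data)        # scatterplot via the formula method of plot()
  abline(fit)                       # add the fitted regression line
  invisible(fit)                    # return the fit without printing it
}

set.seed(25)
df0 <- data.frame(x = rnorm(20), y = rnorm(20))

fit <- plotwithline_df(y ~ x, df0)
```

Either form draws the same plot; which one reads better depends on whether the caller already holds a data frame or just two vectors.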

Fox, John wrote on 17.12.2018 16:23:
> Dear Heinz,
>
>   ----------------------------------------------
>> On Dec 17, 2018, at 10:19 AM, Heinz Tuechler <tuechler at gmx.at> wrote:
>>
>> Dear All,
>>
>> do you think that use of a data argument is best practice in the
>> example below?
>
> No, but it is *normally* or *usually* the best option, in my opinion.
>
> Best,
>  John
>
>>
>> regards,
>>
>> Heinz
>>
>> ### trivial example
>> plotwithline <- function(x, y) {
>>    plot(x, y)
>>    abline(lm(y~x)) ## data argument?
>> }
>>
>> set.seed(25)
>> df0 <- data.frame(x=rnorm(20), y=rnorm(20))
>>
>> plotwithline(df0[['x']], df0[['y']])
>>
>>
>>
>> Fox, John wrote on 17.12.2018 15:21:
>>> Dear Martin,
>>>
>>> I think that everyone agrees that it's generally preferable to use
>>> the data argument to lm() and I have nothing significant to add to the
>>> substance of the discussion, but I think that it's a mistake not to add
>>> to the current examples, for the following reasons:
>>>
>>> (1) Relegating examples using the data argument to "see also"
>>> doesn't suggest that using the argument is a best practice. Most users
>>> won't bother to click the links.
>>>
>>> (2) In my opinion, a new initial example using the data argument
>>> would more clearly suggest that this is normally the best option.
>>>
>>> (3) I think that it would also be desirable to add a remark to the
>>> explanation of the data argument, something like, "Although the argument
>>> is optional, it's generally preferable to specify it explicitly." And
>>> similarly on the help page for glm().
>>>
>>> My two (or three) cents.
>>>
>>> John
>>>
>>>  -------------------------------------------------
>>>  John Fox, Professor Emeritus
>>>  McMaster University
>>>  Hamilton, Ontario, Canada
>>>  Web: http://socserv.mcmaster.ca/jfox
>>>
>>>> On Dec 17, 2018, at 3:05 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>>>>
>>>>>>>>> David Hugh-Jones
>>>>>>>>>   on Sat, 15 Dec 2018 08:47:28 +0100 writes:
>>>>
>>>>> I would argue examples should encourage good
>>>>> practice. Beginners ought to learn to keep data in data
>>>>> frames and not to overuse attach().
>>>>
>>>> Note there's no attach() there in any of these examples!
>>>>
>>>>> otherwise at their own risk, but they have less need of
>>>>> explicit examples.
>>>>
>>>> The glm examples are nice insofar as they show both uses.
>>>>
>>>> I agree the lm() example(s) are  "didactically misleading" by
>>>> not using data frames at all.
>>>>
>>>> I disagree that only data frame examples should be shown.
>>>> If  lm()  is one of the first R functions a beginneR must use --
>>>> because they are in a basic stats class, say --  it may be
>>>> *better* didactically to focus on lm()  in the very first
>>>> example, and use data frames in a next one ...
>>>> .... and instead of next one, we have the pretty clear comment
>>>>
>>>> ### less simple examples in "See Also" above
>>>>
>>>> I'm not convinced (but you can try more) we should change those
>>>> examples or add more there.
>>>>
>>>> Martin
>>>>
>>>>> On Fri, 14 Dec 2018 at 14:51, S Ellison
>>>>> <S.Ellison at lgcgroup.com> wrote:
>>>>
>>>>>> FWIW, before all the examples are changed to data frame
>>>>>> variants, I think there's fairly good reason to have at
>>>>>> least _one_ example that does _not_ place variables in a
>>>>>> data frame.
>>>>>>
>>>>>> The data argument in lm() is optional. And there is more
>>>>>> than one way to manage data in a project. I personally
>>>>>> don't much like lots of stray variables lurking about,
>>>>>> but if those are the only variables out there and we can
>>>>>> be sure they aren't affected by other code, it's hardly
>>>>>> essential to create a data frame to hold something you
>>>>>> already have.  Also, attach() is still part of R, for
>>>>>> those folk who have a data frame but want to reference
>>>>>> the contents across a wider range of functions without
>>>>>> using with() a lot. lm() can reasonably omit the data
>>>>>> argument there, too.
>>>>>>
>>>>>> So while there are good reasons to use data frames, there
>>>>>> are also good reasons to provide examples that don't.
>>>>>>
>>>>>> Steve Ellison
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Ben Bolker
>>>>>>> Sent: 13 December 2018 20:36
>>>>>>> To: r-devel at r-project.org
>>>>>>> Subject: Re: [Rd] Documentation examples for lm and glm
>>>>>>>
>>>>>>>
>>>>>>> Agree.  Or just create the data frame with those variables in it
>>>>>>> directly ...
>>>>>>>
>>>>>>> On 2018-12-13 3:26 p.m., Thomas Yee wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> something that has been on my mind for a decade or two has
>>>>>>>> been the examples for lm() and glm(). They encourage poor style
>>>>>>>> because of mismanagement of data frames. Also, having the
>>>>>>>> variables in a data frame means that predict()
>>>>>>>> is more likely to work properly.
>>>>>>>>
>>>>>>>> For lm(), the variables should be put into a data frame.
>>>>>>>> As 2 vectors are assigned first in the general workspace they
>>>>>>>> should be deleted afterwards.
>>>>>>>>
>>>>>>>> For the glm(), the data frame d.AD is constructed but not used. Also,
>>>>>>>> its 3 components were assigned first in the general workspace, so they
>>>>>>>> float around dangerously afterwards like in the lm() example.
>>>>>>>>
>>>>>>>> Rather than attached improved .Rd files here, they are put at
>>>>>>>> www.stat.auckland.ac.nz/~yee/Rdfiles
>>>>>>>> You are welcome to use them!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
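
Thomas Yee's remark in the quoted thread, that predict() is more likely to work properly when the variables are in a data frame, can be illustrated with a minimal sketch. The data are simulated and the names d, fit_df, fit_ws and newd are made up for illustration; none of this is taken from the lm() help page. It shows one common way things go wrong (a formula written as d$y ~ d$x), not every case of fitting without a data argument.

```r
set.seed(1)
d <- data.frame(x = 1:10)
d$y <- 2 + 3 * d$x + rnorm(10)

# Fit with the data argument: the formula refers to column names.
fit_df <- lm(y ~ x, data = d)

# Fit without the data argument: the formula terms are 'd$y' and 'd$x'.
fit_ws <- lm(d$y ~ d$x)

newd <- data.frame(x = c(11, 12))

# 'x' is found in newd, so we get one prediction per new row.
p_df <- predict(fit_df, newdata = newd)

# 'd$x' is not a column of newd, so R re-evaluates it from the original d,
# warns about the row mismatch, and returns fitted values for the old data.
p_ws <- suppressWarnings(predict(fit_ws, newdata = newd))

length(p_df)  # 2
length(p_ws)  # 10
```

The second call does not fail outright, which is what makes it easy to miss: the warning is the only sign that newdata was ignored.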
