peter dalgaard
2016-Feb-19 09:02 UTC
[Rd] should `data` respect default.stringsAsFactors()?
Aha... Hadn't noticed that stringsAsFactors only works via as.is in
read.table.
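
A minimal sketch of that mechanism, assuming only base R (the temporary file and the names a and b are illustrative, not from the thread). In read.table(), stringsAsFactors only supplies the default for as.is, so an explicit as.is = FALSE, which is what data() is reported to pass in the message quoted below, yields a factor whatever stringsAsFactors says:

```r
# Write iris to a throwaway file (the path is illustrative).
tmp <- tempfile(fileext = ".tab")
write.table(iris, tmp)

# as.is defaults to !stringsAsFactors, so Species stays character here.
a <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)

# An explicit as.is = FALSE takes precedence over stringsAsFactors.
b <- read.table(tmp, header = TRUE, as.is = FALSE, stringsAsFactors = FALSE)

class(a$Species)  # "character"
class(b$Species)  # "factor"
unlink(tmp)
```
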
Yes, the doc should probably be fixed. The code probably not -- packages loading
different data sets depending on user options is an even worse idea than having
the option in the first place... (I don't mean having the possibility, I
mean the default.stringsAsFactors() thing).
In general, read.table() gets many things wrong, if you don't set switches
and/or postprocess. E.g., even when you do intend to read factors, the
alphabetical level order is often not desired. My favourite workaround for
data() is to drop a corresponding foo.R file in the ./data directory. This will
be run in preference to loading foo.txt (or foo.csv, etc) and can contain, like,
dd <- read.table("foo.txt", .....)
dd$cook <- factor(dd$cook,
                  levels = c("rare", "medium", "well-done"))
etc.
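
The workaround above can be sketched end to end as follows. This is an illustration under stated assumptions, not code from the thread: the dataset name "steak", its contents, and the temporary directory are all hypothetical. It relies on data() searching the ./data subdirectory of the working directory when no package is given, and on it sourcing a .R file in preference to a .txt file of the same name.

```r
# Hypothetical demo in a throwaway directory; "steak" is a made-up name.
tmp <- tempfile("datademo")
dir.create(file.path(tmp, "data"), recursive = TRUE)
old <- setwd(tmp)

# Raw data file that data() would otherwise read via read.table().
writeLines(c("cook", "medium", "rare", "well-done", "rare"),
           file.path("data", "steak.txt"))

# Companion script: data() sources steak.R instead of reading steak.txt,
# with the working directory temporarily changed to ./data.
writeLines(c(
  'steak <- read.table("steak.txt", header = TRUE, as.is = TRUE)',
  'steak$cook <- factor(steak$cook,',
  '                     levels = c("rare", "medium", "well-done"))'
), file.path("data", "steak.R"))

data(steak)          # runs data/steak.R and copies `steak` into the workspace
levels(steak$cook)   # "rare" "medium" "well-done", not alphabetical
setwd(old)
```

Reading with as.is = TRUE and then building the factor explicitly keeps the level order independent of both the user's options and the R version.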
-pd
> On 19 Feb 2016, at 01:39 , Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
>
> On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>> Hi Peter,
>>
>> Sorry if I was not clear. Perhaps an example will make my point:
>>
>> > data(iris)
>> > class(iris$Species)
>> [1] "factor"
>> > write.table(iris,'data/myiris.tab')
>> > data(myiris)
>> > class(myiris$Species)
>> [1] "factor"
>> > rm(myiris)
>> > options(stringsAsFactors = FALSE)
>> > data(myiris)
>> > class(myiris$Species)
>> [1] "factor"
>> > myiris<-read.table("data/myiris.tab",header=TRUE)
>> > class(myiris$Species)
>> [1] "character"
>>
>> I am surprised to find that in the above
>> setting the global option stringsAsFactors = FALSE does NOT affect how Species is being read in by the `data` function
>> whereas
>> setting the global option stringsAsFactors = FALSE DOES affect how Species is being read in by read.table
>>
>> especially since data is documented as calling read.table.
>>
> To be explicit, it's documented as calling read.table(..., header =
> TRUE) in this case, but it actually calls read.table(..., header = TRUE,
> as.is = FALSE), which results in class(myiris$Species) of
> "factor".
>
> R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
> R> class(myiris$Species)
> [1] "factor"
>
> So it seems like adding as.is = FALSE to the call in the documentation
> would clear this up.
>
>> In my opinion, one or the other should change (the behavior of data, or the documentation).
>>
>> <bleep> <bleep>,
>>
>> ~ Malcolm
>>
>>
>>> -----Original Message-----
>>> From: peter dalgaard [mailto:pdalgd at gmail.com]
>>> Sent: Thursday, February 18, 2016 3:32 PM
>>> To: Cook, Malcolm <MEC at stowers.org>
>>> Cc: r-devel at stat.math.ethz.ch
>>> Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
>>>
>>> What the <bleep> are you on about? data() does many things, only some of
>>> which call read.table() et al., and the ones that do have no special treatment
>>> of stringsAsFactors.
>>>
>>> -pd
>>>
>>>> On 18 Feb 2016, at 21:25 , Cook, Malcolm <MEC at stowers.org> wrote:
>>>>
>>>> Hiya,
>>>>
>>>> Probably been debated elsewhere....
>>>>
>>>> I note that R's `data` function does not respect
default.stringsAsFactors
>>>>
>>>> By my lights, it should, especially as it is documented to call read.table,
>>>> which DOES respect.
>>>>
>>>> Oh, but: http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html
>>>>
>>>> Compelling. I have to agree.
>>>>
>>>> So, I change my mind.
>>>>
>>>> By my lights, `data` should then be documented to NOT respect default.stringsAsFactors.
>>>>
>>>> Else?
>>>>
>>>> ~Malcolm Cook
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>> --
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
>
> --
> Joshua Ulrich | about.me/joshuaulrich
> FOSS Trading | www.fosstrading.com
> R/Finance 2016 | www.rinfinance.com
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Cook, Malcolm
2016-Feb-19 15:02 UTC
[Rd] should `data` respect default.stringsAsFactors()?
Hi,

> Aha... Hadn't noticed that stringsAsFactors only works via as.is in read.table.
>
> Yes, the doc should probably be fixed. The code probably not

Agreed.

Is someone on-list authorized and willing to make the documentation change? I suppose I could learn what it takes to be a "player", but for such a trivial fix, it probably is overkill. Dissenting opinions?

> -- packages
> loading different data sets depending on user options is an even worse idea
> than having the option in the first place... (I don't mean having the possibility, I
> mean the default.stringsAsFactor thing).
>
> In general, read.table() gets many things wrong

I agree with you that "read.table() gets many things wrong" and I too have my favorite workarounds - but that was not my concern. My concern is that data() does not work as documented.

~Malcolm

> , if you don't set switches
> and/or postprocess. E.g., even when you do intend to read factors, the
> alphabetical level order is often not desired. My favourite workaround for
> data() is to drop a corresponding foo.R file in the ./data directory. This will be
> run in preference to loading foo.txt (or foo.csv, etc) and can contain, like,
>
> dd <- read.table(foo.txt,.....)
> dd$cook <- factor(dd$cook, levels=c("rare","medium","well-done"))
>
> etc.
>
> -pd
peter dalgaard
2016-Feb-19 15:23 UTC
[Rd] should `data` respect default.stringsAsFactors()?
On 19 Feb 2016, at 16:02 , Cook, Malcolm <MEC at stowers.org> wrote:

> Hi,
>
>> Aha... Hadn't noticed that stringsAsFactors only works via as.is in read.table.
>>
>> Yes, the doc should probably be fixed. The code probably not
>
> Agreed.
>
> Is someone on-list authorized and willing to make the documentation change? I suppose I could learn what it takes to be a "player", but for such a trivial fix, it probably is overkill. Dissenting opinions?

I have fixed it for r-devel.

-pd