Cook, Malcolm
2016-Feb-19 00:03 UTC
[Rd] should `data` respect default.stringsAsFactors()?
Hi Peter,

Sorry if I was not clear. Perhaps an example will make my point:

> data(iris)
> class(iris$Species)
[1] "factor"
> write.table(iris,'data/myiris.tab')
> data(myiris)
> class(myiris$Species)
[1] "factor"
> rm(myiris)
> options(stringsAsFactors = FALSE)
> data(myiris)
> class(myiris$Species)
[1] "factor"
> myiris<-read.table("data/myiris.tab",header=TRUE)
> class(myiris$Species)
[1] "character"

I am surprised to find that, in the above,

  setting the global option stringsAsFactors = FALSE does NOT affect how Species is read in by the `data` function

whereas

  setting the global option stringsAsFactors = FALSE DOES affect how Species is read in by read.table,

especially since data is documented as calling read.table.

In my opinion, one or the other should change (the behavior of data, or the documentation).

<bleep> <bleep>,

~ Malcolm

> -----Original Message-----
> From: peter dalgaard [mailto:pdalgd at gmail.com]
> Sent: Thursday, February 18, 2016 3:32 PM
> To: Cook, Malcolm <MEC at stowers.org>
> Cc: r-devel at stat.math.ethz.ch
> Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
>
> What the <bleep> are you on about? data() does many things, only some of
> which call read.table() et al., and the ones that do have no special treatment
> of stringsAsFactors.
>
> -pd
>
> > On 18 Feb 2016, at 21:25 , Cook, Malcolm <MEC at stowers.org> wrote:
> >
> > Hiya,
> >
> > Probably been debated elsewhere....
> >
> > I note that R's `data` function does not respect default.stringsAsFactors
> >
> > By my lights, it should, especially as it is documented to call read.table,
> > which DOES respect.
> >
> > Oh, but: http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html
> >
> > Compelling. I have to agree.
> >
> > So, I change my mind.
> >
> > By my lights, `data` should then be documented to NOT respect
> > default.stringsAsFactors.
> >
> > Else?
> >
> > ~Malcolm Cook
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Joshua Ulrich
2016-Feb-19 00:39 UTC
[Rd] should `data` respect default.stringsAsFactors()?
On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm <MEC at stowers.org> wrote:
> I am surprised to find that in the above
> setting the global option stringsAsFactors = FALSE does NOT affect how Species is being read in by the `data` function
> whereas
> setting the global option stringsAsFactors = FALSE DOES affect how Species is being read in by read.table
>
> especially since data is documented as calling read.table.

To be explicit, it's documented as calling read.table(..., header = TRUE) in this case, but it actually calls read.table(..., header = TRUE, as.is = FALSE), which results in class(myiris$Species) of "factor".

R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
R> class(myiris$Species)
[1] "factor"

So it seems like adding as.is = FALSE to the call in the documentation would clear this up.

> In my opinion, one or the other should change (the behavior of data, or the documentation).

--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com
R/Finance 2016 | www.rinfinance.com
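A minimal sketch of the point above, assuming only base R: in read.table(), stringsAsFactors merely supplies the default for as.is, so an explicit as.is = FALSE takes precedence. The temporary file and the names `as_default` and `as_data_does` are illustrative and not from the thread. stringsAsFactors is passed as an argument rather than through options(), so the sketch does not depend on the session's global setting.

```r
# Write iris to a temporary file (illustrative path, not the thread's data/myiris.tab).
tab <- tempfile(fileext = ".tab")
write.table(iris, tab)

# stringsAsFactors = FALSE on its own: character columns stay character.
as_default <- read.table(tab, header = TRUE, stringsAsFactors = FALSE)

# An explicit as.is = FALSE (the argument the thread reports data() passes)
# overrides stringsAsFactors, so Species becomes a factor.
as_data_does <- read.table(tab, header = TRUE, as.is = FALSE,
                           stringsAsFactors = FALSE)

class(as_default$Species)    # "character"
class(as_data_does$Species)  # "factor"
```

The second call mirrors the read.table(..., header = TRUE, as.is = FALSE) call quoted above, which is why the option appears to be ignored.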
Michael Nelson
2016-Feb-19 04:58 UTC
[Rd] should `data` respect default.stringsAsFactors()?
As Peter pointed out, data() loads data from packages. Various formats are supported, and the package author(s) decide how best to ship (and load) any such data.

When you call `data(iris)`, it loads iris as it is defined in the datasets package. The definition can be seen here:

https://github.com/wch/r-source/blob/trunk/src/library/datasets/data/iris.R

You will note that Species is explicitly a factor, and that it is not read in by read.table but by being "source()d", because it is a .R file.

Michael
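A rough sketch of what "source()d" means here, under stated assumptions: the project directory `proj`, the data set name `flowers`, and its contents are all hypothetical. It relies on data() also searching the data subdirectory of the current working directory, as in the data(myiris) example earlier in the thread, and loads into a separate environment rather than the workspace.

```r
# Throwaway project directory with a ./data subdirectory; data() also
# searches the 'data' folder of the current working directory.
proj <- tempfile("proj")
dir.create(file.path(proj, "data"), recursive = TRUE)

# A data script is ordinary R code: the factor is built explicitly,
# so no read.table() call (and no stringsAsFactors setting) is involved.
writeLines(c(
  "flowers <- data.frame(",
  "  species = factor(c('setosa', 'virginica')),",
  "  count = c(50L, 50L)",
  ")"
), file.path(proj, "data", "flowers.R"))

env <- new.env()               # load here instead of the global workspace
old <- setwd(proj)
data("flowers", envir = env)   # runs data/flowers.R
setwd(old)

class(env$flowers$species)
```

Because the script constructs the column itself, its class is whatever the script says it is, independent of any reader defaults.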
peter dalgaard
2016-Feb-19 09:02 UTC
[Rd] should `data` respect default.stringsAsFactors()?
Aha... Hadn't noticed that stringsAsFactors only works via as.is in read.table.

Yes, the doc should probably be fixed. The code probably not -- packages loading different data sets depending on user options is an even worse idea than having the option in the first place... (I don't mean having the possibility, I mean the default.stringsAsFactors thing.)

In general, read.table() gets many things wrong if you don't set switches and/or postprocess. E.g., even when you do intend to read factors, the alphabetical level order is often not desired.

My favourite workaround for data() is to drop a corresponding foo.R file in the ./data directory. This will be run in preference to loading foo.txt (or foo.csv, etc.) and can contain, like,

dd <- read.table("foo.txt", .....)
dd$cook <- factor(dd$cook, levels=c("rare","medium","well-done"))

etc.

-pd

> On 19 Feb 2016, at 01:39 , Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
>
> To be explicit, it's documented as calling read.table(..., header = TRUE)
> in this case, but it actually calls read.table(..., header = TRUE,
> as.is = FALSE), which results in class(myiris$Species) of "factor".
>
> So it seems like adding as.is = FALSE to the call in the documentation
> would clear this up.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
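The foo.R workaround described above can be sketched end to end as follows. This is an illustration under assumptions, not code from the thread: the directory `proj`, the data set name `steaks`, and its three rows are made up. It assumes, as the data() help page describes, that a .R file is sourced with the working directory temporarily set to the data directory, so the relative file name inside the script resolves.

```r
# Throwaway project directory with a ./data subdirectory.
proj <- tempfile("proj")
dir.create(file.path(proj, "data"), recursive = TRUE)

# Raw table. Read directly by data(), 'cook' would get alphabetical levels.
writeLines(c("cook minutes", "rare 2", "medium 4", "well-done 6"),
           file.path(proj, "data", "steaks.txt"))

# Companion script, run in preference to loading steaks.txt directly.
# It reads the table itself and then sets the level order explicitly.
writeLines(c(
  "steaks <- read.table('steaks.txt', header = TRUE, as.is = TRUE)",
  "steaks$cook <- factor(steaks$cook,",
  "                      levels = c('rare', 'medium', 'well-done'))"
), file.path(proj, "data", "steaks.R"))

env <- new.env()              # load here instead of the global workspace
old <- setwd(proj)
data("steaks", envir = env)   # picks steaks.R over steaks.txt
setwd(old)

levels(env$steaks$cook)       # "rare" "medium" "well-done"
```

Naming the object after the file keeps data("steaks") and the resulting object in step, which the `dd` placeholder in the message above leaves open.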
Cook, Malcolm
2016-Feb-19 14:54 UTC
[Rd] should `data` respect default.stringsAsFactors()?
Joshua,

> To be explicit, it's documented as calling read.table(..., header = TRUE)
> in this case, but it actually calls read.table(..., header = TRUE,
> as.is = FALSE), which results in class(myiris$Species) of "factor".

Aha - makes sense.

> R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
> R> class(myiris$Species)
> [1] "factor"
>
> So it seems like adding as.is = FALSE to the call in the documentation
> would clear this up.

I agree - thanks for digging into the source - you have unearthed the root cause.

~Malcolm