thr3ads.net - R devel - [Rd] Incorrect Import by Data for CSV File [Sep 2017]

Dario Strbenac

2017-Sep-25 07:00 UTC

[Rd] Incorrect Import by Data for CSV File

Good day,

The data function can import a variety of file formats, one of them being C.S.V.
Problematically, all of the table columns are collapsed into a single data frame
column. This occurs because "files ending .csv or .CSV are read using
read.table(..., header = TRUE, sep = ";", as.is=FALSE)". I
suggest that the semi-colon used as the column separator be changed to a comma.

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
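
The behaviour reported above can be reproduced outside of data() with a small sketch. It assumes a throwaway comma-separated file (the file contents and the variable names `path`, `collapsed` and `split_cols` are made up for illustration) and calls read.table() directly with the arguments quoted from the documentation, then again with a comma separator for comparison.

```r
# Write a small comma-separated file to a temporary location (made-up data).
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,3.5", "2,4.0"), path)

# The call quoted from the data() documentation: semicolon as separator.
# No semicolons occur in the file, so each line becomes a single field.
collapsed <- read.table(path, header = TRUE, sep = ";", as.is = FALSE)

# The same file read with a comma as separator.
split_cols <- read.table(path, header = TRUE, sep = ",", as.is = FALSE)

print(ncol(collapsed))   # one column holding the whole line
print(ncol(split_cols))  # two columns: id and score
```

With the semicolon separator the whole line ends up in one column, which matches the symptom described in the report.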

Prof Brian Ripley

2017-Sep-25 12:27 UTC

On 25/09/2017 08:00, Dario Strbenac wrote:
> Good day,
>
> The data function can import a variety of file formats, one of them being
> C.S.V.

That isn't its documented purpose.  It was the original way for packages 
to provide datasets as needed (before lazy data was added).

> Problematically, all of the table columns are collapsed into a single
> data frame column. This occurs because "files ending .csv or .CSV are
> read using read.table(..., header = TRUE, sep = ";", as.is=FALSE)". I
> suggest that the semi-colon used as the column separator be changed to a
> comma.

We suggest you read the documentation ... the (non-English-locales) 
version with a semicolon separator is one of four documented formats, 
and the English-language one is not.  Even if it were desirable it would 
not be possible to make a backwards-incompatible change after almost 20 
years.

It really isn't clear why anyone would want to use anything other than 
the second option (.rda) for data() unless other manipulations are 
needed (e.g. to attach a package).  But that option was not part of the 
original implementation.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
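
The .rda route recommended above can be sketched as a simple round trip. This is only an illustration under stated assumptions: the data frame `scores`, the file name `scores.rda` and the environment `env` are hypothetical, and the file is written to a temporary directory rather than to a package's data directory. The help page for data() states that files ending .RData or .rda are load()ed, so load() is used here to show what comes back.

```r
# Hypothetical example data; not taken from the thread.
scores <- data.frame(id = 1:3, score = c(3.5, 4.0, 2.5))

# Save in R's binary format, the kind of file that data() passes to load().
rda_path <- file.path(tempdir(), "scores.rda")
save(scores, file = rda_path)

# Load into a fresh environment so the round trip is easy to inspect.
env <- new.env()
loaded_names <- load(rda_path, envir = env)

print(loaded_names)                   # names of the objects that were restored
print(all.equal(env$scores, scores))  # compare restored and original objects
```

Because the object is stored as an R object, column types and factor levels survive without any separator or locale convention coming into play.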

peter dalgaard

2017-Sep-25 16:03 UTC

> On 25 Sep 2017, at 14:27, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>
> On 25/09/2017 08:00, Dario Strbenac wrote:
>> Good day,
>>
>> The data function can import a variety of file formats, one of them
>> being C.S.V.
>
> That isn't its documented purpose.  It was the original way for
> packages to provide datasets as needed (before lazy data was added).
>
>> Problematically, all of the table columns are collapsed into a single data
>> frame column. This occurs because "files ending .csv or .CSV are read using
>> read.table(..., header = TRUE, sep = ";", as.is=FALSE)". I
>> suggest that the semi-colon used as the column separator be changed to a
>> comma.
>
> We suggest you read the documentation ... the (non-English-locales) version
> with a semicolon separator is one of four documented formats, and the
> English-language one is not.  Even if it were desirable it would not be
> possible to make a backwards-incompatible change after almost 20 years.
>
> It really isn't clear why anyone would want to use anything other than
> the second option (.rda) for data() unless other manipulations are needed
> (e.g. to attach a package).  But that option was not part of the original
> implementation.

It can be handy to have raw ASCII data included in a package for people to see,
but then you can use the .R mechanism to read the data. It is done for a couple
of cases in the ISwR package, see e.g. the stroke.R and stroke.csv pair. This
also allows you to fix up other things that you have no chance of specifying
directly in the file:

stroke <- read.csv2("stroke.csv", na.strings=".")
names(stroke) <- tolower(names(stroke))
stroke <- within(stroke, {
    sex <- factor(sex, levels=0:1, labels=c("Female","Male"))
    dgn <- factor(dgn)
    coma <- factor(coma, levels=0:1, labels=c("No","Yes"))
    minf <- factor(minf, levels=0:1, labels=c("No","Yes"))
    diab <- factor(diab, levels=0:1, labels=c("No","Yes"))
    han <- factor(han, levels=0:1, labels=c("No","Yes"))
    died <- as.Date(died, format="%d.%m.%Y")
    end <- pmin(died, as.Date("1996-01-01"), na.rm=TRUE)
    dstr <- as.Date(dstr, format="%d.%m.%Y")
    obsmonths <- as.numeric(end-dstr, "days")/30.6
    obsmonths[obsmonths==0] <- 0.1
    dead <- !is.na(died) & died < as.Date("1996-01-01")
    died[!dead] <- NA
    rm(end)
})


-pd

> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
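
The clean-up pattern in the stroke.R script above can be tried on a toy file without installing ISwR. The sketch below is not the ISwR data: the three-column file, its values and the name `toy` are invented for illustration. It only shows the same steps in miniature, namely reading a semicolon-separated file with read.csv2(), treating "." as missing, lower-casing the column names, and recoding columns inside within().

```r
# Made-up semicolon-separated file; "." marks a missing value.
path <- tempfile(fileext = ".csv")
writeLines(c("SEX;COMA;DIED",
             "0;1;12.03.1992",
             "1;0;."), path)

# read.csv2() uses ";" as the field separator and "," as the decimal mark.
toy <- read.csv2(path, na.strings = ".")
names(toy) <- tolower(names(toy))

# Recode the 0/1 indicators as labelled factors and parse the date column.
toy <- within(toy, {
    sex  <- factor(sex,  levels = 0:1, labels = c("Female", "Male"))
    coma <- factor(coma, levels = 0:1, labels = c("No", "Yes"))
    died <- as.Date(died, format = "%d.%m.%Y")
})

str(toy)
```

Placing a script like this next to the raw file keeps the data human-readable while still delivering properly typed columns to the user.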
