thr3ads.net - R devel - [Rd] On read.csv and write.csv [Jul 2021]

Stephen Ellison

2021-Jun-30 21:15 UTC

[Rd] On read.csv and write.csv

Apologies if this is a well-worn question; I haven't found it so far, but
there's a lot of R-devel and I may have missed it in the archives. In the
meantime:

I've managed to avoid writing csv files with R for a couple of decades, but
we're swapping data with a collaborator and I've tripped over an
inconsistency between read.csv and write.csv that seems less than helpful.
By default, read.csv assumes that when the first row contains one fewer
item than the second, the first column contains row names. write.csv,
however, includes an empty string ("") as the first header entry, over the
row names, when writing. On rereading, the original row names are then
treated as a data column with no name, which read.csv labels "X".

That means that, unlike read.table and write.table, something written with
write.csv is not read back correctly by read.csv.

Is that intentional?
And whether it is intentional or not, is it wise?

Example:

( D1 <- data.frame(A=letters[1:5], N=1:5, Y=rnorm(5) ) )
write.csv(D1, "temp.csv")

( D1w <- read.csv("temp.csv") )

# Note the unnecessary new X column ...
# Tidy up
unlink("temp.csv")

This differs from the parent .table defaults; write.table doesn't add the extra
"" column label, so the object read back with read.table does not
contain an unwanted extra column.
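
For comparison, a minimal sketch of the same round trip with the .table
defaults. The names tmp, header and D1t are illustrative only, and the file
goes to a temporary path rather than the working directory.

```r
# Same data frame as in the example, written and read with the .table defaults
D1 <- data.frame(A = letters[1:5], N = 1:5, Y = rnorm(5))
tmp <- tempfile(fileext = ".txt")

write.table(D1, tmp)                # header line has no entry over the row names
header <- readLines(tmp, n = 1)     # the header as written to disk
D1t <- read.table(tmp)              # header and row names are detected automatically

unlink(tmp)
names(D1t)                          # only the original column names, no "X"
```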

Wouldn't it be more sensible if write.csv() and read.csv() were consistent in
the same sense as read.table and write.table?
Or at least if there were a switch (as.read.csv=TRUE ?) to tell write.csv to
omit the initial "", or vice versa?
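
Pending any such switch, one possible workaround (a sketch, not a
recommendation) is to call write.table directly with a comma separator.
write.csv itself ignores attempts to change col.names, with a warning, so
it cannot be told to drop the leading "". The names tmp, first_line and D1r
are illustrative.

```r
D1 <- data.frame(A = letters[1:5], N = 1:5, Y = rnorm(5))
tmp <- tempfile(fileext = ".csv")

# write.table keeps its default col.names = TRUE, so the header has one
# fewer field than the data rows; qmethod = "double" matches write.csv quoting
write.table(D1, tmp, sep = ",", qmethod = "double")
first_line <- readLines(tmp, n = 1)

# read.csv sees the short header and takes the first column as row names
D1r <- read.csv(tmp)

unlink(tmp)
names(D1r)
```

The resulting file is the header-one-short form that read.csv understands,
which other CSV readers may not accept.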

Currently using R version 4.1.0 on Windows, but this reproduces at least as far
back as 3.6

Steve E



Gabriel Becker

2021-Jun-30 23:02 UTC


Hi Stephen,

Personally, I don't have super strong feelings about this, but
https://datatracker.ietf.org/doc/html/rfc4180#section-2 does say that the
optional header line should have the same number of fields as the data
records, so inasmuch as that is the "CSV specification", R's read.csv
behavior is supporting an extension, whereas its write.csv is outputting
"standard"-compliant csv.
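
A quick way to check the field counts, sketched with count.fields from
utils (the names tmp, header and n_fields are illustrative):

```r
D1 <- data.frame(A = letters[1:5], N = 1:5, Y = rnorm(5))
tmp <- tempfile(fileext = ".csv")

write.csv(D1, tmp)
header <- readLines(tmp, n = 1)            # starts with the empty "" label
n_fields <- count.fields(tmp, sep = ",")   # one count per line, header included

unlink(tmp)
n_fields   # the header and every data record have the same number of fields
```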

It is possible that one or a few of the mentioned multitude of independent
specs do specify that the header can have one fewer field, I don't know, but
if so, according to the IETF, it's not overly common.

I can't even speak to whether that is why the behavior is as it is, but I
figured it was worth mentioning.

~G


Simon Urbanek

2021-Jul-01 02:18 UTC


Stephen,

the "unhelpful" column holds the row names. They are considered an
important part of a data frame, and therefore the default (row.names = TRUE) is
to not lose them (as there is no way back once you do). If you don't want to
preserve the row names, you can simply set row.names=FALSE.
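
A minimal sketch of both options: dropping the row names on writing, or
keeping them and telling read.csv which column holds them. The row names
r1..r5 and the object names are made up for illustration.

```r
D1 <- data.frame(A = letters[1:5], N = 1:5, Y = rnorm(5),
                 row.names = paste0("r", 1:5))
tmp <- tempfile(fileext = ".csv")

# Option 1: do not write the row names at all
write.csv(D1, tmp, row.names = FALSE)
no_rn <- read.csv(tmp)

# Option 2: write them (the default) and read column 1 back as row names
write.csv(D1, tmp)
with_rn <- read.csv(tmp, row.names = 1)

unlink(tmp)
rownames(with_rn)
```

In both cases the data frame read back has only the columns A, N and Y;
only the second keeps the original row names.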

Cheers,
Simon

PS: this is likely a question for R-help rather than R-devel



Taras Zakharko

2021-Jul-01 07:55 UTC


Stephen, 

I am sure one can find a lot of small issues and inconsistencies with R and its
standard library. It has to support a lot of legacy cruft, and the design
process, especially in the early days, focused on getting things done rather
than delivering a standard library of immaculate quality. And it is way too
late to make dramatic changes unless you want to risk breaking existing
software. That ship sailed decades ago.

Personally, I taught myself a while ago to always use explicit
configuration when using built-in functions, and in the last couple of years I
have completely replaced them with other packages (such as readr) that
come with (arguably) more sane defaults and better diagnostics.
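
As an illustration of what explicit configuration could look like in base R
(a sketch only; the particular argument choices and the names tmp and D1x
are assumptions, not a prescription):

```r
D1 <- data.frame(A = letters[1:5], N = 1:5, Y = rnorm(5))
tmp <- tempfile(fileext = ".csv")

# Spell out every format decision instead of relying on defaults
write.table(D1, tmp, sep = ",", dec = ".", quote = TRUE, qmethod = "double",
            row.names = FALSE, col.names = TRUE)

D1x <- read.table(tmp, header = TRUE, sep = ",", dec = ".", quote = "\"",
                  row.names = NULL, check.names = FALSE, comment.char = "",
                  stringsAsFactors = FALSE,
                  colClasses = c("character", "integer", "numeric"))

unlink(tmp)
str(D1x)
```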

Best, 

Taras

