thr3ads.net - R help - [R] tidyverse: read_csv() misses column [Nov 2021]

Kevin Thorpe

2021-Nov-01 16:50 UTC

[R] tidyverse: read_csv() misses column

I do not have a specific answer to your particular problem. All I can say is
when a CSV import doesn't work, it can mean there is something in the CSV file
that is unexpected. When read_csv() fails, I will try read.csv() to compare the
results.
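
A minimal sketch of that comparison, assuming readr 2.0 or later is installed (the readr part is skipped otherwise). The two-row sample file is made up for illustration and only mimics the layout of cor-disc.csv; the names base_classes and readr_classes are likewise illustrative.

```r
# Hypothetical two-row sample in the same layout as cor-disc.csv
csv_lines <- c(
  "site_nbr,year,mon,day,hr,min,tz,disc",
  "14171600,2009,10,23,00,00,PDT,8750",
  "14171600,2009,10,23,00,15,PDT,8750"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Base R reader: record the first class of each column
base_df <- read.csv(tmp)
base_classes <- vapply(base_df, function(x) class(x)[1], character(1))
print(base_classes)

# readr reader, only if the package is available
if (requireNamespace("readr", quietly = TRUE)) {
  readr_df <- readr::read_csv(tmp, show_col_types = FALSE)
  readr_classes <- vapply(readr_df, function(x) class(x)[1], character(1))
  # Columns whose guessed type differs between the two readers
  print(names(base_classes)[base_classes != readr_classes])
  # Parsing issues recorded by readr, if any
  print(readr::problems(readr_df))
}
```

Columns that show up in the difference list are the ones worth inspecting in the raw file first.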

Kevin

> On Nov 1, 2021, at 12:40 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
> 
> The data file, cor-disc.csv begins with:
> site_nbr,year,mon,day,hr,min,tz,disc
> 14171600,2009,10,23,00,00,PDT,8750
> 
> The first 7 columns are character strings; the 8th column is an integer.
> 
> After loading library(tidyverse) I ran read_csv() with this result:
> > cor_disc <- read_csv("../data/cor-disc.csv")
> Rows: 415263 Columns: 8
> ── Column specification ──────────────────────────────────────────────────
> Delimiter: ","
> chr (5): mon, day, hr, min, tz
> dbl (2): site_nbr, year
> 
> ℹ Use `spec()` to retrieve the full column specification for this data.
> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> 
> 1. What happened to the values in column 'disc'?
> 
> 2. Why are site_nbr and year seen as doubles when they're character strings?
> 
> I've not found answers in the book or in ?read_csv.
> 
> What am I missing?
> 
> Rich
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

Rich Shepard

2021-Nov-01 17:01 UTC

[R] tidyverse: read_csv() misses column

On Mon, 1 Nov 2021, Kevin Thorpe wrote:
> I do not have a specific answer to your particular problem. All I can say
> is when a CSV import doesn't work, it can mean there is something in the
> CSV file that is unexpected. When read_csv() fails, I will try read.csv()
> to compare the results.
Kevin,

That's a thought. I'll do that.

Thanks,

Rich

Rich Shepard

2021-Nov-01 17:10 UTC

[R] tidyverse: read_csv() misses column

On Mon, 1 Nov 2021, Kevin Thorpe wrote:
> I do not have a specific answer to your particular problem. All I can say
> is when a CSV import doesn't work, it can mean there is something in the
> CSV file that is unexpected. When read_csv() fails, I will try read.csv()
> to compare the results.
Kevin,

Interesting that there's no error:
cor_disc <- read.csv("../data/cor-disc.csv", header = TRUE)
...
12496 14171600 2010   3  15 16  45 PDT 1060
12497 14171600 2010   3  15 17   0 PDT 1060
12498 14171600 2010   3  15 17  15 PDT 1050
12499 14171600 2010   3  15 17  45 PDT 1050
  [ reached 'max' / getOption("max.print") -- omitted 402856 rows ]
> head(cor_disc)
  site_nbr year mon day hr min  tz disc
1 14171600 2009  10  23  0   0 PDT 8750
2 14171600 2009  10  23  0  15 PDT 8750
3 14171600 2009  10  23  0  30 PDT 8750
4 14171600 2009  10  23  0  45 PDT 8750
5 14171600 2009  10  23  1   0 PDT 8750
6 14171600 2009  10  23  1  15 PDT 8750
> str(cor_disc)
'data.frame':	415355 obs. of  8 variables:
 $ site_nbr: chr  "14171600" "14171600" "14171600" "14171600" ...
 $ year    : int  2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
 $ mon     : int  10 10 10 10 10 10 10 10 10 10 ...
 $ day     : int  23 23 23 23 23 23 23 23 23 23 ...
 $ hr      : int  0 0 0 0 1 1 1 1 2 2 ...
 $ min     : int  0 15 30 45 0 15 30 45 0 15 ...
 $ tz      : chr  "PDT" "PDT" "PDT" "PDT" ...
 $ disc    : int  8750 8750 8750 8750 8750 8750 8750 8730 8730 8730 ...

So, where might I look to see why tidyverse's read_csv() doesn't produce the
same results?

Regards,

Rich

Jeff Newmiller

2021-Nov-01 17:21 UTC

[R] tidyverse: read_csv() misses column

More explicitly... look at rows past the first row. If your csv has 300 rows and
column 1 has something non-numeric in row 299 then the whole column gets
imported as character data. Try

cor_disc[[ 1 ]] |> as.numeric() |> is.na() |> which()

to find suspect rows. You may want to read about the na argument to read_csv in
?read_csv.
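
As a rough illustration of that check in base R, on made-up data: the "n/a" marker and the tiny file below are assumptions for the example, not values taken from cor-disc.csv. The second half uses read.csv()'s na.strings, which plays the same role as the na argument of read_csv().

```r
# Hypothetical character column with one entry that is not a number
site_nbr <- c("14171600", "14171600", "n/a", "14171600")

# as.numeric() yields NA for strings it cannot parse (and warns about it);
# which() returns the positions of those NAs
suspect <- which(is.na(suppressWarnings(as.numeric(site_nbr))))
print(suspect)
print(site_nbr[suspect])

# Re-read with the odd marker declared as missing, so the column can be
# parsed as a number ("n/a" is an assumed marker for this sketch)
tmp <- tempfile(fileext = ".csv")
writeLines(c("site_nbr,disc", "14171600,8750", "n/a,8730"), tmp)
fixed <- read.csv(tmp, na.strings = c("NA", "n/a"))
str(fixed)
```

Note that entries which were already missing are flagged by the first check too, so the suspect rows still need a look in the raw file.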

On November 1, 2021 9:50:23 AM PDT, Kevin Thorpe <kevin.thorpe at utoronto.ca> wrote:
>I do not have a specific answer to your particular problem. All I can say is
>when a CSV import doesn't work, it can mean there is something in the CSV file
>that is unexpected. When read_csv() fails, I will try read.csv() to compare the
>results.
>
>Kevin
>
>
>> On Nov 1, 2021, at 12:40 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
>> 
>> The data file, cor-disc.csv begins with:
>> site_nbr,year,mon,day,hr,min,tz,disc
>> 14171600,2009,10,23,00,00,PDT,8750
>> 
>> The first 7 columns are character strings; the 8th column is an integer.
>> 
>> After loading library(tidyverse) I ran read_csv() with this result:
>> > cor_disc <- read_csv("../data/cor-disc.csv")
>> Rows: 415263 Columns: 8
>> ── Column specification ──────────────────────────────────────────────────
>> Delimiter: ","
>> chr (5): mon, day, hr, min, tz
>> dbl (2): site_nbr, year
>> 
>> ℹ Use `spec()` to retrieve the full column specification for this data.
>> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
>> 
>> 1. What happened to the values in column 'disc'?
>> 
>> 2. Why are site_nbr and year seen as doubles when they're character strings?
>> 
>> I've not found answers in the book or in ?read_csv.
>> 
>> What am I missing?
>> 
>> Rich
>
-- 
Sent from my phone. Please excuse my brevity.
