I do not have a specific answer to your particular problem. All I can say is when a CSV import doesn't work, it can mean there is something in the CSV file that is unexpected. When read_csv() fails, I will try read.csv() to compare the results.

Kevin

> On Nov 1, 2021, at 12:40 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
>
> The data file, cor-disc.csv begins with:
> site_nbr,year,mon,day,hr,min,tz,disc
> 14171600,2009,10,23,00,00,PDT,8750
>
> The first 7 columns are character strings; the 8th column is an integer.
>
> After loading library(tidyverse) I ran read_csv() with this result:
> > cor_disc <- read_csv("../data/cor-disc.csv")
> Rows: 415263 Columns: 8
> ── Column specification ────────────────────────────────────────
> Delimiter: ","
> chr (5): mon, day, hr, min, tz
> dbl (2): site_nbr, year
>
> ℹ Use `spec()` to retrieve the full column specification for this data.
> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
>
> 1. What happened to the values in column 'disc'?
>
> 2. Why are site_nbr and year seen as doubles when they're character strings?
>
> I've not found answers in the book or in ?read_csv.
>
> What am I missing?
>
> Rich
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
On Mon, 1 Nov 2021, Kevin Thorpe wrote:

> I do not have a specific answer to your particular problem. All I can say
> is when a CSV import doesn't work, it can mean there is something in the
> CSV file that is unexpected. When read_csv() fails, I will try read.csv()
> to compare the results.

Kevin,

That's a thought. I'll do that.

Thanks,

Rich
On Mon, 1 Nov 2021, Kevin Thorpe wrote:

> I do not have a specific answer to your particular problem. All I can say
> is when a CSV import doesn't work, it can mean there is something in the
> CSV file that is unexpected. When read_csv() fails, I will try read.csv()
> to compare the results.

Kevin,

Interesting that there's no error:

cor_disc <- read.csv("../data/cor-disc.csv", header = TRUE)
...
12496 14171600 2010   3  15 16  45 PDT 1060
12497 14171600 2010   3  15 17   0 PDT 1060
12498 14171600 2010   3  15 17  15 PDT 1050
12499 14171600 2010   3  15 17  45 PDT 1050
 [ reached 'max' / getOption("max.print") -- omitted 402856 rows ]

> head(cor_disc)
  site_nbr year mon day hr min  tz disc
1 14171600 2009  10  23  0   0 PDT 8750
2 14171600 2009  10  23  0  15 PDT 8750
3 14171600 2009  10  23  0  30 PDT 8750
4 14171600 2009  10  23  0  45 PDT 8750
5 14171600 2009  10  23  1   0 PDT 8750
6 14171600 2009  10  23  1  15 PDT 8750

> str(cor_disc)
'data.frame': 415355 obs. of  8 variables:
 $ site_nbr: chr  "14171600" "14171600" "14171600" "14171600" ...
 $ year    : int  2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
 $ mon     : int  10 10 10 10 10 10 10 10 10 10 ...
 $ day     : int  23 23 23 23 23 23 23 23 23 23 ...
 $ hr      : int  0 0 0 0 1 1 1 1 2 2 ...
 $ min     : int  0 15 30 45 0 15 30 45 0 15 ...
 $ tz      : chr  "PDT" "PDT" "PDT" "PDT" ...
 $ disc    : int  8750 8750 8750 8750 8750 8750 8750 8730 8730 8730 ...

So, where might I look to see why tidyverse's read_csv() doesn't produce the same results?

Regards,

Rich
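One place to look is the column specification itself. The sketch below is only an illustration, not a diagnosis of the actual file: it writes a hypothetical three-line sample (the header and first record quoted in this thread, plus one more made-up row) to a temporary file, reads it once with readr's type guessing and once with the types declared through the col_types argument, and then prints what readr settled on. The temporary file and the choice of "everything character except disc" are assumptions made for the example.

```r
library(readr)

# Hypothetical sample: header and first record as quoted in the thread,
# plus one invented row. The real cor-disc.csv is not reproduced here.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "site_nbr,year,mon,day,hr,min,tz,disc",
  "14171600,2009,10,23,00,00,PDT,8750",
  "14171600,2009,10,23,00,15,PDT,8750"
), tmp)

# Default behaviour: readr guesses each column's type from the values.
guessed <- read_csv(tmp, show_col_types = FALSE)

# Explicit types: treat every column as character except disc,
# which is declared as integer (an assumption for this example).
typed <- read_csv(
  tmp,
  col_types = cols(.default = col_character(), disc = col_integer())
)

spec(guessed)    # the specification readr guessed
spec(typed)      # the specification that was declared
problems(typed)  # any values that could not be parsed as declared
```

If a value in the real file cannot be parsed as the declared type, problems() should list its row and column, which gives a concrete place to start looking.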
More explicitly... look at rows past the first row. If your csv has 300 rows and column 1 has something non-numeric in row 299, then the whole column gets imported as character data. Try

cor_disc[[ 1 ]] |> as.numeric() |> is.na() |> which()

to find suspect rows. You may want to read about the na argument to read_csv in ?read_csv.

On November 1, 2021 9:50:23 AM PDT, Kevin Thorpe <kevin.thorpe at utoronto.ca> wrote:
>I do not have a specific answer to your particular problem. All I can say is when a CSV import doesn't work, it can mean there is something in the CSV file that is unexpected. When read_csv() fails, I will try read.csv() to compare the results.
>
>Kevin

--
Sent from my phone. Please excuse my brevity.
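The check suggested in this message can be sketched in base R with which(), which returns the positions of TRUE values. The vector below is a hypothetical stand-in for a column that was imported as character; the stray "site_nbr" entry (as if a header line were repeated inside the data) is invented for illustration and is not known to be what the real file contains.

```r
# Hypothetical stand-in for cor_disc[[1]] after a character import.
site_nbr <- c("14171600", "14171600", "site_nbr", "14171600", NA)

# as.numeric() turns strings it cannot parse into NA (with a warning).
converted <- suppressWarnings(as.numeric(site_nbr))

# Keep only positions that became NA through conversion, so values
# that were already missing in the input are not flagged.
suspect <- which(is.na(converted) & !is.na(site_nbr))

suspect            # row positions worth inspecting
site_nbr[suspect]  # the offending values themselves
```

Excluding entries that were already NA is a design choice for this sketch: it separates genuinely missing values from text that merely fails to parse as a number.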