Jeff, You wrote: " Pay attention to whether the read_csv call is configured to expect first line as header." Here is the code I'm using to extract one cell from a series of Excel files having the same physical format. library(plyr) library(readxl) files <- list.files(pattern="*.xls", full.names = TRUE) # Extract part average from cell c6 for all Excel files avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9")) # Write data to text file write.table(avgs ,"avgs.txt",sep="\t") I'm not sure where read_csv applies here. Thanks for your help! Thomas Subia -----Original Message----- From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us> Sent: Wednesday, February 05, 2020 11:42 AM To: r-help at r-project.org; Thomas Subia <tsubia at imgprecision.com>; r-help at r-project.org Subject: Re: [R] readxl issue Pay attention to whether the read_csv call is configured to expect first line as header. On February 5, 2020 11:09:01 AM PST, Thomas Subia <tsubia at imgprecision.com> wrote:>Colleagues, > >I'm using readxl and dplyr to extract a specific cell from all >worksheets in a directory. >All of these worksheets have the same physical layout. > >Issue 1: Minus sign replaced by an X after data extraction. > >library(plyr) >library(readxl) > >files <- list.files(pattern="*.xls", full.names = FALSE) avgs <- >lapply(files, read_excel, sheet="Flow Data", range=("c9")) avg_list <- >as.data.frame(avgs) trans_avgs <- t(avg_list) >write.table(trans_avgs,"avgs.txt") > >Here are the first lines of the avgs.txt file. >"X.0.51571428571428557" >"X.0.5349795918367346" >"X.0.4895714285714286" >"X.0.5112448979591836" > >The original Excel file contains >-0.516 >-0.535 >-0.490 >-0.511 > >It appears that readxl is changing the - sign to an X. >Is there any feature in readxl which I can change so that readxl >extracts the minus sign? > >Issue 2: Duplicate dates contain additional characters > >dates <- lapply(files, read_excel, sheet="Flow Data", range=("h14")) >dates_list <- as.data.frame(dates) trans_dates <- t(dates_list) >write.table(trans_dates ,"dates.txt") > >Here are the first lines of the dates.txt file. > >"X43859" >"X43859.1" >"X43859.2" >"X43859.3" >"X43833" > >In Excel, this is what is recorded. > >1/29/2020 >1/29/2020 >1/29/2020 >1/29/2020 >1/3/2020 > >It appears that readxl is adding additional characters which are >signaling duplicate dates. >Is there any feature in readxl which can I can change to eliminate >these additional characters? > >Some advice would be appreciated. > > >Thomas Subia > >Statistician / Senior Quality Engineer >IMG Companies >225 Mountain Vista Parkway >Livermore, CA 94551 >T.?(925) 273-1106 >F.?(925) 273-1111 >E. tsubia at imgprecision.com > > >Precision Manufacturing for Emerging Technologies imgprecision.com > >The contents of this message, together with any attachments, are >intended only for the use of the individual or entity to which they are >addressed and may contain information that is legally privileged, >confidential and exempt from disclosure. If you are not the intended >recipient, you are hereby notified that any dissemination, >distribution, or copying of this message, or any attachment, is >strictly prohibited. If you have received this message in error, please >notify the original sender or IMG Companies, LLC at Tel: 925-273-1100 >immediately by telephone or by return E-mail and delete this message, >along with any attachments, from your computer. Thank you. > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.-- Sent from my phone. Please excuse my brevity.
On 6/02/20 11:01 am, Thomas Subia wrote:> Jeff, > > You wrote: " Pay attention to whether the read_csv call is configured to expect first line as header." > > Here is the code I'm using to extract one cell from a series of Excel files having the same physical format. > > library(plyr) > library(readxl) > files <- list.files(pattern="*.xls", full.names = TRUE) > # Extract part average from cell c6 for all Excel files > avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9")) > # Write data to text file > write.table(avgs ,"avgs.txt",sep="\t") > > I'm not sure where read_csv applies here.Jeff probably did not read your email as carefully as he might have, and made the unwarranted assumption that you'd be doing something sensible like using *.csv data files rather than mucking about with the notoriously perilous arcana of Excel. I *don't* muck about with Excel myself, unless absolutely forced to, so I can't make any specific suggestions, but I *can* make a general suggestion. When things go wrong, break the procedure down into simple atomic steps. * initially deal with just *one* of your *.xls files * read it in: xxx <- read_excel(<whatever>) * *don't* write it out to a text file yet, *look* at the <expletive deleted> thing in R first; print it (just type "xxx" or "head(xxx)" if it's large) * perhaps convert it to a data frame (it will be a "tibble" and tibbles confuse the issue): yyy <- as.data.frame(xxx) * investigate whether supplying some more arguments to read_excel() solves your problem * perhaps try read.xlsx from the xlsx package and see if any insights are revealed. * if you're still stumped, make available to the list one of the problematic *.xls files (note that you *cannot* attach such a file to your email; it will get stripped; you'll need to provide a URL from which the file can be obtained). Given such a file someone on the list may be able to help you. cheers, Rolf Turner -- Honorary Research Fellow Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276
My assessment was and is likely correct. My error was in writing the wrong function name... should have been read_excel whose parameters need to be adjusted to not look for a header line. On February 5, 2020 3:48:36 PM PST, Rolf Turner <r.turner at auckland.ac.nz> wrote:> >On 6/02/20 11:01 am, Thomas Subia wrote: > >> Jeff, >> >> You wrote: " Pay attention to whether the read_csv call is configured >to expect first line as header." >> >> Here is the code I'm using to extract one cell from a series of Excel >files having the same physical format. >> >> library(plyr) >> library(readxl) >> files <- list.files(pattern="*.xls", full.names = TRUE) >> # Extract part average from cell c6 for all Excel files >> avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9")) >> # Write data to text file >> write.table(avgs ,"avgs.txt",sep="\t") >> >> I'm not sure where read_csv applies here. > >Jeff probably did not read your email as carefully as he might have, >and >made the unwarranted assumption that you'd be doing something sensible >like using *.csv data files rather than mucking about with the >notoriously perilous arcana of Excel. > >I *don't* muck about with Excel myself, unless absolutely forced to, so > >I can't make any specific suggestions, but I *can* make a general >suggestion. When things go wrong, break the procedure down into simple > >atomic steps. > >* initially deal with just *one* of your *.xls files >* read it in: xxx <- read_excel(<whatever>) >* *don't* write it out to a text file yet, *look* at the <expletive >deleted> thing in R first; print it (just type "xxx" or "head(xxx)" if >it's large) >* perhaps convert it to a data frame (it will be a "tibble" and tibbles > >confuse the issue): yyy <- as.data.frame(xxx) >* investigate whether supplying some more arguments to read_excel() >solves your problem >* perhaps try read.xlsx from the xlsx package and see if any insights >are revealed. >* if you're still stumped, make available to the list one of the >problematic *.xls files (note that you *cannot* attach such a file to >your email; it will get stripped; you'll need to provide a URL from >which the file can be obtained). Given such a file someone on the list > >may be able to help you. > >cheers, > >Rolf Turner-- Sent from my phone. Please excuse my brevity.