thr3ads.net - R help - [R] [FORGED] Re: readxl issue [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Thomas Subia

2020-Feb-05 22:01 UTC

[R] readxl issue

Jeff,

You wrote: " Pay attention to whether the read_csv call is configured to
expect first line as header."

Here is the code I'm using to extract one cell from a series of Excel files
having the same physical format.

library(plyr)
library(readxl)
files <- list.files(pattern="*.xls", full.names = TRUE)
# Extract part average from cell c6 for all Excel files
avgs <- lapply(files, read_excel, sheet="Flow Data",
range=("c9"))
# Write data to text file
write.table(avgs ,"avgs.txt",sep="\t")

I'm not sure where read_csv applies here.

Thanks for your help!

Thomas Subia

-----Original Message-----
From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us> 
Sent: Wednesday, February 05, 2020 11:42 AM
To: r-help at r-project.org; Thomas Subia <tsubia at imgprecision.com>;
r-help at r-project.org
Subject: Re: [R] readxl issue

Pay attention to whether the read_csv call is configured to expect first line as
header.

On February 5, 2020 11:09:01 AM PST, Thomas Subia <tsubia at
imgprecision.com> wrote:>Colleagues,
>
>I'm using readxl and dplyr to extract a specific cell from all 
>worksheets in a directory.
>All of these worksheets have the same physical layout.
>
>Issue 1: Minus sign replaced by an X after data extraction.
>
>library(plyr)
>library(readxl)
>
>files <- list.files(pattern="*.xls", full.names = FALSE) avgs
<-
>lapply(files, read_excel, sheet="Flow Data",
range=("c9")) avg_list <-
>as.data.frame(avgs) trans_avgs <- t(avg_list)
>write.table(trans_avgs,"avgs.txt")
>
>Here are the first lines of the avgs.txt file.
>"X.0.51571428571428557" 
>"X.0.5349795918367346" 
>"X.0.4895714285714286" 
>"X.0.5112448979591836"
>
>The original Excel file contains
>-0.516
>-0.535
>-0.490
>-0.511
>
>It appears that readxl is changing the - sign to an X.
>Is there any feature in readxl which I can change so that readxl 
>extracts the minus sign?
>
>Issue 2: Duplicate dates contain additional characters
>
>dates <- lapply(files, read_excel, sheet="Flow Data",
range=("h14"))
>dates_list <- as.data.frame(dates) trans_dates <- t(dates_list) 
>write.table(trans_dates ,"dates.txt")
>
>Here are the first lines of the dates.txt file.
>
>"X43859" 
>"X43859.1" 
>"X43859.2" 
>"X43859.3" 
>"X43833"
>
>In Excel, this is what is recorded.
>
>1/29/2020
>1/29/2020
>1/29/2020
>1/29/2020
>1/3/2020
>
>It appears that readxl is adding additional characters which are 
>signaling duplicate dates.
>Is there any feature in readxl which can I can change to eliminate 
>these additional characters?
>
>Some advice would be appreciated.
>
>
>Thomas Subia
>
>Statistician / Senior Quality Engineer
>IMG Companies
>225 Mountain Vista Parkway
>Livermore, CA 94551
>T.?(925) 273-1106
>F.?(925) 273-1111
>E. tsubia at imgprecision.com
>
>
>Precision Manufacturing for Emerging Technologies imgprecision.com
>
>The contents of this message, together with any attachments, are 
>intended only for the use of the individual or entity to which they are 
>addressed and may contain information that is legally privileged, 
>confidential and exempt from disclosure. If you are not the intended 
>recipient, you are hereby notified that any dissemination, 
>distribution, or copying of this message, or any attachment, is 
>strictly prohibited. If you have received this message in error, please 
>notify the original sender or IMG Companies, LLC at Tel: 925-273-1100 
>immediately by telephone or by return E-mail and delete this message, 
>along with any attachments, from your computer. Thank you.
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
--
Sent from my phone. Please excuse my brevity.

Rolf Turner

2020-Feb-05 23:48 UTC

head link

[R] [FORGED] Re: readxl issue

On 6/02/20 11:01 am, Thomas Subia wrote:
> Jeff,
> 
> You wrote: " Pay attention to whether the read_csv call is configured
to expect first line as header."
> 
> Here is the code I'm using to extract one cell from a series of Excel
files having the same physical format.
> 
> library(plyr)
> library(readxl)
> files <- list.files(pattern="*.xls", full.names = TRUE)
> # Extract part average from cell c6 for all Excel files
> avgs <- lapply(files, read_excel, sheet="Flow Data",
range=("c9"))
> # Write data to text file
> write.table(avgs ,"avgs.txt",sep="\t")
> 
> I'm not sure where read_csv applies here.
Jeff probably did not read your email as carefully as he might have, and 
made the unwarranted assumption that you'd be doing something sensible 
like using *.csv data files rather than mucking about with the 
notoriously perilous arcana of Excel.

I *don't* muck about with Excel myself, unless absolutely forced to, so 
I can't make any specific suggestions, but I *can* make a general 
suggestion.  When things go wrong, break the procedure down into simple 
atomic steps.

* initially deal with just *one* of your *.xls files
* read it in:  xxx <- read_excel(<whatever>)
* *don't* write it out to a text file yet, *look* at the <expletive 
deleted> thing in R first; print it (just type "xxx" or
"head(xxx)" if
it's large)
* perhaps convert it to a data frame (it will be a "tibble" and
tibbles
confuse the issue): yyy <- as.data.frame(xxx)
* investigate whether supplying some more arguments to read_excel() 
solves your problem
* perhaps try read.xlsx from the xlsx package and see if any insights 
are revealed.
* if you're still stumped, make available to the list one of the 
problematic *.xls files (note that you *cannot* attach such a file to 
your email; it will get stripped; you'll need to provide a URL from 
which the file can be obtained).  Given such a file someone on the list 
may be able to help you.

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

Jeff Newmiller

2020-Feb-06 01:04 UTC

head link

[R] [FORGED] Re: readxl issue

My assessment was and is likely correct.

My error was in writing the wrong function name... should have been read_excel
whose parameters need to be adjusted to not look for a header line.

On February 5, 2020 3:48:36 PM PST, Rolf Turner <r.turner at
auckland.ac.nz> wrote:>
>On 6/02/20 11:01 am, Thomas Subia wrote:
>
>> Jeff,
>> 
>> You wrote: " Pay attention to whether the read_csv call is
configured
>to expect first line as header."
>> 
>> Here is the code I'm using to extract one cell from a series of
Excel
>files having the same physical format.
>> 
>> library(plyr)
>> library(readxl)
>> files <- list.files(pattern="*.xls", full.names = TRUE)
>> # Extract part average from cell c6 for all Excel files
>> avgs <- lapply(files, read_excel, sheet="Flow Data",
range=("c9"))
>> # Write data to text file
>> write.table(avgs ,"avgs.txt",sep="\t")
>> 
>> I'm not sure where read_csv applies here.
>
>Jeff probably did not read your email as carefully as he might have,
>and 
>made the unwarranted assumption that you'd be doing something sensible 
>like using *.csv data files rather than mucking about with the 
>notoriously perilous arcana of Excel.
>
>I *don't* muck about with Excel myself, unless absolutely forced to, so
>
>I can't make any specific suggestions, but I *can* make a general 
>suggestion.  When things go wrong, break the procedure down into simple
>
>atomic steps.
>
>* initially deal with just *one* of your *.xls files
>* read it in:  xxx <- read_excel(<whatever>)
>* *don't* write it out to a text file yet, *look* at the <expletive 
>deleted> thing in R first; print it (just type "xxx" or
"head(xxx)" if
>it's large)
>* perhaps convert it to a data frame (it will be a "tibble" and
tibbles
>
>confuse the issue): yyy <- as.data.frame(xxx)
>* investigate whether supplying some more arguments to read_excel() 
>solves your problem
>* perhaps try read.xlsx from the xlsx package and see if any insights 
>are revealed.
>* if you're still stumped, make available to the list one of the 
>problematic *.xls files (note that you *cannot* attach such a file to 
>your email; it will get stripped; you'll need to provide a URL from 
>which the file can be obtained).  Given such a file someone on the list
>
>may be able to help you.
>
>cheers,
>
>Rolf Turner
-- 
Sent from my phone. Please excuse my brevity.

R help - Feb 2020 - [FORGED] Re: readxl issue

[R] readxl issue

[R] [FORGED] Re: readxl issue

[R] [FORGED] Re: readxl issue