thr3ads.net - R help - [R] readxl issue [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Thomas Subia

2020-Feb-05 19:09 UTC

[R] readxl issue

Colleagues,

I'm using readxl and dplyr to extract a specific cell from all worksheets in
a directory.
All of these worksheets have the same physical layout.

Issue 1: Minus sign replaced by an X after data extraction.

library(plyr)
library(readxl)

files <- list.files(pattern="*.xls", full.names = FALSE)
avgs <- lapply(files, read_excel, sheet="Flow Data",
range=("c9"))
avg_list <- as.data.frame(avgs)
trans_avgs <- t(avg_list)
write.table(trans_avgs,"avgs.txt")

Here are the first lines of the avgs.txt file.
"X.0.51571428571428557" 
"X.0.5349795918367346" 
"X.0.4895714285714286" 
"X.0.5112448979591836"

The original Excel file contains
-0.516
-0.535
-0.490
-0.511

It appears that readxl is changing the - sign to an X.
Is there any feature in readxl which I can change so that readxl extracts the
minus sign?

Issue 2: Duplicate dates contain additional characters

dates <- lapply(files, read_excel, sheet="Flow Data",
range=("h14"))
dates_list <- as.data.frame(dates)
trans_dates <- t(dates_list)
write.table(trans_dates ,"dates.txt")

Here are the first lines of the dates.txt file.

"X43859" 
"X43859.1" 
"X43859.2" 
"X43859.3" 
"X43833"

In Excel, this is what is recorded.

1/29/2020
1/29/2020
1/29/2020
1/29/2020
1/3/2020

It appears that readxl is adding additional characters which are signaling
duplicate dates.
Is there any feature in readxl which can I can change to eliminate these
additional characters?

Some advice would be appreciated.


Thomas Subia 

Statistician / Senior Quality Engineer
IMG Companies?
225 Mountain Vista Parkway
Livermore, CA 94551
T.?(925) 273-1106
F.?(925) 273-1111
E. tsubia at imgprecision.com


Precision Manufacturing for Emerging Technologies
imgprecision.com?

The contents of this message, together with any attachments, are intended only
for the use of the individual or entity to which they are addressed and may
contain information that is legally privileged, confidential and exempt from
disclosure. If you are not the intended recipient, you are hereby notified that
any dissemination, distribution, or copying of this message, or any attachment,
is strictly prohibited. If you have received this message in error, please
notify the original sender or IMG Companies, LLC at Tel: 925-273-1100
immediately by telephone or by return E-mail and delete this message, along with
any attachments, from your computer. Thank you.

Jeff Newmiller

2020-Feb-05 19:42 UTC

head link

[R] readxl issue

Pay attention to whether the read_csv call is configured to expect first line as
header.

On February 5, 2020 11:09:01 AM PST, Thomas Subia <tsubia at
imgprecision.com> wrote:>Colleagues,
>
>I'm using readxl and dplyr to extract a specific cell from all
>worksheets in a directory.
>All of these worksheets have the same physical layout.
>
>Issue 1: Minus sign replaced by an X after data extraction.
>
>library(plyr)
>library(readxl)
>
>files <- list.files(pattern="*.xls", full.names = FALSE)
>avgs <- lapply(files, read_excel, sheet="Flow Data",
range=("c9"))
>avg_list <- as.data.frame(avgs)
>trans_avgs <- t(avg_list)
>write.table(trans_avgs,"avgs.txt")
>
>Here are the first lines of the avgs.txt file.
>"X.0.51571428571428557" 
>"X.0.5349795918367346" 
>"X.0.4895714285714286" 
>"X.0.5112448979591836"
>
>The original Excel file contains
>-0.516
>-0.535
>-0.490
>-0.511
>
>It appears that readxl is changing the - sign to an X.
>Is there any feature in readxl which I can change so that readxl
>extracts the minus sign?
>
>Issue 2: Duplicate dates contain additional characters
>
>dates <- lapply(files, read_excel, sheet="Flow Data",
range=("h14"))
>dates_list <- as.data.frame(dates)
>trans_dates <- t(dates_list)
>write.table(trans_dates ,"dates.txt")
>
>Here are the first lines of the dates.txt file.
>
>"X43859" 
>"X43859.1" 
>"X43859.2" 
>"X43859.3" 
>"X43833"
>
>In Excel, this is what is recorded.
>
>1/29/2020
>1/29/2020
>1/29/2020
>1/29/2020
>1/3/2020
>
>It appears that readxl is adding additional characters which are
>signaling duplicate dates.
>Is there any feature in readxl which can I can change to eliminate
>these additional characters?
>
>Some advice would be appreciated.
>
>
>Thomas Subia 
>
>Statistician / Senior Quality Engineer
>IMG Companies?
>225 Mountain Vista Parkway
>Livermore, CA 94551
>T.?(925) 273-1106
>F.?(925) 273-1111
>E. tsubia at imgprecision.com
>
>
>Precision Manufacturing for Emerging Technologies
>imgprecision.com?
>
>The contents of this message, together with any attachments, are
>intended only for the use of the individual or entity to which they are
>addressed and may contain information that is legally privileged,
>confidential and exempt from disclosure. If you are not the intended
>recipient, you are hereby notified that any dissemination,
>distribution, or copying of this message, or any attachment, is
>strictly prohibited. If you have received this message in error, please
>notify the original sender or IMG Companies, LLC at Tel: 925-273-1100
>immediately by telephone or by return E-mail and delete this message,
>along with any attachments, from your computer. Thank you.
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
-- 
Sent from my phone. Please excuse my brevity.

Thomas Subia

2020-Feb-05 22:01 UTC

head link

[R] readxl issue

Jeff,

You wrote: " Pay attention to whether the read_csv call is configured to
expect first line as header."

Here is the code I'm using to extract one cell from a series of Excel files
having the same physical format.

library(plyr)
library(readxl)
files <- list.files(pattern="*.xls", full.names = TRUE)
# Extract part average from cell c6 for all Excel files
avgs <- lapply(files, read_excel, sheet="Flow Data",
range=("c9"))
# Write data to text file
write.table(avgs ,"avgs.txt",sep="\t")

I'm not sure where read_csv applies here.

Thanks for your help!

Thomas Subia

-----Original Message-----
From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us> 
Sent: Wednesday, February 05, 2020 11:42 AM
To: r-help at r-project.org; Thomas Subia <tsubia at imgprecision.com>;
r-help at r-project.org
Subject: Re: [R] readxl issue

Pay attention to whether the read_csv call is configured to expect first line as
header.

On February 5, 2020 11:09:01 AM PST, Thomas Subia <tsubia at
imgprecision.com> wrote:>Colleagues,
>
>I'm using readxl and dplyr to extract a specific cell from all 
>worksheets in a directory.
>All of these worksheets have the same physical layout.
>
>Issue 1: Minus sign replaced by an X after data extraction.
>
>library(plyr)
>library(readxl)
>
>files <- list.files(pattern="*.xls", full.names = FALSE) avgs
<-
>lapply(files, read_excel, sheet="Flow Data",
range=("c9")) avg_list <-
>as.data.frame(avgs) trans_avgs <- t(avg_list)
>write.table(trans_avgs,"avgs.txt")
>
>Here are the first lines of the avgs.txt file.
>"X.0.51571428571428557" 
>"X.0.5349795918367346" 
>"X.0.4895714285714286" 
>"X.0.5112448979591836"
>
>The original Excel file contains
>-0.516
>-0.535
>-0.490
>-0.511
>
>It appears that readxl is changing the - sign to an X.
>Is there any feature in readxl which I can change so that readxl 
>extracts the minus sign?
>
>Issue 2: Duplicate dates contain additional characters
>
>dates <- lapply(files, read_excel, sheet="Flow Data",
range=("h14"))
>dates_list <- as.data.frame(dates) trans_dates <- t(dates_list) 
>write.table(trans_dates ,"dates.txt")
>
>Here are the first lines of the dates.txt file.
>
>"X43859" 
>"X43859.1" 
>"X43859.2" 
>"X43859.3" 
>"X43833"
>
>In Excel, this is what is recorded.
>
>1/29/2020
>1/29/2020
>1/29/2020
>1/29/2020
>1/3/2020
>
>It appears that readxl is adding additional characters which are 
>signaling duplicate dates.
>Is there any feature in readxl which can I can change to eliminate 
>these additional characters?
>
>Some advice would be appreciated.
>
>
>Thomas Subia
>
>Statistician / Senior Quality Engineer
>IMG Companies
>225 Mountain Vista Parkway
>Livermore, CA 94551
>T.?(925) 273-1106
>F.?(925) 273-1111
>E. tsubia at imgprecision.com
>
>
>Precision Manufacturing for Emerging Technologies imgprecision.com
>
>The contents of this message, together with any attachments, are 
>intended only for the use of the individual or entity to which they are 
>addressed and may contain information that is legally privileged, 
>confidential and exempt from disclosure. If you are not the intended 
>recipient, you are hereby notified that any dissemination, 
>distribution, or copying of this message, or any attachment, is 
>strictly prohibited. If you have received this message in error, please 
>notify the original sender or IMG Companies, LLC at Tel: 925-273-1100 
>immediately by telephone or by return E-mail and delete this message, 
>along with any attachments, from your computer. Thank you.
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
--
Sent from my phone. Please excuse my brevity.

Abby Spurdle

2020-Feb-06 01:10 UTC

head link

[R] readxl issue

> I'm using readxl and dplyr to extract a specific cell from all
worksheets in a directory.
> All of these worksheets have the same physical layout.
I don't have access to Excel, so can't test this.

I have a suspicion that the problem is not readxl.
But rather the "as.data.frame" step.

Try reading one file at a time.
And inspect (1) your immediate object after reading the Excel file,
and then (2) the data.frame after the as.data.frame call.

R help - Feb 2020 - readxl issue

[R] readxl issue

[R] readxl issue

[R] readxl issue

[R] readxl issue