Colleagues, I'm using readxl and dplyr to extract a specific cell from all worksheets in a directory. All of these worksheets have the same physical layout. Issue 1: Minus sign replaced by an X after data extraction. library(plyr) library(readxl) files <- list.files(pattern="*.xls", full.names = FALSE) avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9")) avg_list <- as.data.frame(avgs) trans_avgs <- t(avg_list) write.table(trans_avgs,"avgs.txt") Here are the first lines of the avgs.txt file. "X.0.51571428571428557" "X.0.5349795918367346" "X.0.4895714285714286" "X.0.5112448979591836" The original Excel file contains -0.516 -0.535 -0.490 -0.511 It appears that readxl is changing the - sign to an X. Is there any feature in readxl which I can change so that readxl extracts the minus sign? Issue 2: Duplicate dates contain additional characters dates <- lapply(files, read_excel, sheet="Flow Data", range=("h14")) dates_list <- as.data.frame(dates) trans_dates <- t(dates_list) write.table(trans_dates ,"dates.txt") Here are the first lines of the dates.txt file. "X43859" "X43859.1" "X43859.2" "X43859.3" "X43833" In Excel, this is what is recorded. 1/29/2020 1/29/2020 1/29/2020 1/29/2020 1/3/2020 It appears that readxl is adding additional characters which are signaling duplicate dates. Is there any feature in readxl which can I can change to eliminate these additional characters? Some advice would be appreciated. Thomas Subia Statistician / Senior Quality Engineer IMG Companies? 225 Mountain Vista Parkway Livermore, CA 94551 T.?(925) 273-1106 F.?(925) 273-1111 E. tsubia at imgprecision.com Precision Manufacturing for Emerging Technologies imgprecision.com? The contents of this message, together with any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain information that is legally privileged, confidential and exempt from disclosure. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this message, or any attachment, is strictly prohibited. If you have received this message in error, please notify the original sender or IMG Companies, LLC at Tel: 925-273-1100 immediately by telephone or by return E-mail and delete this message, along with any attachments, from your computer. Thank you.
Pay attention to whether the read_csv call is configured to expect first line as header. On February 5, 2020 11:09:01 AM PST, Thomas Subia <tsubia at imgprecision.com> wrote:>Colleagues, > >I'm using readxl and dplyr to extract a specific cell from all >worksheets in a directory. >All of these worksheets have the same physical layout. > >Issue 1: Minus sign replaced by an X after data extraction. > >library(plyr) >library(readxl) > >files <- list.files(pattern="*.xls", full.names = FALSE) >avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9")) >avg_list <- as.data.frame(avgs) >trans_avgs <- t(avg_list) >write.table(trans_avgs,"avgs.txt") > >Here are the first lines of the avgs.txt file. >"X.0.51571428571428557" >"X.0.5349795918367346" >"X.0.4895714285714286" >"X.0.5112448979591836" > >The original Excel file contains >-0.516 >-0.535 >-0.490 >-0.511 > >It appears that readxl is changing the - sign to an X. >Is there any feature in readxl which I can change so that readxl >extracts the minus sign? > >Issue 2: Duplicate dates contain additional characters > >dates <- lapply(files, read_excel, sheet="Flow Data", range=("h14")) >dates_list <- as.data.frame(dates) >trans_dates <- t(dates_list) >write.table(trans_dates ,"dates.txt") > >Here are the first lines of the dates.txt file. > >"X43859" >"X43859.1" >"X43859.2" >"X43859.3" >"X43833" > >In Excel, this is what is recorded. > >1/29/2020 >1/29/2020 >1/29/2020 >1/29/2020 >1/3/2020 > >It appears that readxl is adding additional characters which are >signaling duplicate dates. >Is there any feature in readxl which can I can change to eliminate >these additional characters? > >Some advice would be appreciated. > > >Thomas Subia > >Statistician / Senior Quality Engineer >IMG Companies? >225 Mountain Vista Parkway >Livermore, CA 94551 >T.?(925) 273-1106 >F.?(925) 273-1111 >E. tsubia at imgprecision.com > > >Precision Manufacturing for Emerging Technologies >imgprecision.com? > >The contents of this message, together with any attachments, are >intended only for the use of the individual or entity to which they are >addressed and may contain information that is legally privileged, >confidential and exempt from disclosure. If you are not the intended >recipient, you are hereby notified that any dissemination, >distribution, or copying of this message, or any attachment, is >strictly prohibited. If you have received this message in error, please >notify the original sender or IMG Companies, LLC at Tel: 925-273-1100 >immediately by telephone or by return E-mail and delete this message, >along with any attachments, from your computer. Thank you. > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.-- Sent from my phone. Please excuse my brevity.
Jeff, You wrote: " Pay attention to whether the read_csv call is configured to expect first line as header." Here is the code I'm using to extract one cell from a series of Excel files having the same physical format. library(plyr) library(readxl) files <- list.files(pattern="*.xls", full.names = TRUE) # Extract part average from cell c6 for all Excel files avgs <- lapply(files, read_excel, sheet="Flow Data", range=("c9")) # Write data to text file write.table(avgs ,"avgs.txt",sep="\t") I'm not sure where read_csv applies here. Thanks for your help! Thomas Subia -----Original Message----- From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us> Sent: Wednesday, February 05, 2020 11:42 AM To: r-help at r-project.org; Thomas Subia <tsubia at imgprecision.com>; r-help at r-project.org Subject: Re: [R] readxl issue Pay attention to whether the read_csv call is configured to expect first line as header. On February 5, 2020 11:09:01 AM PST, Thomas Subia <tsubia at imgprecision.com> wrote:>Colleagues, > >I'm using readxl and dplyr to extract a specific cell from all >worksheets in a directory. >All of these worksheets have the same physical layout. > >Issue 1: Minus sign replaced by an X after data extraction. > >library(plyr) >library(readxl) > >files <- list.files(pattern="*.xls", full.names = FALSE) avgs <- >lapply(files, read_excel, sheet="Flow Data", range=("c9")) avg_list <- >as.data.frame(avgs) trans_avgs <- t(avg_list) >write.table(trans_avgs,"avgs.txt") > >Here are the first lines of the avgs.txt file. >"X.0.51571428571428557" >"X.0.5349795918367346" >"X.0.4895714285714286" >"X.0.5112448979591836" > >The original Excel file contains >-0.516 >-0.535 >-0.490 >-0.511 > >It appears that readxl is changing the - sign to an X. >Is there any feature in readxl which I can change so that readxl >extracts the minus sign? > >Issue 2: Duplicate dates contain additional characters > >dates <- lapply(files, read_excel, sheet="Flow Data", range=("h14")) >dates_list <- as.data.frame(dates) trans_dates <- t(dates_list) >write.table(trans_dates ,"dates.txt") > >Here are the first lines of the dates.txt file. > >"X43859" >"X43859.1" >"X43859.2" >"X43859.3" >"X43833" > >In Excel, this is what is recorded. > >1/29/2020 >1/29/2020 >1/29/2020 >1/29/2020 >1/3/2020 > >It appears that readxl is adding additional characters which are >signaling duplicate dates. >Is there any feature in readxl which can I can change to eliminate >these additional characters? > >Some advice would be appreciated. > > >Thomas Subia > >Statistician / Senior Quality Engineer >IMG Companies >225 Mountain Vista Parkway >Livermore, CA 94551 >T.?(925) 273-1106 >F.?(925) 273-1111 >E. tsubia at imgprecision.com > > >Precision Manufacturing for Emerging Technologies imgprecision.com > >The contents of this message, together with any attachments, are >intended only for the use of the individual or entity to which they are >addressed and may contain information that is legally privileged, >confidential and exempt from disclosure. If you are not the intended >recipient, you are hereby notified that any dissemination, >distribution, or copying of this message, or any attachment, is >strictly prohibited. If you have received this message in error, please >notify the original sender or IMG Companies, LLC at Tel: 925-273-1100 >immediately by telephone or by return E-mail and delete this message, >along with any attachments, from your computer. Thank you. > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.-- Sent from my phone. Please excuse my brevity.
> I'm using readxl and dplyr to extract a specific cell from all worksheets in a directory. > All of these worksheets have the same physical layout.I don't have access to Excel, so can't test this. I have a suspicion that the problem is not readxl. But rather the "as.data.frame" step. Try reading one file at a time. And inspect (1) your immediate object after reading the Excel file, and then (2) the data.frame after the as.data.frame call.