Earl F. Glynn
2005-Nov-02 18:45 UTC
[R] RODBC and Excel: Wrong Data Type Assumed on Import
The first column in my Excel sheet has mostly numbers but I need to treat it as character data:> library(RODBC) > channel <- odbcConnectExcel("U:/efg/lab/R/Plasmid/construct list.xls") > plasmid <- sqlFetch(channel,"Sheet1", as.is=TRUE) > odbcClose(channel)> names(plasmid)[1] "Plasmid Number" "Plasmid" "Concentration" "Comments" "Lost" # How is the type decided? I need a character type.> class(plasmid$"Plasmid Number")[1] "numeric"> typeof(plasmid$"Plasmid Number")[1] "double"> plasmid$"Plasmid Number"[273:276][1] 274 NA NA 276 The two NAs are supposed to be 275a and 275b. I tried the "as.is=TRUE" but that didn't help. I consulted Section 4, Relational databases, in the R Data Import/Export document (for Version 2.2.0). Section 4.2.2, Data types, was not helpful. In particular, this did not seem helpful: "The more comprehensive of the R interface packages hide the type conversion issues from the user." Section 4.3.2, Package RODBC, provided a "simple example of using ODBC .. with a(sic) Excel spreadsheet" but is silent on how to control the data type on import. Could the documentation be expanded to address this issue? I really need to show "Plasmid 275a" and "Plasmid 275b" instead of "Plasmid NA". Thanks for any help with this. efg -- Earl F. Glynn Scientific Programmer Bioinformatics Department Stowers Institute for Medical Research
Hi As I now exclusively use copy paste method to transfer data from Excel to R I tried it and I got correctly a factor column when there were some non numeric data in Excel. Ctrl-C in Excel mydf<-read.delim("clipboard") in R Are you sure that a respective column in Excel has values 275a and 275b in it? If yes I had tried to define colClasses vector for your columns. HTH Petr On 2 Nov 2005 at 12:45, Earl F. Glynn wrote: To: r-help at stat.math.ethz.ch From: "Earl F. Glynn" <efg at stowers-institute.org> Date sent: Wed, 2 Nov 2005 12:45:53 -0600 Subject: [R] RODBC and Excel: Wrong Data Type Assumed on Import> The first column in my Excel sheet has mostly numbers but I need to > treat it as character data: > > > library(RODBC) > > channel <- odbcConnectExcel("U:/efg/lab/R/Plasmid/construct > > list.xls") plasmid <- sqlFetch(channel,"Sheet1", as.is=TRUE) > > odbcClose(channel) > > > names(plasmid) > [1] "Plasmid Number" "Plasmid" "Concentration" "Comments" > "Lost" > > # How is the type decided? I need a character type. > > class(plasmid$"Plasmid Number") > [1] "numeric" > > typeof(plasmid$"Plasmid Number") > [1] "double" > > > plasmid$"Plasmid Number"[273:276] > [1] 274 NA NA 276 > > The two NAs are supposed to be 275a and 275b. I tried the > "as.is=TRUE" but that didn't help. > > I consulted Section 4, Relational databases, in the R Data > Import/Export document (for Version 2.2.0). > > Section 4.2.2, Data types, was not helpful. In particular, this did > not seem helpful: "The more comprehensive of the R interface packages > hide the type conversion issues from the user." > > Section 4.3.2, Package RODBC, provided a "simple example of using ODBC > .. with a(sic) Excel spreadsheet" but is silent on how to control the > data type on import. Could the documentation be expanded to address > this issue? > > I really need to show "Plasmid 275a" and "Plasmid 275b" instead of > "Plasmid NA". > > Thanks for any help with this. > > efg > -- > Earl F. Glynn > Scientific Programmer > Bioinformatics Department > Stowers Institute for Medical Research > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.htmlPetr Pikal petr.pikal at precheza.cz
From my experience (somewhat of a guess): 1. Excel uses the first 16 rows of data to determine if a column is numeric or character. The data type which is most common in the first 16 rows will then be used for the whole column. If you sort the data so that at least the first 9 rows have character data, you may find this allows the data to be interpreted as character. There is supposedly a registy setting that can control how many lines to use (instead of 16), but I have not had success with the setting. I suspect that ODBC uses JET4, which may be the real source of the problem. See more here: http://www.dicks-blog.com/archives/2004/06/03/external-data-mixed-data-types/ 2. The gregmisc bundle has a different read.xls function that uses a Perl script (xls2csv) and seems to be safer with mixed-type columns. Requires a working version of Perl. Best, Kevin Wright The first column in my Excel sheet has mostly numbers but I need to treat it as character data:> library(RODBC)<http://tolstoy.newcastle.edu.au/R/help/05/09/11324.html#14938qlink1> *> channel <- odbcConnectExcel("U:/efg/lab/R/Plasmid/construct list.xls") * *> plasmid <- sqlFetch(channel,"Sheet1", as.is=TRUE) * *> odbcClose(channel) *> names(plasmid)[1] "Plasmid Number" "Plasmid" "Concentration" "Comments" "Lost" # How is the type decided? I need a character type.> class(plasmid$"Plasmid Number")[1] "numeric"> typeof(plasmid$"Plasmid Number")[1] "double"> plasmid$"Plasmid Number"[273:276][1] 274 NA NA 276 The two NAs are supposed to be 275a and 275b. I tried the "as.is=TRUE" but that didn't help. I consulted Section 4, Relational databases, in the R Data Import/Export document (for Version 2.2.0). Section 4.2.2, Data types, was not helpful. In particular, this did not seem helpful: "The more comprehensive of the R interface packages hide the type conversion issues from the user." Section 4.3.2, Package RODBC, provided a "simple example of using ODBC .. with a(sic) Excel spreadsheet" but is silent on how to control the data type on import. Could the documentation be expanded to address this issue? I really need to show "Plasmid 275a" and "Plasmid 275b" instead of "Plasmid NA". Thanks for any help with this. efg -- Earl F. Glynn Scientific Programmer Bioinformatics Department [[alternative HTML version deleted]]
Seemingly Similar Threads
- Read Windows-like .INI files into R data structure?
- Don't understand write.csv default: Why is column name for column of row names blank?
- french secondary boxplot
- Suggestion about "R equivalent of Splus peaks() function"
- Anything like dir.choose (similar to file.choose) in R?