Mark Na
2009-May-12 21:59 UTC
[R] Specifying data type when creating a dataframe using RODBC
H R-helpers, I am using the following code to make a dataframe from an Excel spreadsheet: library(RODBC) channel <- odbcConnectExcel("Spreadsheet.xls") Data <- sqlFetch(channel, "Tab1") odbcClose(channel) One column (several, actually) in the spreadsheet contains integers in its first few rows but later values in these columns contain a mixture of numbers, letters and symbols (it's an ID variable, containing e.g., 12, 14, 19, 19B, 19C, 19/20) R creates this column as a numeric variable (I think because its first few variables are numbers) but as soon as R gets to the non-numeric values (e.g., 19/20) it replaces them with NA. So, my question is: how can I specify that certain columns are to be read as character variables BEFORE the dataframe is created? I have tried using as.character() in the third line (above) but it creates a very long first column containing all of my data... Thanks for any help you might provide, Mark Na
Prof Brian Ripley
2009-May-13 05:48 UTC
[R] Specifying data type when creating a dataframe using RODBC
On Tue, 12 May 2009, Mark Na wrote:> H R-helpers, > > I am using the following code to make a dataframe from an Excel spreadsheet: > > library(RODBC) > channel <- odbcConnectExcel("Spreadsheet.xls") > Data <- sqlFetch(channel, "Tab1") > odbcClose(channel) > > One column (several, actually) in the spreadsheet contains integers in > its first few rows but later values in these columns contain a mixture > of numbers, letters and symbols (it's an ID variable, containing e.g., > 12, 14, 19, 19B, 19C, 19/20) > > R creates this column as a numeric variable (I think because its first > few variables are numbers) but as soon as R gets to the non-numeric > values (e.g., 19/20) it replaces them with NA.R doesn't do that: nor does RODBC. You can try reading the column with 'as.is = TRUE' (see ?sqlFetch), but this looks like a well-known Excel/ODBC driver bug where Excel decides the column is numeric based on the first few entries and hence exports it as numbers (and NULLs, the SQL version of NAs).> So, my question is: how can I specify that certain columns are to be > read as character variables BEFORE the dataframe is created?You can try specifying the column type in Excel, so it is recorded on the spreadsheet. I am not sure that works in all cases.> I have tried using as.character() in the third line (above) but it > creates a very long first column containing all of my data... > > Thanks for any help you might provide, > > Mark Na > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
J Dougherty
2009-May-14 02:50 UTC
[R] Specifying data type when creating a dataframe using RODBC
On Tuesday 12 May 2009 02:59:30 pm Mark Na wrote:> H R-helpers, >...> > One column (several, actually) in the spreadsheet contains integers in > its first few rows but later values in these columns contain a mixture > of numbers, letters and symbols (it's an ID variable, containing e.g., > 12, 14, 19, 19B, 19C, 19/20) > > R creates this column as a numeric variable (I think because its first > few variables are numbers) but as soon as R gets to the non-numeric > values (e.g., 19/20) it replaces them with NA.This is much more likely to be an Excel issue, and if the data is important at all you would be better off keeping in Access. Excel has several automatic features that can result in damaged data. See the article by B.D.McCullough and David A. Heiser in Computational Statistics and Data Analysis 52(2008) 4570-4578.> > So, my question is: how can I specify that certain columns are to be > read as character variables BEFORE the dataframe is created? > > I have tried using as.character() in the third line (above) but it > creates a very long first column containing all of my data... > > Thanks for any help you might provide, > > Mark Na > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html and provide commented, minimal, > self-contained, reproducible code.