I have a large Sweave report that reads data from a database file. Some of the columns are 1-character strings containing only +, - or NA. An example of such a table is shown below, and can be downloaded for easier testing from

http://www.menne-biomed.de/uni/test.zip

(For security reasons, the file is zipped.)

table test

hp  hp1
+   a
-   +

library(RODBC)
channel = odbcConnectAccess("test.mdb")
ret = sqlQuery(channel,"select * from test")
odbcClose(channel)
str(ret)
# 'data.frame':   2 obs. of  2 variables:
#  $ hp : num  0 0
#  $ hp1: Factor w/ 2 levels "+","a": 2 1

Note that the column hp, which contains only "+" and "-", is read as numeric 0, but when there is any other character, as in hp1, the conversion to factor occurs.

In R 2.6.2 (or was it an earlier version of RODBC?), column hp was treated as a factor.

Is this a new feature I have to live with, or an ... ahem ... issue? I know that I can get around this with as.is, but it needs a lot of explicit programming for the columns I don't want to be as.issed.

Disclaimer:
-- Yes, I know I should have reported this earlier, but the problem of having to re-create the report came up today.
-- Yes, I should have reported this on the windows/devel r-help or directly to the author (of RODBC; or base?), so I feel guilty in advance that this is the wrong list.
-- Yes, I have read the NEWS, and could not find anything related.
-- Yes, I cannot rule out that this is a user error.

Dieter

---------------------------

R version 2.7.0 (2008-04-22)
i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RODBC_1.2-3
Prof Brian Ripley
2008-Apr-30 10:49 UTC
[R] + and - in RODBC : no longer considered factors
It is nothing to do with RODBC, which follows read.table here:

% cat > foo.txt
x
+
-
...
> read.table("foo.txt", header=TRUE)
  x
1 0
2 0

and that uses

> type.convert(c("+", "-"))
[1] 0 0
> type.convert(c("+", "a"))
[1] + a
Levels: + a

Whereas 2.6.2 did

> type.convert(c("+", "-"))
[1] + -
Levels: + -

The difference is related to a change to deciding in R (and not the OS) what a 'numeric field' is:

    o   Parsing and scanning of numerical constants is now done by
        R's own C code. This ensures cross-platform consistency, and
        mitigates the effects of setting LC_NUMERIC (within base R it
        only applies to output -- packages may differ).

        The format accepted is more general than before and includes
        binary exponents in hexadecimal constants: see
        ?NumericConstants for details.

There's a comment in the sources that numeric fields with no digits should perhaps be regarded as non-numeric, so this can easily be changed.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
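
Until the conversion rule is changed, one possible workaround is to read everything as character and convert afterwards. This is only a sketch under stated assumptions: convert_columns is a hypothetical helper (not part of RODBC or base R), and the small data frame stands in for what sqlQuery(channel, "select * from test", as.is = TRUE) might return for the test table. It avoids listing the columns by hand, which was the objection to as.is above.

```r
# Hypothetical helper: convert the character columns of a data frame that was
# read with as.is = TRUE, without relying on how type.convert() treats
# fields that consist of a sign only.
convert_columns <- function(df) {
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)        # leave non-character columns alone
    values <- col[!is.na(col)]
    if (length(values) > 0 && all(values %in% c("+", "-"))) {
      # Sign-only column: force a factor explicitly.
      factor(col, levels = c("+", "-"))
    } else {
      # Everything else: the usual automatic conversion.
      type.convert(col, as.is = FALSE)
    }
  })
  df
}

# Stand-in for the database table (assumption: all columns arrive as character).
raw <- data.frame(hp  = c("+", "-"),
                  hp1 = c("a", "+"),
                  n   = c("1", "2"),   # extra column, added for illustration
                  stringsAsFactors = FALSE)

ret <- convert_columns(raw)
str(ret)
```

The sign-only test is made on the non-missing values, so a column containing +, - and NA is still turned into a factor, while a column that is entirely NA falls through to type.convert.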