Claudia Beleites
2012-Aug-14 17:00 UTC
[Rd] as.numeric and as.character with locale using comma as separator
Dear all,

summary:
My LC_NUMERIC is changed from C to de_DE by library (qtbase) [which shouldn't happen according to the warning when setting it back manually]. I posted an issue at their github repository, but maybe the behaviour is of more general interest.

However, if LC_NUMERIC is changed, as.character () uses the decimal separator that belongs to LC_NUMERIC (and not options ()$OutDec as I supposed). as.double () (= as.numeric ()) doesn't, though. That causes trouble with constructs like
as.numeric (as.character (x))

long version:
as.character seems to take into account my locale (de_DE) which uses comma as decimal separator:

> x <- rnorm (3)
> x
[1] -0,004238328 -0,919358537 -1,654543297
> as.character(x)
[1] "-0,00423832753479965" "-0,919358536523751" "-1,65454329680873"

whereas as.numeric () doesn't:

> as.numeric (as.character(x))
[1] NA NA NA
Warnmeldung:
NAs durch Umwandlung erzeugt
[English: Warning message: NAs introduced by coercion]
> as.numeric (gsub (",", ".", as.character(x)))
[1] -0,004238328 -0,919358537 -1,654543297

I did not see any mention of this in the help of either as.numeric or as.character.
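One way to make the round trip independent of the numeric locale might be to switch LC_NUMERIC to C only for the duration of the conversion and restore it afterwards. This is a minimal sketch under that assumption; with_c_numeric is a hypothetical helper name, not part of base R or qtbase, and the warning that Sys.setlocale emits for LC_NUMERIC is deliberately suppressed here.

```r
# Hypothetical helper (not part of base R): evaluate an expression with
# LC_NUMERIC temporarily set to "C", then restore the previous setting.
with_c_numeric <- function(expr) {
  old <- Sys.getlocale("LC_NUMERIC")
  # Restore the caller's numeric locale even if expr fails.
  on.exit(suppressWarnings(Sys.setlocale("LC_NUMERIC", old)), add = TRUE)
  # Sys.setlocale warns when LC_NUMERIC is changed; suppressed on purpose.
  suppressWarnings(Sys.setlocale("LC_NUMERIC", "C"))
  # expr is a lazily evaluated argument, so it is only evaluated here,
  # after the locale has been switched.
  force(expr)
}

x <- c(-0.5, 1.25, 3)
y <- with_c_numeric(as.numeric(as.character(x)))
print(identical(x, y))
```

The helper only works around the symptom for a single call; it does not address why the locale was changed in the first place.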
Note also the output of example (as.character):

> example (as.character)

as.chr> form <- y ~ a + b + c

as.chr> as.character(form) ## length 3
[1] "~" "y" "a + b + c"

as.chr> deparse(form) ## like the input
[1] "y ~ a + b + c"

as.chr> a0 <- 11/999 # has a repeating decimal representation

as.chr> (a1 <- as.character(a0))
[1] "0,011011011011011"

as.chr> format(a0, digits=16) # shows one more digit
[1] "0,01101101101101101"

as.chr> a2 <- as.numeric(a1)

as.chr> a2 - a0 # normally around -1e-17
[1] NA

as.chr> as.character(a2) # normally different from a1
[1] NA

as.chr> print(c(a0, a2), digits = 16)
[1] 0,01101101101101101 NA
Warnmeldung:
In eval(expr, envir, enclos) : NAs durch Umwandlung erzeugt

*session info*

> sessionInfo ()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] de_DE.UTF-8

attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] Hmisc_3.9-3 survival_2.36-14 plumbr_0.6.6 cranvas_0.8
[5] maps_2.2-6 scales_0.2.1 qtpaint_0.9.0 qtbase_1.0.5
[9] idendro_1.0

loaded via a namespace (and not attached):
[1] cluster_1.14.2 colorspace_1.1-1 dichromat_1.2-4
[4] grid_2.15.1 labeling_0.1 lattice_0.20-6
[7] munsell_0.3 objectProperties_0.6.5 objectSignals_0.10.2
[10] plyr_1.7.1 RColorBrewer_1.0-5 SearchTrees_0.5.1
[13] stringr_0.6 tools_2.15.1 tourr_0.5.2

Note that

> options ()$OutDec
[1] "."

In fresh R sessions I have

locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

It seems qtbase is the culprit:

> x
[1] -0.2290188 -0.1884703 0.2507179
> library (qtbase)
> x
[1] -0,2290188 -0,1884703 0,2507179

After setting the numeric locale back to C:

> Sys.setlocale ("LC_NUMERIC", "C")
[1] "C"
Warnmeldung:
In Sys.setlocale("LC_NUMERIC", "C") :
  das Setzen von 'LC_NUMERIC' kann bewirken, dass R sich komisch benimmt
[English: Warning message: setting 'LC_NUMERIC' may cause R to function strangely]

as.numeric (as.character (x)) works as expected (and the output has decimal points again).

Best,

Claudia

-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399