Joris Meys
2014-Nov-03 15:07 UTC
[Rd] Inexplicable difference between 2 R installations regarding reading numbers
Dear all,

A colleague of mine reported a problem that I fail to understand completely. He has a number of .csv files that all look very straightforward, and they all read in perfectly well using read.csv() on both his computer and mine.

However, when we use the exact same R version on the university server, all numeric variables suddenly turn into factors. The problem is resolved by deleting the last digits of every number in the .csv file. Using as.numeric() on the values works as well.

Does anybody have a clue as to what might cause this problem? If needed, I can send an example .csv file.

Example output on the server:

> X <- read.csv("Originelen/Originelen/heavymetals.csv")
> levels(X[[2]])
 [1] "11.140969600635804" "11.548972671055257" "11.98554898321271"
 [4] "16.317868213178677" "17.179218967921898" "18.596573461949852"
 [7] "18.786014405762298" "18.87978032658098" "23.604106448719225"
[10] "26.75482955698816" "27.33829851044687" "29.26619704952923"
[13] "33.07842352705811" "39.296270581233884" "4.8696848424212105"
[16] "5.5751725517655295" "6.0256909109049195" "9.117975845892804"
[19] "9.26944194868723"
> str(X)
'data.frame': 19 obs. of 18 variables:
 $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Cd5 : Factor w/ 19 levels "11.140969600635804",..: 3 8 6 12 11 10 2 5 14 13 ...
 $ Cd20 : Factor w/ 19 levels "10.160499999999999",..: 2 8 10 12 5 6 18 9 11 4 ...
 $ Cr5 : Factor w/ 19 levels "118.43421710855425",..: 6 11 10 17 16 15 7 13 19 18 ...
 $ Cr20 : Factor w/ 19 levels "100.48101898101898",..: 9 15 14 17 13 11 6 16 18 12 ...
 $ Cu5 : Factor w/ 19 levels "101.8005401620486",..: 8 17 16 15 14 12 9 18 19 1 ...
 $ Cu20 : Factor w/ 19 levels "103.67346938775509",..: 11 18 19 2 16 17 14 3 4 1 ...
 $ Fe5 : Factor w/ 19 levels "17239.349496158833",..: 3 8 10 9 12 14 7 16 19 18 ...
 $ Fe20 : Factor w/ 19 levels "17701.77893264042",..: 3 14 16 18 10 15 6 17 19 13 ...
 $ Mn5 : Factor w/ 19 levels "440.37211163349",..: 10 14 4 5 3 17 2 7 18 6 ...
 $ Mn20 : Factor w/ 19 levels "375.19156134938805",..: 12 2 6 3 1 9 11 7 8 5 ...
 $ Ni5 : Factor w/ 19 levels "19.54255213010077",..: 4 12 8 10 11 16 6 14 19 18 ...
 $ Ni20 : Factor w/ 19 levels "21.295222866280234",..: 8 13 15 18 12 16 7 17 19 14 ...
 $ Pb5 : Factor w/ 19 levels "125.5616926977306",..: 1 11 14 9 13 8 5 12 15 16 ...
 $ Pb20 : Factor w/ 19 levels "106.96930306969303",..: 3 8 11 12 9 10 4 13 14 15 ...
 $ Zn5 : Factor w/ 19 levels "1024.909963985594",..: 17 4 7 5 8 3 18 6 9 10 ...
 $ Zn20 : Factor w/ 19 levels "1247.816195886593",..: 15 4 5 7 2 1 16 6 8 3 ...
 $ river: int 1 1 1 1 1 1 1 1 1 1 ...

Using as.numeric(levels(X[[2]])) works perfectly fine, though...

Session info (the same on the server and on my own computer):

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Dutch_Belgium.1252 LC_CTYPE=Dutch_Belgium.1252
[3] LC_MONETARY=Dutch_Belgium.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.1.0

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 (0)9 264 61 79
Joris.Meys at Ugent.be
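A side note on the workaround mentioned in the message: as.numeric(levels(X[[2]])) returns the distinct values in level order, which for strings is alphabetical rather than row order, and as.numeric(X[[2]]) returns the factor's integer codes rather than the measurements. The usual idiom (R FAQ 7.10) converts the levels once and then indexes them by the factor. A minimal sketch, using a hypothetical three-element factor f built from two of the level strings shown above:

```r
# Hypothetical factor mimicking the Cd5 column; the strings are
# taken from the levels() output in the message above.
f <- factor(c("11.140969600635804", "4.8696848424212105", "11.140969600635804"))

levels(f)              # sorted as strings: "11.14..." comes before "4.86..."
as.numeric(f)          # the integer codes, not the measurements
as.numeric(levels(f))  # the distinct values, in level order (not row order)

# Convert the levels to numbers, then index them by the factor's codes
vals <- as.numeric(levels(f))[f]
vals
```

Indexing a vector by a factor uses the factor's integer codes, so vals comes back in the original row order with one value per row.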
Simon Urbanek
2014-Nov-03 15:41 UTC
[Rd] Inexplicable difference between 2 R installations regarding reading numbers
R version.

NEWS for 3.1.0:

  type.convert() (and hence by default read.table()) returns a character
  vector or factor when representing a numeric input as a double would
  lose accuracy. Similarly for complex inputs.

NEWS for 3.1.1:

  type.convert(), read.table() and similar read.*() functions get a new
  numerals argument, specifying how numeric input is converted when its
  conversion to double precision loses accuracy. The default value,
  allow.loss, allows accuracy loss, as in R versions before 3.1.0.

On Nov 3, 2014, at 10:07 AM, Joris Meys <jorismeys at gmail.com> wrote:

> Dear all,
>
> A colleague of mine reported a problem that I fail to understand
> completely. [...]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
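To illustrate the two NEWS items: on R 3.1.1 or later the numerals argument can be passed through read.csv() to read.table(), while on 3.1.0 itself, which lacks that argument, declaring colClasses makes read.table() skip type.convert()'s guessing for those columns. A minimal sketch, assuming R 3.1.1 or later to run as a whole, with a hypothetical two-row inline CSV (csv) standing in for heavymetals.csv:

```r
# Hypothetical inline data standing in for heavymetals.csv (two rows only)
csv <- "ID,Cd5\n1,11.140969600635804\n2,4.8696848424212105\n"

# R >= 3.1.1: request the pre-3.1.0 behaviour explicitly
# ("allow.loss" is also the default there)
X <- read.csv(text = csv, numerals = "allow.loss")
str(X)

# Also works on R 3.1.0: columns with a declared class are read
# as that class instead of being guessed by type.convert()
Y <- read.csv(text = csv, colClasses = c("integer", "numeric"))
str(Y)

# Same setting on a plain character vector; as.is = TRUE keeps any
# non-numeric result as character rather than factor
z <- type.convert(c("11.140969600635804", "4.8696848424212105"),
                  as.is = TRUE, numerals = "allow.loss")
z
```

The other documented numerals values are "warn.loss" and "no.loss" (see ?type.convert); colClasses is the option that does not depend on the R version.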