overeem at knmi.nl
2006-Jun-13 14:01 UTC
[Rd] undesirable rounding off due to 'read.table' (PR#8974)
Full_Name: Aart Overeem
Version: 2.2.0
OS: Linux
Submission from: (NULL) (145.23.254.155)

Construct a dataframe consisting of several variables by using 'data.frame' and
'cbind' and write it to a file with 'write.table'. The file consists of headers
and values, such as 12.4283675334551 (so 13 numbers behind the decimal point).
If this dataframe is read with 'read.table(filename, skip = 1)' or
'read.table(filename, header = TRUE)' the values only have 7 numbers behind the
decimal point, e.g. 12.42837. So, the reading rounds off the values. This is not
mentioned in the manual. Although the values still have many numbers behind the
decimal point, rounding off is, in my view, never desirable.
Hin-Tak Leung
2006-Jun-13 14:22 UTC
[Rd] undesirable rounding off due to 'read.table' (PR#8974)
overeem at knmi.nl wrote:
> Full_Name: Aart Overeem
> Version: 2.2.0
> OS: Linux
> Submission from: (NULL) (145.23.254.155)
>
>
> Construct a dataframe consisting of several variables by using 'data.frame' and
> 'cbind' and write it to a file with 'write.table'. The file consists of headers
> and values, such as 12.4283675334551 (so 13 numbers behind the decimal point).
> If this dataframe is read with 'read.table(filename, skip = 1)' or
> 'read.table(filename, header = TRUE)' the values only have 7 numbers behind the
> decimal point, e.g. 12.42837. So, the reading rounds off the values. This is not
> mentioned in the manual. Although the values still have many numbers behind the
> decimal point, rounding off is, in my view, never desirable.

Hmm, this is probably due to conversion by the scanf family of functions
(I don't know the precise location or mechanism of R doing it; this is a guess).
It is mentioned in my manpage of sscanf:

    f    Matches an optionally signed floating-point number; the next
         pointer must be a pointer to float.
    e    Equivalent to f.
    g    Equivalent to f.
    E    Equivalent to f.
    a    (C99) Equivalent to f.

So printf/fprintf/sprintf and scanf/sscanf/fscanf are not symmetrical, and you
lose precision from 15 (double) to 7 (float). It is a generic problem with
ANSI C's printf/scanf, not specific to R.

Why don't you use save() or save.image() instead for saving and reloading a
data.frame? It is *much faster*, you get a much smaller file, and it is also
more accurate.

Just my two cents.

HTL
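The save()/load() route suggested above can be sketched as follows. This is a minimal sketch only: the data frame, the temporary file and the separate environment are illustrative choices that do not come from the original report, and it assumes the default binary save format.

```r
# Hypothetical example data; not taken from the original report.
dat <- data.frame(a = c(12.4283675334551, pi, 1 / 3))

# Save to a temporary file in R's default binary format.
f <- tempfile(fileext = ".RData")
save(dat, file = f)

# Load into a separate environment so the original object is not overwritten.
e <- new.env()
load(f, envir = e)
unlink(f)

# Compare the stored doubles before and after the round trip.
same <- identical(dat$a, e$dat$a)
print(same)
```

Because the values are stored in binary rather than converted to decimal text and back, no formatting step is involved in this round trip.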
Gavin Simpson
2006-Jun-13 14:38 UTC
[Rd] undesirable rounding off due to 'read.table' (PR#8974)
On Tue, 2006-06-13 at 16:01 +0200, overeem at knmi.nl wrote:
> Full_Name: Aart Overeem
> Version: 2.2.0

You are asked not to report bugs on out-dated versions of R...

> OS: Linux
> Submission from: (NULL) (145.23.254.155)
>
>
> Construct a dataframe consisting of several variables by using 'data.frame' and
> 'cbind' and write it to a file with 'write.table'. The file consists of headers
> and values, such as 12.4283675334551 (so 13 numbers behind the decimal point).
> If this dataframe is read with 'read.table(filename, skip = 1)' or
> 'read.table(filename, header = TRUE)' the values only have 7 numbers behind the
> decimal point, e.g. 12.42837. So, the reading rounds off the values. This is not
> mentioned in the manual. Although the values still have many numbers behind the
> decimal point, rounding off is, in my view, never desirable.

Works for me in R 2.3.1 (patched).

Are you mistaking the printed representation of your data.frame for the
real thing? E.g.:

# dummy data
dat <- as.data.frame(matrix(rnorm(100) + 0.000000000000012, ncol = 10))
# not that reading/writing has anything to do with this, but just to
# prove it
write.table(dat, file = "~/tmp/temp.csv", sep = ",")
dat <- read.table("~/tmp/temp.csv", sep = ",", header = TRUE)
dat
options(digits = 14)
dat

or

print(dat, digits = 14)

G

Ps. Wasn't sure about the etiquette of replying to R-bugs in recipients,
so deleted it in case this caused further work for the maintainer(s) of
the bug repository. Sorry if this isn't desirable.

-- 
* Note new Address, Telephone & Fax numbers from 6th April 2006 *
Gavin Simpson
ECRC & ENSIS                  [t] +44 (0)20 7679 0522
UCL Department of Geography   [f] +44 (0)20 7679 0565
Pearson Building              [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street                  [w] http://www.ucl.ac.uk/~ucfagls/cv/
London, UK.                   [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT.
Duncan Murdoch
2006-Jun-13 14:38 UTC
[Rd] undesirable rounding off due to 'read.table' (PR#8974)
overeem at knmi.nl wrote:
> Full_Name: Aart Overeem
> Version: 2.2.0
> OS: Linux
> Submission from: (NULL) (145.23.254.155)
>
>
> Construct a dataframe consisting of several variables by using 'data.frame' and
> 'cbind' and write it to a file with 'write.table'. The file consists of headers
> and values, such as 12.4283675334551 (so 13 numbers behind the decimal point).
> If this dataframe is read with 'read.table(filename, skip = 1)' or
> 'read.table(filename, header = TRUE)' the values only have 7 numbers behind the
> decimal point, e.g. 12.42837. So, the reading rounds off the values. This is not
> mentioned in the manual. Although the values still have many numbers behind the
> decimal point, rounding off is, in my view, never desirable.

I don't see this. Try the following script:

> x <- data.frame(a=12.4283675334551)
> x
         a
1 12.42837
> write.table(x,'test')
> y <- read.table('test')
> y
         a
1 12.42837
> y$a-x$a
[1] 0

If y$a had been rounded to 5 decimal places, then we would see a nonzero
difference at the end.

I think you are being confused by the display, which only shows 5
decimal places, even though more are maintained internally. For example,

> options(digits=20)
> y
                 a
1 12.4283675334551

Please be careful about what you submit as a bug report.

Duncan Murdoch
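The distinction between the displayed and the stored value can also be checked without touching the global 'digits' option. The following is a minimal sketch; the variable names are illustrative, and only the number itself is taken from the report.

```r
x <- 12.4283675334551

# Seven significant digits: what print() uses by default.
shown_default <- format(x, digits = 7)

# Fifteen significant digits: the value as it was originally typed.
shown_full <- format(x, digits = 15)

# format() returns character strings; x itself is not modified.
cat(shown_default, shown_full, sep = "\n")
```

Passing 'digits' to format() or print() affects a single call, so it avoids changing the display of everything else in the session the way options(digits = ...) does.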
ripley at stats.ox.ac.uk
2006-Jun-13 14:51 UTC
[Rd] undesirable rounding off due to 'read.table' (PR#8974)
I believe this to be a false report. It is *printing* that rounds off the
numbers, not the reading. You provide no evidence of your assertions:
here is a simple counter-example:

> A <- data.frame(a=12.4283675334551)
> write.table(A, "foo")
> AA <- read.table("foo")
> A
         a
1 12.42837
> AA
         a
1 12.42837
> print(AA, digits=16)
                 a
1 12.4283675334551
> AA-A
  a
1 0

The last just happens to be true in this example, as there would normally
be a small representation error.

On Tue, 13 Jun 2006, overeem at knmi.nl wrote:

> Full_Name: Aart Overeem
> Version: 2.2.0

You are explicitly instructed to upgrade before reporting a bug.

> OS: Linux
> Submission from: (NULL) (145.23.254.155)
>
>
> Construct a dataframe consisting of several variables by using
> 'data.frame' and 'cbind' and write it to a file with 'write.table'. The
> file consists of headers and values, such as 12.4283675334551 (so 13
> numbers behind the decimal point).

That is, 15 significant digits.

> If this dataframe is read with 'read.table(filename, skip = 1)' or
> 'read.table(filename, header = TRUE)' the values only have 7 numbers
> behind the decimal point, e.g. 12.42837.

Your example has *five*, and is printed to 7 significant digits, the
default for printing.

> So, the reading rounds off the values. This is not mentioned in the
> manual. Although the values still have many numbers behind the decimal
> point, rounding off is, in my view, never desirable.

Nor is submitting false reports.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
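The "small representation error" mentioned above can be made visible with values that need more than 15 significant digits. This is a sketch under stated assumptions: the values pi and 1/3 and the temporary file are illustrative and not from the thread, and it assumes that write.table writes doubles with about 15 significant digits, as its help page describes.

```r
# Hypothetical values chosen because they need more than 15 significant digits.
A <- data.frame(a = c(pi, 1 / 3))

# Round trip through a text file, as in the counter-example above.
f <- tempfile()
write.table(A, f)
AA <- read.table(f)
unlink(f)

# Any differences are at the level of floating-point representation error,
# far below what print() shows at the default 7 significant digits.
d <- AA$a - A$a
print(d)

# all.equal() compares with a tolerance, so it treats the two columns as equal.
ok <- isTRUE(all.equal(AA$a, A$a))
print(ok)
```

When a bit-for-bit comparison is not required, all.equal() is usually a more appropriate check than testing the difference against exactly zero.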