Dear Users,

I encountered a problem in data reading while challenging R (and myself) from a validation point of view. I tried to use some of the NIST reference datasets (http://www.itl.nist.gov/div898/strd/index.html), and the result departed a bit from my expectations. This dataset (case SmLs07) is designed to expose cancellation and accumulation errors, which is why the txt file looks unusual:

Treatment   Response
        1   1000000000000.4
        1   1000000000000.3
        1   1000000000000.5
        ......
        2   1000000000000.2
        2   1000000000000.4
        .....
        3   1000000000000.4
        3   1000000000000.6
        3   1000000000000.4
        .........

After read.table() I expected the same values; instead I got this:

   Treatment              Response
1          1 1000000000000.4000244
2          1 1000000000000.3000488
3          1 1000000000000.5000000
.........
22         2 1000000000000.3000488
23         2 1000000000000.1999512
24         2 1000000000000.4000244
.......
58         3 1000000000000.4000244
59         3 1000000000000.5999756
60         3 1000000000000.4000244
61         3 1000000000000.5999756
62         3 1000000000000.4000244
......

Lots of digits out of nowhere. I assume these numbers come from the binary representation of such tricky decimal numbers, but my question is: how can I avoid this feature of the binary representation?

Moreover, I wonder whether this might raise questions in a regulated environment.
Hi Istvan,

That's most unusual and quite unlikely (and much larger than the usual floating-point rounding errors). Please provide a reproducible example. I assume you got the data from here:
http://www.itl.nist.gov/div898/strd/anova/SmLs07.dat

What did you do with it then? How did you delete the header rows? What R code did you use to read it in? What OS and version of R are you working with?

R has been well validated; it's more likely that you did something sub-optimal while importing the data.

Sarah

On Fri, May 4, 2012 at 9:54 AM, Istvan Nemeth <furgeurge at gmail.com> wrote:
> [original message quoted in full; trimmed]

--
Sarah Goslee
http://www.functionaldiversity.org
Hi Istvan,

Your OS and version of R (e.g. sessionInfo()) would also be useful, as would sending your reply to the R-help list and not just to me.

Sarah

---------- Forwarded message ----------
From: Istvan Nemeth <furgeurge at gmail.com>
Date: Fri, May 4, 2012 at 10:20 AM
Subject: Re: [R] read-in, error???
To: Sarah Goslee <sarah.goslee at gmail.com>

Dear Sarah,

I copied and pasted (Ctrl-C & Ctrl-V) the data from the page into a txt file, then deleted the unwanted "Data:" part. The code is:

LibPath <- getwd()
options(digits = 20)
SmLs07 <- read.table(file.path(LibPath, "SmLs07.txt"), header = TRUE,
                     colClasses = "numeric")
SmLs07$TrtF <- factor(SmLs07$Treatment)
lm02 <- lm(Response ~ TrtF, data = SmLs07)
anova(lm02)
summary(lm02)

I hope this helps to reproduce the phenomenon.

Thanks,
István

2012/5/4 Sarah Goslee <sarah.goslee at gmail.com>:
> [earlier message quoted in full; trimmed]

--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org
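A minimal sketch (plain base R with IEEE-754 doubles; not code from the thread) showing that the extra digits come from the options(digits = 20) display setting in the code above, not from read.table():

x <- 1000000000000.4            # the value as written in the file
print(x, digits = 7)            # 1e+12  (default-style display)
print(x, digits = 20)           # 1000000000000.4000244
# Doubles near 1e12 are spaced 2^-13 ~ 0.000122 apart, so .4 cannot be
# stored exactly; both decimal strings parse to the very same double:
x == 1000000000000.4000244      # TRUE

In other words, read.table() stored the closest representable double all along; only the high-digits display reveals it.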
You didn't mention it, but did you use something like options(digits = 20) before displaying that data? In any case,

> 1000000000000.4000244 == 1000000000000.4
[1] TRUE

because R uses the IEEE-754 double-precision floating-point arithmetic that all modern computers support. That gives you 53 binary digits of precision (52 stored, plus an implicit leading bit), a little under 16 decimal digits, so your difference in the 18th significant digit is ignored.

If you need more than ~16 decimal digits of precision, you could break the numbers into parts (via string manipulation, before converting them to numbers), or use a high-precision package like Rmpfr to manipulate them (it will be slow and has limited functionality).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Istvan Nemeth
> Sent: Friday, May 04, 2012 6:55 AM
> To: r-help at r-project.org
> Subject: [R] read-in, error???
>
> [original message quoted in full; trimmed]
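To make the Rmpfr suggestion concrete, a sketch (the file name "SmLs07.txt" and the choice of 120 bits are illustrative assumptions, not from the thread): read the Response column as character so read.table() never converts it to a double, then parse the strings with Rmpfr:

library(Rmpfr)   # arbitrary-precision floats via the MPFR library
raw <- read.table("SmLs07.txt", header = TRUE,
                  colClasses = c("integer", "character"))
resp <- mpfr(raw$Response, precBits = 120)        # ~36 decimal digits
# Cancellation now happens in high precision:
resp[1] - mpfr("1000000000000", precBits = 120)   # ~0.4, not 0.4000244...

The trade-off mentioned above is real, though: mpfr vectors are far slower than doubles, and lm()/anova() will not accept them, so this helps for checking sums and differences rather than for running the full ANOVA.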