Graham Williams
2011-Feb-10 11:37 UTC
[Rd] R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?
Should one expect minor numerical differences between 64-bit and 32-bit R on
Windows? Hunting around the lists I've not been able to find a definitive
answer yet. It seems plausible given different precision arithmetic, but I
wanted to confirm with those who might know for sure.

BACKGROUND

A colleague was trying to replicate some modelling results (from a soon to be
published book) using rpart, ada, and randomForest, for example. My 64-bit
Linux and 64-bit Windows 7 results always agree (so far), but not their
32-bit Windows results. I've distilled it to a few simple lines of code that
replicate the differences (though I had to stay with the weather dataset from
rattle, since I could not replicate the differences on standard datasets yet).

library(rpart)
library(rattle)
set.seed(41)
model <- rpart(RainTomorrow ~ ., data=weather[-c(1, 2, 23)],
               control=rpart.control(minbucket=0))
print(model$cptable)

Final row on 32-bit: 9 0.01000000 23 0.1515152 1.1060606 0.1158273
Final row on 64-bit: 9 0.01000000 23 0.1515152 1.0909091 0.1152273

Pretty minor, but different. I've not found any seed other than 41 (I've only
tried a few) that results in a difference.

library(ada)  # uses rpart underneath
set.seed(41)
model <- ada(RainTomorrow ~ ., data=weather[-c(1, 2, 23)])
print(model)

On 32-bit: Train Error: 0.057
On 64-bit: Train Error: 0.055

Changing the seed to 42, for example, brings them into sync.

library(randomForest)
set.seed(41)
model <- randomForest(RainTomorrow ~ ., data=weather[-c(1, 2, 23)],
                      importance=TRUE, na.action=na.roughfix)
print(model)

On 32-bit: OOB estimate of error rate: 12.84%
On 64-bit: OOB estimate of error rate: 11.75%

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] randomForest_4.5-36 pmml_1.2.27         XML_3.2-0.2
[4] colorspace_1.0-1    RGtk2_2.20.3        ada_2.0-2
[7] rattle_2.6.2        rpart_3.1-47

loaded via a namespace (and not attached):
[1] tools_2.12.1

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-mingw32/x64 (64-bit)
...

Thanks,
Graham
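P.S. As a first sanity check (just a sketch of what I have in mind, on the
assumption that the random streams themselves are identical across builds and
the divergence comes from the floating point work inside the fitting code), I
would compare the raw generator output on the two installations:

set.seed(41)
print(runif(5), digits=17)   # I'd expect these to agree to the last digit
set.seed(41)
print(sample(100, 5))        # integer sampling; I'd also expect agreement

If these already differ between the builds, that would itself be informative.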
Duncan Murdoch
2011-Feb-10 12:39 UTC
[Rd] R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?
On 11-02-10 6:37 AM, Graham Williams wrote:
> Should one expect minor numerical differences between 64-bit and 32-bit R
> on Windows? Hunting around the lists I've not been able to find a
> definitive answer yet. It seems plausible given different precision
> arithmetic, but I wanted to confirm with those who might know for sure.

I think our goal is that those results should be as close as possible. R
uses the same precision in both the 32-bit and 64-bit builds; the differences
are all in pointers, not floating point values. However, the two versions use
different run-time libraries, and it is possible that there are precision
differences coming from there.

I think we'd be interested in knowing what the differences are, even if they
are beyond our control, so I would appreciate it if you could track down
where the difference arises.

Duncan Murdoch
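P.S. One rough way to start narrowing it down (only a sketch; the file name
below is just a placeholder) would be to write out the fitted cptable at full
precision on each platform and compare the two files. The first row where
they diverge should point to where the trees part ways:

library(rpart)
library(rattle)
set.seed(41)
model <- rpart(RainTomorrow ~ ., data=weather[-c(1, 2, 23)],
               control=rpart.control(minbucket=0))
# 17 significant digits, so small differences are not hidden by printing
write.table(format(model$cptable, digits=17), "cptable-32bit.txt")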
Petr Savicky
2011-Feb-10 13:33 UTC
[Rd] R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?
On Thu, Feb 10, 2011 at 10:37:09PM +1100, Graham Williams wrote:
> Should one expect minor numerical differences between 64-bit and 32-bit R
> on Windows? Hunting around the lists I've not been able to find a
> definitive answer yet. It seems plausible given different precision
> arithmetic, but I wanted to confirm with those who might know for sure.

One source of differences between platforms is the compiler settings. On
Intel processors, the options influence whether the registers use an 80-bit
or a 64-bit representation of floating point numbers. In memory, the
representation is always 64-bit. Whether there is a difference between
registers and memory can be tested, for example, with the following code:

#include <stdio.h>

#define n 3

int main(int argc, char *argv[])
{
    double x[n];
    int i;
    for (i = 0; i < n; i++) {
        x[i] = 1.0/(i + 5);          /* stored to memory, rounded to 64 bits */
    }
    for (i = 0; i < n; i++) {
        if (x[i] != 1.0/(i + 5)) {   /* recomputed value may still sit in an 80-bit register */
            printf("difference for %d\n", i);
        }
    }
    return 0;
}

If the compiler uses SSE arithmetic (-mfpmath=sse), the output is empty. If
Intel's extended arithmetic is used, we get

difference for 0
difference for 1
difference for 2

On 32-bit Linux systems the default was Intel's extended arithmetic, while on
64-bit Linux systems the default is sometimes SSE. I do not know the
situation on Windows.

Another source of differences is different optimization of expressions. It is
sometimes possible to obtain identical results on different platforms, but
this cannot be guaranteed in general. For tree construction, even minor
differences in rounding may influence comparisons, and this may result in a
different form of the tree.

Petr Savicky.
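PS: A small illustration in R of the last point (the numbers below are made
up; only their near-equality matters). If the improvement scores of two
candidate splits agree to roughly 15 digits, a last-bit difference between
the platforms is enough to change which split wins, and once the first split
differs, the rest of the tree follows:

gain_32bit <- c(splitA = 0.40824829046386307, splitB = 0.40824829046386296)
gain_64bit <- c(splitA = 0.40824829046386296, splitB = 0.40824829046386307)
names(which.max(gain_32bit))   # "splitA"
names(which.max(gain_64bit))   # "splitB"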