Dear Duncan,

I had been thinking about FAQ 7.31. I tried to create dummy datasets with
the same structure to replicate the problem without the need of sending my
dataset. However, all of them gave identical() results between 32-bit and
64-bit. Note that coef()$fRow is a 1266 x 6 data.frame. Is it correct to
infer that tiny differences between 32-bit and 64-bit are possible but
have a low probability of occurring?

signif() indeed makes more sense than round(). Using 20 digits gives
identical results; 21 digits gives non-identical results.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data. ~ John Tukey

2015-06-03 18:09 GMT+02:00 Duncan Murdoch <murdoch.duncan at gmail.com>:
> On 03/06/2015 11:56 AM, Thierry Onkelinx wrote:
> > Dear all,
> >
> > I'm a bit puzzled by the difference in an object when created in R
> > 32-bit and R 64-bit.
> >
> > Consider the code below. test.rda is available at
> > https://drive.google.com/file/d/0BzBrlGSuB9n-NFBWeC1TR093Sms/view?usp=sharing
> >
> > # Run in R 3.2.0 Windows 32-bit, lme4 1.1-8
> > library(lme4)
> > load("test.rda")
> > coef.32 <- coef(test)
> > save(coef.32, file = "32bit.rda")
> >
> > # Run in R 3.2.0 Windows 64-bit, lme4 1.1-8
> > library(lme4)
> > load("~/test.rda")
> > coef.64 <- coef(test)
> > save(coef.64, file = "64bit.rda")
> >
> > # Compare the results
> > # Run in R 3.2.0 Windows 32-bit, lme4 1.1-8
> > # Run in R 3.2.0 Windows 64-bit, lme4 1.1-8
> > library(lme4)
> > load("32bit.rda")
> > load("64bit.rda")
> > identical(coef.32, coef.64) # FALSE
> > identical(coef.32$fRow, coef.64$fRow) # FALSE
> > identical(coef.32$fLocation, coef.64$fLocation) # TRUE
> > identical(coef.32$fSubLocation, coef.64$fSubLocation) # TRUE
> >
> > The first comparison is FALSE because the second is FALSE. But why is
> > the second FALSE and the third and fourth TRUE?
> >
> > My goal is to calculate a SHA1 hash on coef(test) to track whether the
> > coefficients of test have changed. I'd like to get the same hash on a
> > 32-bit and a 64-bit system. A simple hack would be to calculate the
> > hash on round(coef(test), 20). Is that a good or bad idea?
> >
> > identical(round(coef.32$fRow, 20), round(coef.64$fRow, 20)) # TRUE
>
> Different math libraries round differently, so small differences are
> expected. This is FAQ 7.31. In many cases the 32 bit calculations are
> more accurate, because they tend to use more 80 bit extended precision
> intermediate values, but that is not guaranteed.
>
> Rounding before comparing makes sense, but I would use signif() instead
> of round(), I would choose a relatively small number of significant
> digits, and I would expect to see a few false positives: if the true
> value is 0 but some "random" noise is added, I'd expect values rounded
> by signif() to be unequal.
>
> Duncan Murdoch
>
> > Best regards,
> >
> > ir. Thierry Onkelinx
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
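
The advice to round with signif() before comparing, and the caveat about values that should be zero, can be illustrated with a minimal sketch. The vectors x32, x64, z32 and z64 below are hypothetical stand-ins for coefficients computed on the two platforms; they are not taken from test.rda, and 10 significant digits is an arbitrary choice.

```r
# Hypothetical values that differ only in the last bit, standing in for
# one column of coef.32$fRow and coef.64$fRow.
x32 <- c(0.1 + 0.2, 1234.5678, 1e-8)
x64 <- c(0.3,       1234.5678, 1e-8)

identical(x32, x64)                          # FALSE: 0.1 + 0.2 is not exactly 0.3
identical(signif(x32, 10), signif(x64, 10))  # TRUE

# Caveat: signif() keeps relative precision, so a value that should be 0
# but carries platform-dependent noise is still unequal after rounding.
z32 <- 1e-17
z64 <- 3e-17
identical(signif(z32, 10), signif(z64, 10))  # FALSE

# round() works on absolute precision and maps both noise values to 0.
identical(round(z32, 10), round(z64, 10))    # TRUE

# Hashing the rounded values; this assumes the digest package is installed.
if (requireNamespace("digest", quietly = TRUE)) {
  h32 <- digest::digest(signif(x32, 10), algo = "sha1")
  h64 <- digest::digest(signif(x64, 10), algo = "sha1")
  print(identical(h32, h64))
}
```

Whether signif(), round(), or a combination is appropriate depends on the scale of the coefficients, which is why a relatively small number of digits is suggested above.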
On 04/06/2015 3:59 AM, Thierry Onkelinx wrote:
> Dear Duncan,
>
> I had been thinking about FAQ 7.31. I tried to create dummy datasets
> with the same structure to replicate the problem without the need of
> sending my dataset. However, all of them gave identical() results between
> 32-bit and 64-bit. Note that coef()$fRow is a 1266 x 6 data.frame. Is it
> correct to infer that tiny differences between 32-bit and 64-bit are
> possible but have a low probability of occurring?

Differences are rare, but it's hard to assign a probability to them.

Duncan Murdoch
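
When identical() returns FALSE on two data frames, it can help to see where the values differ and by how much. The following is a minimal sketch; the data frames a and b are hypothetical stand-ins for coef.32$fRow and coef.64$fRow, with a one-bit difference introduced by hand.

```r
# Hypothetical stand-ins for coef.32$fRow and coef.64$fRow.
a <- data.frame(intercept = c(0.3, 1.5), slope = c(2, 4))
b <- a
b$intercept[1] <- 0.1 + 0.2    # differs from 0.3 in the last bit

identical(a, b)                # FALSE: bitwise comparison
isTRUE(all.equal(a, b))        # TRUE: comparison up to a numerical tolerance

# Locate the cells that differ and measure the largest discrepancy.
delta <- abs(as.matrix(a) - as.matrix(b))
which(delta > 0, arr.ind = TRUE)
max(delta)
```

all.equal() answers the question "are these the same up to rounding error?", while identical() answers "are these the same bits?"; only the latter is affected by the kind of platform difference discussed in this thread.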
"low probability of occurring" was just statisticians lingo for "rare" ;-) ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2015-06-04 11:53 GMT+02:00 Duncan Murdoch <murdoch.duncan at gmail.com>:> On 04/06/2015 3:59 AM, Thierry Onkelinx wrote: > > Dear Duncan, > > > > I had been thinking about FAQ 7.31. I tried to create a dummy dataset > > with the same structure to replicate the problem with the need of > > sending my dataset. However all of them gave identical() results between > > 32-bit and 64-bit. Note that coef()$fRow is a 1266 x 6 data.frame. Is it > > correct to infer that tiny difference between 32-bit and 64-bit are > > possible but have a low probability of occurring? > > Differences are rare, but it's hard to assign a probability to them. > > Duncan Murdoch > > > > > signif() makes indeed more sense than round(). Using 20 digits gives > > identical results, 21 digits gives non identical results. > > > > Best regards, > > > > ir. Thierry Onkelinx > > Instituut voor natuur- en bosonderzoek / Research Institute for Nature > > and Forest > > team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance > > Kliniekstraat 25 > > 1070 Anderlecht > > Belgium > > > > To call in the statistician after the experiment is done may be no more > > than asking him to perform a post-mortem examination: he may be able to > > say what the experiment died of. 
~ Sir Ronald Aylmer Fisher > > The plural of anecdote is not data. ~ Roger Brinner > > The combination of some data and an aching desire for an answer does not > > ensure that a reasonable answer can be extracted from a given body of > > data. ~ John Tukey > > > > 2015-06-03 18:09 GMT+02:00 Duncan Murdoch <murdoch.duncan at gmail.com > > <mailto:murdoch.duncan at gmail.com>>: > > > > On 03/06/2015 11:56 AM, Thierry Onkelinx wrote: > > > Dear all, > > > > > > I'm a bit puzzled by the difference in an object when created in R > > 32-bit > > > and R 64-bit. > > > > > > Consider the code below. test.rda is available at > > > > > > https://drive.google.com/file/d/0BzBrlGSuB9n-NFBWeC1TR093Sms/view?usp=sharing > > > > > > # Run in R 3.2.0 Windows 32-bit, lme4 1.1-8 > > > library(lme4) > > > load("test.rda") > > > coef.32 <- coef(test) > > > save(coef.32, file = "32bit.rda") > > > > > > # Run in R 3.2.0 Windows 64-bit, lme4 1.1-8 > > > library(lme4) > > > load("~/test.rda") > > > coef.64 <- coef(test) > > > save(coef.64, file = "64bit.rda") > > > > > > > > > # Compare the results > > > # Run in R 3.2.0 Windows 32-bit, lme4 1.1-8 > > > # Run in R 3.2.0 Windows 64-bit, lme4 1.1-8 > > > library(lme4) > > > load("32bit.rda") > > > load("64bit.rda") > > > identical(coef.32, coef.64) # FALSE > > > identical(coef.32$fRow, coef.64$fRow) # FALSE > > > identical(coef.32$fLocation, coef.64$fLocation) # TRUE > > > identical(coef.32$fSubLocation, coef.64$fSubLocation) # TRUE > > > > > > The first comparison is FALSE, because the second is FALSE. But > > why is the > > > second FALSE and the third and fourth TRUE? > > > > > > My goal is the calculate a SHA1 hash on the coef(test) to track if > the > > > coefficients of test have changed. I'd like to get the same hash > on a > > > 32-bit and 64-bit system. A simple hack would be to calculate the > > hash on > > > round(coef(test), 20). Is that a good or bad idea? 
> > > > > > identical(round(coef.32$fRow, 20), round(coef.64$fRow, 20)) # TRUE > > > > Different math libraries round differently, so small differences are > > expected. This is FAQ 7.31. In many cases the 32 bit calculations > are > > more accurate, because they tend to use more 80 bit extended > precision > > intermediate values, but that is not guaranteed. > > > > Rounding before comparing makes sense, but I would use signif() > instead > > of round(), I would choose a relatively small number of significant > > digits, and I would expect to see a few false positives: if the true > > value is 0 but some "random" noise is added, I'd expect values > rounded > > by signif() to be unequal. > > > > Duncan Murdoch > > > > > > > > Best regards, > > > > > > ir. Thierry Onkelinx > > > Instituut voor natuur- en bosonderzoek / Research Institute for > Nature and > > > Forest > > > team Biometrie & Kwaliteitszorg / team Biometrics & Quality > Assurance > > > Kliniekstraat 25 > > > 1070 Anderlecht > > > Belgium > > > > > > To call in the statistician after the experiment is done may be no > more > > > than asking him to perform a post-mortem examination: he may be > able to say > > > what the experiment died of. ~ Sir Ronald Aylmer Fisher > > > The plural of anecdote is not data. ~ Roger Brinner > > > The combination of some data and an aching desire for an answer > does not > > > ensure that a reasonable answer can be extracted from a given body > of data. > > > ~ John Tukey > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-help at r-project.org <mailto:R-help at r-project.org> mailing list -- > > To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > > >[[alternative HTML version deleted]]