I am working on a reproducible computing platform for which I would like to be able to _exactly_ reproduce an R object. However, I am experiencing unexpected randomness in some calculations, and I have a hard time finding out exactly how it occurs. The code below illustrates the issue.

mylm1 <- lm(dist~speed, data=cars);
mylm2 <- lm(dist~speed, data=cars);
identical(mylm1, mylm2); #TRUE

makelm <- function(){
        return(lm(dist~speed, data=cars));
}

mylm1 <- makelm();
mylm2 <- makelm();
identical(mylm1, mylm2); #FALSE

When inspecting both objects there seem to be some rounding differences. Setting a seed does not make a difference. Is there any way I can remove this randomness and exactly reproduce the object every time?

--
View this message in context: http://r.789695.n4.nabble.com/Randomness-not-due-to-seed-tp3678082p3678082.html
Sent from the R devel mailing list archive at Nabble.com.
Did you actually see any rounding differences? The lm objects made in the calls to makelm differ in the environments attached to the formula, because the formula is created inside the function and each call to makelm creates a new environment. If I change both copies of that .Environment attribute to .GlobalEnv (or any other environment) in each object, then identical reports that the objects are the same:

> attr(attr(mylm1$model, "terms"), ".Environment") <- .GlobalEnv
> attr(mylm1$terms, ".Environment") <- .GlobalEnv
> attr(attr(mylm2$model, "terms"), ".Environment") <- .GlobalEnv
> attr(mylm2$terms, ".Environment") <- .GlobalEnv
> identical(mylm1, mylm2)
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of jeroen00ms
> Sent: Tuesday, July 19, 2011 6:13 AM
> To: r-devel at r-project.org
> Subject: [Rd] Randomness not due to seed
>
> [...]
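The fix described above can be wrapped in a small function. What follows is only a sketch: reset_formula_env is a hypothetical helper name, not part of R, and it assumes that the two places named in the reply (the terms component and the "terms" attribute of the model frame) are the only ones holding the environment.

```r
# Sketch only: reset_formula_env is a hypothetical helper, not part of R.
makelm <- function() {
  lm(dist ~ speed, data = cars)
}

mylm1 <- makelm()
mylm2 <- makelm()

# Each call to makelm() creates the formula in a fresh function environment,
# so the environments stored with the two fits can be compared directly.
same_env <- identical(attr(mylm1$terms, ".Environment"),
                      attr(mylm2$terms, ".Environment"))

# Point both stored copies of the terms at one fixed environment.
reset_formula_env <- function(fit, env = globalenv()) {
  attr(fit$terms, ".Environment") <- env
  attr(attr(fit$model, "terms"), ".Environment") <- env
  fit
}

same_fit <- identical(reset_formula_env(mylm1), reset_formula_env(mylm2))
print(c(same_env = same_env, same_fit = same_fit))
```

Replacing the environment may change how later calls that re-evaluate the formula (update, for instance) look up variables, so this is probably best applied to a copy used only for comparison or storage.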
> Date: Tue, 19 Jul 2011 06:13:01 -0700
> From: jeroen.ooms at stat.ucla.edu
> To: r-devel at r-project.org
> Subject: [Rd] Randomness not due to seed
>
> [...]

I don't know if anyone had a specific answer for this, but in general floating point is not something for which you want to make bitwise equality tests. You can check the Intel website for some references, but IIRC the FPU can start your calculation with bits or settings (flushing denormals to zero, for example) left over from the last user, although I can't document that. You can probably find more like the following, which suggests that changes in alignment and rounding in preamble code can be significant:

http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/

And of course, if your algorithm is numerically sensitive, results could change a lot. It is also possible that you have uninitialized or corrupt memory, but you would need to consider that you will not get bitwise reproducibility. You can of course go to Java if you really want that, LOL.
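As a generic illustration of the difference between bitwise and tolerance-based comparison of doubles (it is not taken from the lm example in this thread), here is a minimal sketch using only base R. identical compares the values exactly, while all.equal allows a small numerical tolerance.

```r
# Two values that are mathematically equal but not bitwise equal as doubles
x <- 0.1 + 0.2
y <- 0.3

exact <- identical(x, y)           # exact comparison of the two doubles
close <- isTRUE(all.equal(x, y))   # comparison up to a small tolerance

print(c(exact = exact, close = close))
```

all.equal returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE here.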
It does not look like your calculation is using the random number generator, so the other responses are probably more to the point. However, beware that setting the seed is not enough to guarantee the same random numbers. You also need to make sure you are using the same uniform RNG and the same other generators, such as the normal generator. R has a large selection of possibilities, and your startup settings could change the default behaviour. Relying on the default is also a bit risky if you are interested in reproducible calculations, because the R default could change in the future (as it has in the past, and as the Splus generator has in the past).

If the RNG is important for your reproducible calculations then you might want to look at the examples and tests in the setRNG package.

Paul

> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of jeroen00ms
> Sent: July 19, 2011 9:13 AM
> To: r-devel at r-project.org
> Subject: [Rd] Randomness not due to seed
>
> [...]
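To make the point about generators concrete, here is a minimal base R sketch that fixes the uniform and normal generators along with the seed, instead of relying on the session defaults. The seed value and the particular generator kinds are arbitrary choices for illustration; the setRNG package mentioned above is not used here.

```r
# Record the generators currently in use so they can be restored afterwards
old_kind <- RNGkind()

# Fix the seed together with the uniform and normal generators
set.seed(42, kind = "Mersenne-Twister", normal.kind = "Inversion")
draw1 <- rnorm(5)

set.seed(42, kind = "Mersenne-Twister", normal.kind = "Inversion")
draw2 <- rnorm(5)

# Restore the previously selected generators
RNGkind(kind = old_kind[1], normal.kind = old_kind[2])

print(identical(draw1, draw2))
```

Storing the output of RNGkind() alongside the seed is one way to record what a result depended on.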
On Tue, Jul 19, 2011 at 8:13 AM, jeroen00ms <jeroen.ooms at stat.ucla.edu> wrote:
> [...]

William Dunlap was correct. Observe in the sequence of comparisons below that the difference in the "terms" object is causing identical to fail. Everything else associated with this model (the coefficients, the R-squared, the covariance matrix, etc.) matches exactly.

> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
[1] TRUE
> makelm <- function(){
+ return(lm(dist~speed, data=cars));
+ }
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
[1] FALSE
> identical(coef(mylm1), coef(mylm2))
[1] TRUE
> identical(summary(mylm1), summary(mylm2))
[1] FALSE
> identical(coef(summary(mylm1)), coef(summary(mylm2)))
[1] TRUE
> all.equal(mylm1, mylm2)
[1] TRUE
> identical(summary(mylm1)$r.squared, summary(mylm2)$r.squared)
[1] TRUE
> identical(summary(mylm1)$adj.r.squared, summary(mylm2)$adj.r.squared)
[1] TRUE
> identical(summary(mylm1)$sigma, summary(mylm2)$sigma)
[1] TRUE
> identical(summary(mylm1)$fstatistic, summary(mylm2)$fstatistic)
[1] TRUE
> identical(summary(mylm1)$residuals, summary(mylm2)$residuals)
[1] TRUE
> identical(summary(mylm1)$cov.unscaled, summary(mylm2)$cov.unscaled)
[1] TRUE
> identical(summary(mylm1)$call, summary(mylm2)$call)
[1] TRUE
> identical(summary(mylm1)$terms, summary(mylm2)$terms)
[1] FALSE
> summary(mylm2)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1b76ae0>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"
>
> summary(mylm1)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1cf06b8>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
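The manual sequence of comparisons above could also be run as a loop over the components of the fitted objects. This is only a sketch under the assumption that both fits have the same component names; the variable names same and differing are illustrative.

```r
makelm <- function() {
  lm(dist ~ speed, data = cars)
}
mylm1 <- makelm()
mylm2 <- makelm()

# Compare the two fits one component at a time
same <- vapply(names(mylm1),
               function(nm) identical(mylm1[[nm]], mylm2[[nm]]),
               logical(1))

# Names of the components that are not identical
differing <- names(same)[!same]
print(differing)
```

A listing like this narrows the search to the components that carry the formula environment, which can then be inspected with attributes().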