thr3ads.net - R devel - [Rd] Randomness not due to seed [Jul 2011]

jeroen00ms

2011-Jul-19 13:13 UTC

[Rd] Randomness not due to seed

I am working on a reproducible computing platform for which I would like to
be able to _exactly_ reproduce an R object. However, I am experiencing
unexpected randomness in some calculations. I have a hard time finding out
exactly how it occurs. The code below illustrates the issue. 

mylm1 <- lm(dist~speed, data=cars);
mylm2 <- lm(dist~speed, data=cars);
identical(mylm1, mylm2); #TRUE

makelm <- function(){
	return(lm(dist~speed, data=cars));
}

mylm1 <- makelm();
mylm2 <- makelm();
identical(mylm1, mylm2); #FALSE

When inspecting both objects there seem to be some rounding differences.
Setting a seed does not make a difference. Is there any way I can remove
this randomness and exactly reproduce the object every time?






William Dunlap

2011-Jul-19 22:06 UTC

[Rd] Randomness not due to seed

Did you actually see some rounding differences?

The lm objects made in the calls to makelm will
differ in the environments attached to the formula
(because you made the formula in the function, and each
call to the function creates a new environment).  If
I change both copies of that .Environment attribute
to .GlobalEnv (or any other environment), then identical
reports the objects are the same:

  > attr(attr(mylm1$model, "terms"), ".Environment") <- .GlobalEnv
  > attr(mylm1$terms, ".Environment") <- .GlobalEnv
  > attr(attr(mylm2$model, "terms"), ".Environment") <- .GlobalEnv
  > attr(mylm2$terms, ".Environment") <- .GlobalEnv
  > identical(mylm1, mylm2)
  [1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
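
The fix described in this reply can be wrapped in a small helper. The following is a minimal sketch, not part of the original message: the names reset_lm_env and makelm_fixed are hypothetical, and it assumes the fit was made with the default model = TRUE so that the model frame is stored in the object. It also shows one possible way to avoid the problem up front, by building the formula with an explicit environment.

```r
# Hypothetical helper: point both copies of the formula environment
# stored in an lm fit at a fixed environment (default: the global one).
reset_lm_env <- function(fit, env = globalenv()) {
  attr(fit$terms, ".Environment") <- env
  attr(attr(fit$model, "terms"), ".Environment") <- env
  fit
}

makelm <- function() {
  lm(dist ~ speed, data = cars)
}

fit1 <- makelm()
fit2 <- makelm()

# Each call to makelm() creates a new function environment, so the
# stored formula environments differ and this comparison fails.
identical(fit1, fit2)

# After normalising the environments the two fits compare as identical.
identical(reset_lm_env(fit1), reset_lm_env(fit2))

# Alternative (also an assumption, not from the thread): give the
# formula a fixed environment when it is created.
makelm_fixed <- function() {
  lm(as.formula("dist ~ speed", env = globalenv()), data = cars)
}
identical(makelm_fixed(), makelm_fixed())
```

Note that changing the formula environment can matter if the formula refers to variables that live only inside the function, so this is best applied just before comparing or storing the object.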

Mike Marchywka

2011-Jul-20 00:01 UTC

[Rd] Randomness not due to seed

I don't know if anyone had a specific answer for this, but in general
floating point is not something for which you want to make bitwise
equality tests. You can check the Intel website for some references, but
IIRC the FPU can start your calculation with bits or settings (flushing
denorms to zero, for example) left over from the last user, although I
can't document that.

For example, you can probably find more like this, suggesting that changes
in alignment and rounding in preamble code can be significant:

http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/

And of course, if your algorithm is numerically sensitive, results could
change a lot. Now, it's also possible you have uninitialized or corrupt
memory, but you would need to consider that you will not get bitwise
reproducibility. You can of course go to Java if you really want that LOL.
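
The general point about bitwise equality tests on floating point values can be illustrated in R itself. This is a small sketch added for illustration, not something from the original message; it uses only base R, and the variable names are arbitrary.

```r
# 0.1 + 0.2 and 0.3 have different binary representations, so a
# bitwise comparison fails even though the values agree to ~1e-16.
x <- 0.1 + 0.2
y <- 0.3

identical(x, y)           # FALSE
isTRUE(all.equal(x, y))   # TRUE: equal within the default tolerance

# The size of the discrepancy is at the level of machine precision.
abs(x - y) < .Machine$double.eps
```

all.equal() returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE() here.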


Paul Gilbert

2011-Jul-20 14:42 UTC

[Rd] Randomness not due to seed

It does not look like your calculation is using the random number generator, so
the other responses are probably more to the point.

However, beware that setting the seed is not enough to guarantee the same random
numbers. You need to also make sure you are using the same uniform RNG and any
other generators you use, such as the normal generator. R has a large selection
of possibilities. Your start up settings could change the default behaviour.
Also, relying on the default will be a bit risky if you are interested in
reproducible calculations, because the R default could change in the future (as
it has in the past, and as has the Splus generator in the past).

If the RNG is important for your reproducible calculations then you might want
to look at the examples and tests in the setRNG package.

Paul
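
The advice about fixing the generators as well as the seed can be sketched with base R alone. This example is an illustration added here, not taken from the setRNG package; the seed value and the choice of generators are arbitrary assumptions.

```r
# Inspect which generators are currently in use.
RNGkind()

# Fix the uniform and normal generators together with the seed,
# instead of relying on whatever the session defaults happen to be.
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
x1 <- rnorm(5)

set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
x2 <- rnorm(5)

identical(x1, x2)   # same seed, same generators: same numbers

# Same seed, but a different normal generator: the draws change.
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Box-Muller")
x3 <- rnorm(5)

identical(x1, x3)

# Restore the session defaults afterwards.
RNGkind("default", "default")
```

Recent versions of R also have a sample.kind setting that affects sample(); a platform that needs exact reproduction would presumably record that as well.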

Paul Johnson

2011-Jul-25 16:49 UTC

[Rd] Randomness not due to seed

William Dunlap was correct.  Observe in the sequence of comparisons
below that the difference in the "terms" object is causing identical()
to fail. Everything else associated with this model (the coefficients,
the R-squared, the covariance matrix, etc.) matches exactly.

> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
[1] TRUE
> makelm <- function(){
+        return(lm(dist~speed, data=cars));
+ }
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
[1] FALSE
> identical(coef(mylm1), coef(mylm2))
[1] TRUE
> identical(summary(mylm1), summary(mylm2))
[1] FALSE
> identical(coef(summary(mylm1)), coef(summary(mylm2)))
[1] TRUE
> all.equal(mylm1, mylm2)
[1] TRUE
> identical(summary(mylm1)$r.squared, summary(mylm2)$r.squared)
[1] TRUE
> identical(summary(mylm1)$adj.r.squared, summary(mylm2)$adj.r.squared)
[1] TRUE
> identical(summary(mylm1)$sigma, summary(mylm2)$sigma)
[1] TRUE
> identical(summary(mylm1)$fstatistic, summary(mylm2)$fstatistic)
[1] TRUE
> identical(summary(mylm1)$residuals, summary(mylm2)$residuals)
[1] TRUE
> identical(summary(mylm1)$cov.unscaled, summary(mylm2)$cov.unscaled)
[1] TRUE
> identical(summary(mylm1)$call, summary(mylm2)$call)
[1] TRUE
> identical(summary(mylm1)$terms, summary(mylm2)$terms)
[1] FALSE
> summary(mylm2)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1b76ae0>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"
> summary(mylm1)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1
attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1cf06b8>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
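
The component-by-component comparison above can also be done in one pass. The following is a minimal sketch added for illustration, assuming a default lm() fit; the variable names same and differing are hypothetical.

```r
makelm <- function() {
  lm(dist ~ speed, data = cars)
}

fit1 <- makelm()
fit2 <- makelm()

# Compare every top-level component of the two fits with identical().
same <- vapply(
  names(fit1),
  function(nm) identical(fit1[[nm]], fit2[[nm]]),
  logical(1)
)

# Names of the components that are not bitwise identical.
differing <- names(same)[!same]
differing

# The environments carried by the two terms objects are the culprits.
environment(fit1$terms)
environment(fit2$terms)
```

Only components that carry the formula environment should show up in differing; the numeric results such as the coefficients and residuals are expected to match exactly.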
