thr3ads.net - R devel - [Rd] Which function can change RNG state? [Feb 2015]

otoomet

2015-Feb-08 03:52 UTC

[Rd] Which function can change RNG state?

Today I struggled for hours to understand some unexpected package test
results.  It turned out that package "parallel", buried deep in my
dependencies, calls runif() during its initialization and in this way
changes the random number sequence.  This seems to be part of a more
general question: which kinds of functions can we trust if we want to
preserve the random number state and the rest of the environment, such as
graphical parameters?

One could put all the burden on the end user: never trust any of the calls
you make; it is your responsibility to save the RNG state if you wish.
However, this might be very cumbersome if you have to do it around all
cat(), print() and similar calls that will most likely never tinker with
random numbers.  For instance, can I be sure that
set.seed(0); print(runif(1)); print(rnorm(1))
will always print the same numbers, including in future versions of R?  There
are also commands that we definitely expect to change the RNG state, such as
random number generation, bootstrapping, etc.  I just did not expect library()
to be one of them...
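
For illustration, saving and restoring the RNG state around a single call could look like the sketch below. The helper name with_preserved_rng() is hypothetical (it is not part of base R), and the runif(10) call merely stands in for any function that consumes random numbers behind your back.

```r
# Hypothetical helper (not part of base R): evaluate `expr`, then put the
# global RNG state back to what it was before the call.
with_preserved_rng <- function(expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  }
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  })
  expr  # lazy evaluation: the argument is actually evaluated here
}

set.seed(0)
a <- runif(1)

set.seed(0)
invisible(with_preserved_rng(runif(10)))  # stands in for a call that draws random numbers
b <- runif(1)

identical(a, b)  # the wrapped call left the RNG sequence untouched
```

This only protects the calls you remember to wrap, which is exactly the burden described above.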

I think it would be nice to have some sort of guidelines here.  The only
related sentence I am aware of is in the CRAN policy:
- Packages should not modify the global environment (user's workspace).

Any thoughts?

Cheers,
Ott



--
View this message in context:
http://r.789695.n4.nabble.com/Which-function-can-change-RNG-state-tp4702935.html
Sent from the R devel mailing list archive at Nabble.com.

Dirk Eddelbuettel

2015-Feb-08 14:33 UTC

On 7 February 2015 at 19:52, otoomet wrote:
| random numbers.   For instance, can I be sure that
| set.seed(0); print(runif(1)); print(rnorm(1))
| will always print the same numbers, also in the future version of R?

Yes, pretty much.

I've been lurking here for over fifteen years, and while I am getting old and
forgetful, I can remember exactly one such change where behaviour was changed
and (one of the) generators was altered, if memory serves in the early
R 1.* days. [ Goes digging...] Yes, see `help(RNGkind)`, which details that
R 1.7.0 made a change: "Buggy Kinderman-Ramage" was added as the name for the
old behaviour, and "Kinderman-Ramage" was repaired.  There once was a similar
fix in the very early days of the Mersenne-Twister, which is why the GNU GSL
has two variants with suffixes _1998 and _1999.

So your issue seems like pilot error to me:  don't attach the parallel package
if you do not plan to work in parallel.  But "do if you do", and see its fine
vignette on how it provides you reproducibility for multiple RNG streams.

In general, you can very much trust R (and R Core) in these matters. 

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

Gábor Csárdi

2015-Feb-08 14:40 UTC

On Sat, Feb 7, 2015 at 10:52 PM, otoomet <otoomet at gmail.com> wrote:
[...]
> random numbers.   For instance, can I be sure that
> set.seed(0); print(runif(1)); print(rnorm(1))
> will always print the same numbers, also in the future version of R?

I don't know if there is intention to keep this reproducible across R
versions, but it is already not reproducible across platforms (with the
same R version):
http://stackoverflow.com/questions/21212326/floating-point-arithmetic-and-reproducibility
FYI.

Gabor

Paul Gilbert

2015-Feb-09 05:03 UTC

On 02/08/2015 09:33 AM, Dirk Eddelbuettel wrote:
> On 7 February 2015 at 19:52, otoomet wrote:
> | random numbers.   For instance, can I be sure that
> | set.seed(0); print(runif(1)); print(rnorm(1))
> | will always print the same numbers, also in the future version of R?
>
> Yes, pretty much.

This is nearly correct. The user could change the uniform or normal 
generator, since there are options other than the defaults, which would 
mean the result would be different. And obviously if they changed print 
precision then the printed result may be truncated differently.
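
A small sketch of both effects, assuming the default normal generator ("Inversion") is active at the start; "Box-Muller" is simply one of the alternatives listed in help(RNGkind), chosen here for illustration:

```r
set.seed(0)
default_draw <- rnorm(1)

# Switch the normal generator; RNGkind() returns the previous settings invisibly.
old_kinds <- RNGkind(normal.kind = "Box-Muller")
set.seed(0)
box_muller_draw <- rnorm(1)
invisible(do.call(RNGkind, as.list(old_kinds)))  # restore the previous settings

identical(default_draw, box_muller_draw)  # same seed, different normal generator

# The same number shown with different print precision.
format(default_draw, digits = 7)
format(default_draw, digits = 3)
```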

I think you could prepare for future versions of R by saving information 
about the generators you are using. The precedent has already been set 
(R-1.7.0) that the default could change if there is a good reason. A 
good reason might be that the RNG is found not to be so good relative to 
others that become available. But I think the old generator would 
continue to be available, so people can reproduce old results. (Package 
setRNG has some utilities to help save and reset, but there is nothing 
especially difficult or fancy, just a few details that need to be 
remembered.)
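
As a rough sketch of the "save and reset" idea using only base R (this is not the setRNG interface; the list layout and the seed value 42 are assumptions made for the example):

```r
# Record the generator settings and the seed used for a run.
run_info <- list(
  r_version = R.version.string,
  kinds     = RNGkind(),  # uniform and normal generators (plus sample kind in newer R)
  seed      = 42L         # arbitrary seed chosen for this illustration
)

set.seed(run_info$seed)
x1 <- rnorm(3)

# Something changes the uniform generator in the meantime.
RNGkind("Wichmann-Hill")

# Later: restore the recorded generators first, then re-seed.
invisible(do.call(RNGkind, as.list(run_info$kinds)))
set.seed(run_info$seed)
x2 <- rnorm(3)

identical(x1, x2)
```

Storing the R version next to the kinds makes it easier to tell later whether a default may have changed between runs.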

> I've been lurking here over fifteen years, and while I am getting old and
> forgetful I can remember exactly one such change where behaviour was changed,
> and (one of the) generators was altered---if memory serves in the earlier
> days of R 1.* days . [ Goes digging...] Yes, see `help(RNGkind)` which
> details that R 1.7.0 made a change when "Buggy Kinderman-Ramage" was added as
> the old value, and "Kinderman-Ramage" was repaired.  There once was a similar
> fix in the very early days of the Mersenne-Twister which is why the GNU GSL
> has two variants with suffixes _1998 and _1999.

I seem to recall a bit of change around R-0.49 but old and forgetful 
would cover this too. For me, a bigger change was an unadvertised change 
in Splus - they compiled against a different math library at some point. 
This changed the lower bits in results, mostly insignificant but 
accumulated simulation results could amount to something fairly 
important. The amount of time I spent trying to find why results would 
not reproduce was one of my main motivations for starting to use R.

> So your issue seems like pilot error to me:  don't attach the parallel package
> if you do not plan to work in parallel.  But "do if you do", and see its fine
> vignette on how it provides you reproducibility for multiple RNG streams.
>
> In general, you can very much trust R (and R Core) in these matters.
>
> Dirk

On 02/08/2015 09:40 AM, Gábor Csárdi wrote:
> I don't know if there is intention to keep this reproducible across R
> versions, but it is already not reproducible across platforms (with the
> same R version):
> http://stackoverflow.com/questions/21212326/floating-point-arithmetic-and-reproducibility

The situation is better in some respects, and worse in others, than what 
is described on stackoverflow. I think the point is made pretty well 
there that you should not be trying to reproduce results beyond machine 
precision. My experience is that you can usually compare within a fuzz of 
1e-14, even across platforms. (The package setRNG on CRAN has a 
function random.number.test() which is run in the package's tests/ and 
makes uniform and normal comparisons to 1e-14. It has passed checks on 
all R platforms since 2004. Actually, the checks have been done since 
about 1995, but they were part of package dse earlier.)  If you 
accumulate lots of lower-order parts (e.g. sum(simulated - true) in a long 
Monte Carlo run) then the fuzz may need to get much larger, especially 
when comparing across platforms. And you will have trouble with numerically 
unstable calculations. Once upon a time I was annoyed by this, but then 
I realized that it was better not to do unstable calculations.
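
A minimal sketch of comparing within a fuzz rather than exactly. The "reference" values here are fabricated by perturbing the last few bits of the simulated ones, purely to imitate what another platform might return; this is not how random.number.test() in setRNG is implemented.

```r
set.seed(0)
simulated <- runif(5)

# Stand-in for reference values stored from another platform: the same numbers
# perturbed in the last bits (an assumption made for this illustration).
reference <- simulated * (1 + 4 * .Machine$double.eps)

# Exact comparison: fails on last-bit differences.
exact_match <- identical(simulated, reference)

# Absolute fuzz of 1e-14.
within_fuzz <- max(abs(simulated - reference)) < 1e-14

# Relative tolerance via all.equal().
within_tol <- isTRUE(all.equal(simulated, reference, tolerance = 1e-14))

c(exact = exact_match, fuzz = within_fuzz, tolerance = within_tol)
```

For accumulated quantities such as long Monte Carlo sums, the tolerance would need to be loosened as described above.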

In addition to results not being reproducible beyond machine precision 
across R versions and across platforms, reproducibility is not really 
guaranteed even on the same platform and the same version of R. You may 
get different results if you upgrade the OS and there has been a change 
in the math libraries. In my experience this happens rather often. I 
don't think there is any specific 32 vs 64 bit issue, but math libraries 
sometimes do things a bit differently on different processors (e.g. 
processor bug fixes), so you can occasionally get differences with 
everything the same except the hardware.

On 02/07/2015 10:52 PM, otoomet wrote:
 > It turned out that this is because package "parallel", buried deep
 > in my dependencies, calls runif() during its initialization and
 > in this way changes the random number sequence.

Guessing a bit about what you are saying: (1) you set the random seed, 
(2) you did some things which included loading package parallel, and 
(3) you ran some things for which you expected to get results comparable 
to some previous run when you did (1) and (2) in the reverse order.

If I understand this correctly, I suggest you always do everything 
exactly the same after you set the seed. There are lots of things that 
could generate random numbers without you really knowing. Thus, it is 
usually better to set the seed immediately before you start doing 
anything where you want the seed to have a known state. (There is an 
even better suggestion in the somewhat dated vignette with package setRNG.)
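
To make the ordering point concrete, here is a small sketch in which runif(1) stands in for any hidden draw (for example one made while a package is being loaded):

```r
set.seed(1)
first <- rnorm(2)

# Same seed, but something draws a random number before the code of interest.
set.seed(1)
invisible(runif(1))  # stands in for a hidden draw, e.g. during package loading
shifted <- rnorm(2)

# Setting the seed immediately before the code of interest avoids the problem.
invisible(runif(1))
set.seed(1)
again <- rnorm(2)

identical(first, shifted)  # the hidden draw shifted the sequence
identical(first, again)    # seeding right before the draws restores it
```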

Finally, if you do intend to use parallel sometimes then you have 
additional considerations. You would like to get the same results no 
matter how many machines you are using. This may place some constraints 
on the generators you use, not all are equally easy to use in parallel. 
So if you are hoping to get the same results in parallel as you get on a 
single machine then you had better start out using generators on the 
single machine that you will be able to use in parallel.
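
One way to approach this, sketched under the assumption that the work splits into independent tasks: use the "L'Ecuyer-CMRG" generator and give each task (rather than each worker) its own stream via parallel::nextRNGStream(). The task function run_task() and the task count are made up for the example, and the tasks are run serially in two different orders to show that each result depends only on its own stream.

```r
library(parallel)

old_kinds <- RNGkind("L'Ecuyer-CMRG")  # returns the previous settings
set.seed(123)

# Derive one RNG stream per task, independent of how many workers exist.
n_tasks <- 4
streams <- vector("list", n_tasks)
s <- .Random.seed
for (i in seq_len(n_tasks)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)
}

# Hypothetical task: install the task's stream, then simulate.
run_task <- function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  mean(rnorm(100))
}

serial    <- vapply(streams, run_task, numeric(1))
reordered <- rev(vapply(rev(streams), run_task, numeric(1)))

identical(serial, reordered)  # results do not depend on execution order

invisible(do.call(RNGkind, as.list(old_kinds)))  # restore the previous generators
```

Because each result is tied to a task's stream and not to a worker, the same list of streams could be handed to workers on any number of machines.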

Paul
