On 08/30/2016 06:29 PM, Duncan Murdoch wrote:
> I don't see evidence of a bug. There have been several versions of the
> MT; we may be using a different version than you are. Ours is the
> 1999/10/28 version; the web page you cite uses one from 2002.
>
> Perhaps the newer version fixes some problems, and then it would be
> worth considering a change. But changing the default RNG definitely
> introduces problems in reproducibility,

Well "problems in reproducibility" is a bit vague. Results would always be
reproducible by specifying kind="Mersenne-Twister" or kind="Buggy
Kinderman-Ramage" for older results, so there is no problem reproducing
results. The only problem is that users expecting to reproduce results
twenty years later will need to know what random generator they used. (BTW,
they may also need to record information about the normal or other
generator, as well as the seed.) Of course, these changes are recorded
pretty well for R, so the history of "default" can always be found.

I think it is a mistake to encourage users into thinking they do not need
to keep track of some information if they want reproducibility. Perhaps the
default should be changed more often in order to encourage better user
habits.

More seriously, I think "default" should continue to be something that is
currently considered to be good. So, if there really is a known problem,
then I think "default" should be changed.

(And, no I did not get burned by the R 1.7.0 change in the default
generator. I got burned by a much earlier, unadvertised, and more subtle
change in the Splus generator.)

Paul Gilbert

> so it's not obvious that we would do it.
>
> Duncan Murdoch
>
>
> On 30/08/2016 5:45 PM, Mark Roberts wrote:
>> Whomever,
>>
>> I recently sent the "bug report" below to R-core at r-project.org and have
>> just been asked to instead submit it to you.
>>
>> Although I am basically not an R user, I have installed version 3.3.1
>> and am also the author of a statistics program written in Visual Basic
>> that contains a component which correctly implements the Mersenne
>> Twister (MT) algorithm. I believe that it is not possible to generate
>> the correct stream of pseudorandom numbers using the MT default random
>> number generator in R, and am not the first person to notice this. Here
>> is a posted 2013 entry
>> (www.r-bloggers.com/reproducibility-and-randomness/) on an R website
>> that asserts that the SAS computer program implementation of the MT
>> algorithm produces different numbers than R does when using the same
>> starting seed number. The author of this post didn't get anyone to
>> respond to his query about the reason for this SAS vs. R discrepancy.
>>
>> There are two ways of initializing the original MT computer program
>> (written in C) so that an identical stream of numbers can be repeatedly
>> generated: 1) with a particular integer seed number, and 2) with a
>> particular array of integers. In the 'compilation and usage' section
>> of this webpage (https://github.com/cslarsen/mersenne-twister) there is
>> a listing of the first 200 random numbers the MT algorithm should
>> produce for seed number = 1. The inventors of the Mersenne Twister
>> random number generator provided two different sets of the first 1000
>> numbers produced by a correctly coded 32-bit implementation of the MT
>> algorithm when initializing it with a particular array of integers at:
>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.out.
>> [There is a link to this output at:
>> www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html.]
>>
>> My statistics program obtains exactly those 200 numbers from the first
>> site mentioned in the previous paragraph and also obtains those same
>> numbers from the second website (though I didn't check all 2000 values).
>> Assuming that the MT code within R uses the 32-bit MT algorithm, I
>> suspect that the current version of R can't do that. If you (i.e.,
>> anyone who might knowledgeably respond to this report) are able to
>> duplicate those reference test-values, then please send me the R code to
>> initialize the MT code within R to successfully do that, and I apologize
>> for having wasted your time. If you (collectively) can't do that, then R
>> is very likely using incorrectly implemented MT code. And if this
>> latter possibility is true, it seems to me that this is something that
>> should be fixed.
>>
>> Mark Roberts, Ph.D.
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
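
The point about recording the generator as well as the seed can be illustrated with a few lines of R. This is only a minimal sketch: the seed value and the variable names are made up for illustration, and the generator choice shown is simply the default named in this thread. Note that in R's documentation "Buggy Kinderman-Ramage" is listed as a value for the normal.kind argument of RNGkind() rather than for kind.

```r
# Record the generator settings alongside the seed so a run can be repeated.
old <- RNGkind()                     # settings in effect before this example

# State the generators explicitly instead of relying on "default".
RNGkind(kind = "Mersenne-Twister", normal.kind = "Inversion")
seed <- 20160830L                    # illustrative seed, chosen arbitrarily
set.seed(seed)
first <- rnorm(3)

# Later (or in another session): same generator settings, same seed.
RNGkind(kind = "Mersenne-Twister", normal.kind = "Inversion")
set.seed(seed)
second <- rnorm(3)

reproduced <- identical(first, second)   # TRUE: both draws match exactly

# Put the session back the way it was.
RNGkind(kind = old[1], normal.kind = old[2])
```

Saving the output of RNGkind() together with the seed in the analysis script or its log is probably the cheapest way to follow this advice.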
I wonder how useful a (set of?) "time machine" functions which look
up/infer things like this based on a date would be. Could ease the pain of
changes generally, though not remove it completely.

~G

On Wed, Aug 31, 2016 at 5:45 PM, Paul Gilbert <pgilbert902 at gmail.com> wrote:
> [...]
> The only problem is that users expecting to reproduce results
> twenty years later will need to know what random generator they used. (BTW,
> they may also need to record information about the normal or other
> generator, as well as the seed.) Of course, these changes are recorded
> pretty well for R, so the history of "default" can always be found.
> [...]

--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research
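
For the random number generators specifically, R already has a function that works in the spirit of this suggestion, keyed on a version string rather than a date: RNGversion(). The following is a minimal sketch of how it might be used; the variable names are illustrative, and the warnings are silenced only because R deliberately warns when the old generators are selected.

```r
# Save the current settings so they can be restored afterwards.
current <- RNGkind()

# Switch to the generators that were the defaults in R 1.6.2, i.e. before
# the R 1.7.0 change mentioned in this thread. R warns about the legacy
# generators, so the warnings are suppressed here on purpose.
suppressWarnings(RNGversion("1.6.2"))
legacy <- RNGkind()                  # settings now in effect
set.seed(1)
legacy_draws <- runif(3)             # draws from the legacy uniform generator

# Restore the settings saved above.
suppressWarnings(do.call(RNGkind, as.list(unname(current))))
restored <- RNGkind()
```

A date-based lookup would need a table mapping dates to releases on top of this, and it would only cover the RNG, not other changeable defaults.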
>>>>> Gabriel Becker <gmbecker at ucdavis.edu>
>>>>>     on Thu, 1 Sep 2016 08:34:31 -0700 writes:

    > I wonder how useful a (set of?) "time machine" functions
    > which look up /infer things like this based on a date
    > would be. Could ease the pain of changes generally, though
    > not remove it completely.

Such a set (possibly of size one) may be quite useful, notably
if it got an intuitive interface.

I'd recommend partly following options() here, i.e., the

    oc <- compatibilityR("2000-02-29")

would set random number generators (and other changeable
defaults) to those that were in effect when R 1.0.0 was
released, *and* a later call

    compatibilityR(oc)  # reset to previous state

would do what the comment says.
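
To make the proposed interface concrete, here is a rough sketch of how such a function might look if it covered only the random number generators. compatibilityR() does not exist in R; the "compatR" class, the release table, and the variable names are all hypothetical, and the table is deliberately partial. The 1.0.0 date comes from the message above; the 1.7.0 date is an assumption that should be checked against the R release history. RNGkind() and RNGversion() are the only real R functions relied on.

```r
# Hypothetical sketch: compatibilityR() is not part of R.
compatibilityR <- function(x) {
  # Remember the settings in effect before the call, as options() does.
  old <- structure(list(rng = RNGkind()), class = "compatR")
  if (inherits(x, "compatR")) {
    # Called with a saved state: put the RNG settings back.
    suppressWarnings(do.call(RNGkind, as.list(unname(x$rng))))
  } else {
    # Called with a date: use the latest listed release on or before it.
    # Partial, illustrative table; the 1.7.0 date is an assumption.
    releases <- c("1.0.0" = "2000-02-29", "1.7.0" = "2003-04-16")
    known <- as.Date(unname(releases)) <= as.Date(x)
    if (!any(known)) stop("no listed release on or before ", x)
    version <- names(releases)[max(which(known))]
    # RNGversion() warns about legacy generators; silenced on purpose.
    suppressWarnings(RNGversion(version))
  }
  invisible(old)
}

before <- RNGkind()
oc <- compatibilityR("2000-02-29")   # RNG defaults as of R 1.0.0
during <- RNGkind()
compatibilityR(oc)                   # reset to previous state
after <- RNGkind()
```

Returning the previous state invisibly is what lets the second call undo the first, mirroring how options() hands back the old values. Covering "other changeable defaults" would need more than this sketch attempts.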