thr3ads.net - R devel - [Rd] Extreme bunching of random values from runif with Mersenne-Twister seed [Nov 2017]


Tirthankar Chakravarty

2017-Nov-03 18:30 UTC

[Rd] Extreme bunching of random values from runif with Mersenne-Twister seed

Bill,

Appreciate the point that both you and Serguei are making, but the sequence
in question is not a selected or filtered set. These are values as observed
in a sequence from a mechanism described below. The probabilities required
to generate this exact sequence in the wild seem staggering to me.

T

On Fri, Nov 3, 2017 at 11:27 PM, William Dunlap <wdunlap at tibco.com> wrote:
> Another generator is subject to the same problem with the same
> probability.
>
> > Filter(function(s){set.seed(s, kind="Knuth-TAOCP-2002");runif(1,17,26)>25.99}, 1:10000)
>  [1]  280  415  826 1372 2224 2544 3270 3594 3809 4116 4236 5018 5692 7043
> 7212 7364 7747 9256 9491 9568 9886
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Nov 3, 2017 at 10:31 AM, Tirthankar Chakravarty <tirthankar.lists at gmail.com> wrote:
>
>> Bill,
>>
>> I have clarified this on SO, and I will copy that clarification in here:
>>
>> "Sure, we tested them on other 8-digit numbers as well & we could not
>> replicate. However, these are honest-to-goodness numbers generated by a
>> non-adversarial system that has no conception of these numbers being used
>> for anything other than a unique key for an entity -- these are not a
>> specially constructed edge case. Would be good to know what seeds will and
>> will not work, and why."
>>
>> These numbers are generated by an application that serves a form, and
>> associates form IDs in a sequence. The application calls our API depending
>> on the form values entered by users, which in turn calls our R code that
>> executes some code that needs an RNG. Since the API has to be stateless, to
>> be able to replicate the results for possible debugging, we need to draw
>> random numbers in a way that we can replicate the results of the API
>> response -- we use the form ID as seeds.
>>
>> I repeat, there is no design or anything adversarial about the way that
>> these numbers were generated -- the system generating these numbers and
>> the users entering inputs have no conception of our use of an RNG -- this
>> is meant to just be a random sequence of form IDs. This issue was
>> discovered completely by chance when the output of the API was observed to
>> be highly non-random. It is possible that it is a 1/10^8 chance, but that
>> is hard to believe, given that the API hit depends on user input. Note also
>> that the issue goes away when we use a different RNG as mentioned below.
>>
>> T
>>
>> On Fri, Nov 3, 2017 at 9:58 PM, William Dunlap <wdunlap at tibco.com> wrote:
>>
>>> The random numbers in a stream initialized with one seed should have
>>> about the desired distribution.  You don't win by changing the seed all the
>>> time.  Your seeds caused the first numbers of a bunch of streams to be
>>> about the same, but the second and subsequent entries in each stream do
>>> look uniformly distributed.
>>>
>>> You didn't say what your 'upstream process' was, but it is easy to come
>>> up with seeds that give about the same first value:
>>>
>>> > Filter(function(s){set.seed(s);runif(1,17,26)>25.99}, 1:10000)
>>>  [1]  514  532 1951 2631 3974 4068 4229 6092 6432 7264 9090
>>>
>>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Fri, Nov 3, 2017 at 12:49 AM, Tirthankar Chakravarty <tirthankar.lists at gmail.com> wrote:
>>>
>>>> This is cross-posted from SO (https://stackoverflow.com/q/47079702/1414455),
>>>> but I now feel that this needs someone from R-Devel to help understand why
>>>> this is happening.
>>>>
>>>> We are facing a weird situation in our code when using R's [`runif`][1] and
>>>> setting seed with `set.seed` with the `kind = NULL` option (which resolves,
>>>> unless I am mistaken, to `kind = "default"`; the default being
>>>> `"Mersenne-Twister"`).
>>>>
>>>> We set the seed using (8 digit) unique IDs generated by an upstream system,
>>>> before calling `runif`:
>>>>
>>>>     seeds = c(
>>>>       "86548915", "86551615", "86566163", "86577411", "86584144",
>>>>       "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
>>>>       "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
>>>>       "86876466", "86901193", "86987847", "86988080")
>>>>
>>>>     random_values = sapply(seeds, function(x) {
>>>>       set.seed(x)
>>>>       y = runif(1, 17, 26)
>>>>       return(y)
>>>>     })
>>>>
>>>> This gives values that are **extremely** bunched together.
>>>>
>>>>     > summary(random_values)
>>>>        Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>>       25.13   25.36   25.66   25.58   25.83   25.94
>>>>
>>>> This behaviour of `runif` goes away when we use `kind =
>>>> "Knuth-TAOCP-2002"`, and we get values that appear to be much more evenly
>>>> spread out.
>>>>
>>>>     random_values = sapply(seeds, function(x) {
>>>>       set.seed(x, kind = "Knuth-TAOCP-2002")
>>>>       y = runif(1, 17, 26)
>>>>       return(y)
>>>>     })
>>>>
>>>> *Output omitted.*
>>>>
>>>> ---
>>>>
>>>> **The most interesting thing here is that this does not happen on Windows
>>>> -- only happens on Ubuntu** (`sessionInfo` output for Ubuntu & Windows
>>>> below).
>>>>
>>>> # Windows output: #
>>>>
>>>>     > seeds = c(
>>>>     +   "86548915", "86551615", "86566163", "86577411", "86584144",
>>>>     +   "86584272", "86620568", "86724613", "86756002", "86768593", "86772411",
>>>>     +   "86781516", "86794389", "86805854", "86814600", "86835092", "86874179",
>>>>     +   "86876466", "86901193", "86987847", "86988080")
>>>>     >
>>>>     > random_values = sapply(seeds, function(x) {
>>>>     +   set.seed(x)
>>>>     +   y = runif(1, 17, 26)
>>>>     +   return(y)
>>>>     + })
>>>>     >
>>>>     > summary(random_values)
>>>>        Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>>       17.32   20.14   23.00   22.17   24.07   25.90
>>>>
>>>> Can someone help understand what is going on?
>>>>
>>>> Ubuntu
>>>> ------
>>>>
>>>>     R version 3.4.0 (2017-04-21)
>>>>     Platform: x86_64-pc-linux-gnu (64-bit)
>>>>     Running under: Ubuntu 16.04.2 LTS
>>>>
>>>>     Matrix products: default
>>>>     BLAS: /usr/lib/libblas/libblas.so.3.6.0
>>>>     LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
>>>>
>>>>     locale:
>>>>      [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
>>>>      [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
>>>>      [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
>>>>      [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
>>>>      [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
>>>>     [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
>>>>
>>>>     attached base packages:
>>>>     [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>>     other attached packages:
>>>>     [1] RMySQL_0.10.8               DBI_0.6-1
>>>>      [3] jsonlite_1.4                tidyjson_0.2.2
>>>>      [5] optiRum_0.37.3              lubridate_1.6.0
>>>>      [7] httr_1.2.1                  gdata_2.18.0
>>>>      [9] XLConnect_0.2-12            XLConnectJars_0.2-12
>>>>     [11] data.table_1.10.4           stringr_1.2.0
>>>>     [13] readxl_1.0.0                xlsx_0.5.7
>>>>     [15] xlsxjars_0.6.1              rJava_0.9-8
>>>>     [17] sqldf_0.4-10                RSQLite_1.1-2
>>>>     [19] gsubfn_0.6-6                proto_1.0.0
>>>>     [21] dplyr_0.5.0                 purrr_0.2.4
>>>>     [23] readr_1.1.1                 tidyr_0.6.3
>>>>     [25] tibble_1.3.0                tidyverse_1.1.1
>>>>     [27] rBayesianOptimization_1.1.0 xgboost_0.6-4
>>>>     [29] MLmetrics_1.1.1             caret_6.0-76
>>>>     [31] ROCR_1.0-7                  gplots_3.0.1
>>>>     [33] effects_3.1-2               pROC_1.10.0
>>>>     [35] pscl_1.4.9                  lattice_0.20-35
>>>>     [37] MASS_7.3-47                 ggplot2_2.2.1
>>>>
>>>>     loaded via a namespace (and not attached):
>>>>      [1] splines_3.4.0      foreach_1.4.3      AUC_0.3.0          modelr_0.1.0
>>>>      [5] gtools_3.5.0       assertthat_0.2.0   stats4_3.4.0       cellranger_1.1.0
>>>>      [9] quantreg_5.33      chron_2.3-50       digest_0.6.10      rvest_0.3.2
>>>>     [13] minqa_1.2.4        colorspace_1.3-2   Matrix_1.2-10      plyr_1.8.4
>>>>     [17] psych_1.7.3.21     XML_3.98-1.7       broom_0.4.2        SparseM_1.77
>>>>     [21] haven_1.0.0        scales_0.4.1       lme4_1.1-13        MatrixModels_0.4-1
>>>>     [25] mgcv_1.8-17        car_2.1-5          nnet_7.3-12        lazyeval_0.2.0
>>>>     [29] pbkrtest_0.4-7     mnormt_1.5-5       magrittr_1.5       memoise_1.0.0
>>>>     [33] nlme_3.1-131       forcats_0.2.0      xml2_1.1.1         foreign_0.8-69
>>>>     [37] tools_3.4.0        hms_0.3            munsell_0.4.3      compiler_3.4.0
>>>>     [41] caTools_1.17.1     rlang_0.1.1        grid_3.4.0         nloptr_1.0.4
>>>>     [45] iterators_1.0.8    bitops_1.0-6       tcltk_3.4.0        gtable_0.2.0
>>>>     [49] ModelMetrics_1.1.0 codetools_0.2-15   reshape2_1.4.2     R6_2.2.0
>>>>     [53] knitr_1.15.1       KernSmooth_2.23-15 stringi_1.1.5      Rcpp_0.12.11
>>>>
>>>>
>>>>
>>>> Windows
>>>> -------
>>>>
>>>>     > sessionInfo()
>>>>     R version 3.3.2 (2016-10-31)
>>>>     Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>     Running under: Windows >= 8 x64 (build 9200)
>>>>
>>>>     locale:
>>>>     [1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
>>>>     [4] LC_NUMERIC=C                   LC_TIME=English_India.1252
>>>>
>>>>     attached base packages:
>>>>     [1] graphics  grDevices utils     datasets  grid      stats     methods   base
>>>>
>>>>     other attached packages:
>>>>      [1] bindrcpp_0.2         h2o_3.14.0.3         ggrepel_0.6.5        eulerr_1.1.0         VennDiagram_1.6.17
>>>>      [6] futile.logger_1.4.3  scales_0.4.1         FinCal_0.6.3         xml2_1.0.0           httr_1.3.0
>>>>     [11] wesanderson_0.3.2    wordcloud_2.5        RColorBrewer_1.1-2   htmltools_0.3.6      urltools_1.6.0
>>>>     [16] timevis_0.4          dtplyr_0.0.1         magrittr_1.5         shiny_1.0.5          RODBC_1.3-14
>>>>     [21] zoo_1.8-0            sqldf_0.4-10         RSQLite_1.1-2        gsubfn_0.6-6         proto_1.0.0
>>>>     [26] gdata_2.17.0         stringr_1.2.0        XLConnect_0.2-12     XLConnectJars_0.2-12 data.table_1.10.4
>>>>     [31] xlsx_0.5.7           xlsxjars_0.6.1       rJava_0.9-8          readxl_0.1.1         googlesheets_0.2.1
>>>>     [36] jsonlite_1.5         tidyjson_0.2.1       RMySQL_0.10.9        RPostgreSQL_0.4-1    DBI_0.5-1
>>>>     [41] dplyr_0.7.2          purrr_0.2.3          readr_1.1.1          tidyr_0.7.0          tibble_1.3.3
>>>>     [46] ggplot2_2.2.0        tidyverse_1.0.0      lubridate_1.6.0
>>>>
>>>>     loaded via a namespace (and not attached):
>>>>      [1] gtools_3.5.0         assertthat_0.2.0     triebeard_0.3.0      cellranger_1.1.0     yaml_2.1.14
>>>>      [6] slam_0.1-40          lattice_0.20-34      glue_1.1.1           chron_2.3-48         digest_0.6.12.1
>>>>     [11] colorspace_1.3-1     httpuv_1.3.5         plyr_1.8.4           pkgconfig_2.0.1      xtable_1.8-2
>>>>     [16] lazyeval_0.2.0       mime_0.5             memoise_1.0.0        tools_3.3.2          hms_0.3
>>>>     [21] munsell_0.4.3        lambda.r_1.1.9       rlang_0.1.1          RCurl_1.95-4.8       labeling_0.3
>>>>     [26] bitops_1.0-6         tcltk_3.3.2          gtable_0.2.0         reshape2_1.4.2       R6_2.2.0
>>>>     [31] bindr_0.1            futile.options_1.0.0 stringi_1.1.2        Rcpp_0.12.12.1
>>>>
>>>>   [1]: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
	[[alternative HTML version deleted]]
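
The figures quoted in this message can be checked with a little arithmetic. If first draws behaved like independent uniform values on [17, 26], a fraction (26 - 25.99)/9 of all seeds would give a first draw above 25.99, so roughly 11 of the seeds in 1:10000 would be expected to pass Bill's filter. He reports 11 for Mersenne-Twister and 21 for Knuth-TAOCP-2002. Under the same assumption, the chance that 21 draws all land at or above 25.13 (the minimum in the Ubuntu summary) is a very small number, which is the sense in which the observed bunching looks "staggering". The sketch below only does that arithmetic; the variable names are illustrative, and independence across seeds is an assumption, which is exactly the point in dispute in this thread.

```r
# Range used by runif(1, 17, 26) throughout the thread.
lo <- 17
hi <- 26

# Chance that one uniform draw exceeds 25.99, and the expected number of
# seeds in 1:10000 passing Bill's filter (assumes independence across seeds).
p_above <- (hi - 25.99) / (hi - lo)
expected_hits <- 10000 * p_above

# Chance that 21 independent draws all fall at or above 25.13,
# the minimum reported in the Ubuntu summary.
p_one <- (hi - 25.13) / (hi - lo)
p_all_21 <- p_one^21

print(expected_hits)
print(p_all_21)
```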

Daniel Nordlund

2017-Nov-05 02:20 UTC

Tirthankar,

"random number generators" do not produce random numbers.  Any given 
generator produces a fixed sequence of numbers that appear to meet 
various tests of randomness.  By picking a seed you enter that sequence 
in a particular place and subsequent numbers in the sequence appear to 
be unrelated.  There are no guarantees that if YOU pick a SET of seeds 
they won't produce a set of values that are of a similar magnitude.

You can likely solve your problem by following Radford Neal's advice of 
not using the first number from each seed.  However, you don't need 
to use anything more than the second number.  So, you can modify your 
function as follows:

function(x) {
  set.seed(x, kind = "default")
  # draw two values and keep only the second
  y = runif(2, 17, 26)
  return(y[2])
}

Hope this is helpful,

Dan

-- 
Daniel Nordlund
Port Townsend, WA  USA
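
A minimal sketch of how this suggestion could be applied to the seeds from the original post, so that first and second draws can be compared side by side. The helper names `first_draw` and `second_draw` are hypothetical, and the seeds are written as integers here rather than the character strings used in the post. No output is shown because the first-draw summary reportedly differed between the Ubuntu and Windows sessions.

```r
# Form IDs from the original post, as integers.
seeds <- c(
  86548915L, 86551615L, 86566163L, 86577411L, 86584144L,
  86584272L, 86620568L, 86724613L, 86756002L, 86768593L, 86772411L,
  86781516L, 86794389L, 86805854L, 86814600L, 86835092L, 86874179L,
  86876466L, 86901193L, 86987847L, 86988080L)

# Original approach: seed the generator, then use the first draw.
first_draw <- function(seed) {
  set.seed(seed, kind = "default")
  runif(1, 17, 26)
}

# Suggested approach: seed, draw two values, keep only the second.
second_draw <- function(seed) {
  set.seed(seed, kind = "default")
  runif(2, 17, 26)[2]
}

first_values <- vapply(seeds, first_draw, numeric(1))
second_values <- vapply(seeds, second_draw, numeric(1))

# Compare the spread of the two sets of values.
print(summary(first_values))
print(summary(second_values))
```

Because each call reseeds the generator, both helpers remain reproducible for a given form ID, which is the property the stateless API needs.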



Duncan Murdoch

2017-Nov-05 14:17 UTC

On 04/11/2017 10:20 PM, Daniel Nordlund wrote:
> Tirthankar,
>
> "random number generators" do not produce random numbers.  Any given
> generator produces a fixed sequence of numbers that appear to meet
> various tests of randomness.  By picking a seed you enter that sequence
> in a particular place and subsequent numbers in the sequence appear to
> be unrelated.  There are no guarantees that if YOU pick a SET of seeds
> they won't produce a set of values that are of a similar magnitude.
>
> You can likely solve your problem by following Radford Neal's advice of
> not using the first number from each seed.  However, you don't need
> to use anything more than the second number.  So, you can modify your
> function as follows:
>
> function(x) {
>   set.seed(x, kind = "default")
>   y = runif(2, 17, 26)
>   return(y[2])
> }
>
> Hope this is helpful,

That's assuming that the chosen seeds are unrelated to the function 
output, which seems unlikely on the face of it.  You can certainly 
choose a set of seeds that give high values on the second draw just as 
easily as you can choose seeds that give high draws on the first draw.

The interesting thing about this problem is that Tirthankar doesn't 
believe that the seed selection process is aware of the function output. 
I would say that it must be, and he should be investigating how that 
happens if he is worried about the output; he shouldn't be worrying 
about R's RNG.

Duncan Murdoch
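
The point about the second draw can be sketched by reusing the `Filter` idea from earlier in the thread, but testing the second value instead of the first. The names `skip_first`, `high_second` and `bunched` are illustrative only. If draws are uniform, roughly one seed in 900 should qualify; the exact seeds found are not listed here.

```r
# The "skip the first draw" function proposed earlier in the thread.
skip_first <- function(x) {
  set.seed(x, kind = "default")
  runif(2, 17, 26)[2]
}

# Seeds in 1:10000 whose SECOND draw exceeds 25.99.
high_second <- Filter(function(s) skip_first(s) > 25.99, 1:10000)

# Fed only these seeds, the skip-first function is bunched near 26 again.
bunched <- vapply(high_second, skip_first, numeric(1))

print(length(high_second))
print(summary(bunched))
```

Discarding the first draw therefore helps only if the process that picks the seeds is unrelated to the generator's output at that position.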
