Berry, Charles
2019-Mar-03 20:20 UTC
[Rd] bug: sample( x, size, replace = TRUE, prob= skewed.probs) produces uniform sample
When `length( skewed.probs ) > 200`, uniform samples are generated in R-devel.

R-3.5.1 behaves as expected.

`epsilon` can be a lot bigger than illustrated and still the uniform distribution is produced.

Chuck

> set.seed(123)
>
> epsilon <- 1e-10
>
> ## uniform to 200 then small
> p200 <- prop.table( rep( c(1, epsilon), c(200, 999-200)))
> ## uniform to 201 then small
> p201 <- prop.table( rep( c(1, epsilon), c(201, 999-201)))
>
> brks <- c(0,99,199,200,201,Inf)
> tab200 <- sample( length(p200), 10000, prob=p200, replace=TRUE)
> tab201 <- sample( length(p201), 10000, prob=p201, replace=TRUE)
>
> cbind(
+   s200=table(cut(tab200, brks)),
+   p200=round(xtabs(p200 ~ cut( seq_along(p200), brks)) * 10000 ,1),
+   s201=table(cut(tab201, brks )),
+   p201=round(xtabs(p201 ~ cut( seq_along(p201), brks)) * 10000 ,1))
          s200 p200 s201   p201
(0,99]    5017 4950  984 4925.4
(99,199]  4925 5000  959 4975.1
(199,200]   58   50    9   49.8
(200,201]    0    0    6   49.8
(201,Inf]    0    0 8042    0.0
>
> sessionInfo()
R Under development (unstable) (2019-03-02 r76189)
Platform: x86_64-apple-darwin18.2.0 (64-bit)
Running under: macOS Mojave 10.14.3

Matrix products: default
BLAS: /Users/cberry/projects/R/R-devel/lib/libRblas.dylib
LAPACK: /Users/cberry/projects/R/R-devel/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.6.0
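One way to cross-check the report without going through `sample()` is to draw the same weighted sample by inverting the cumulative distribution. The sketch below is not from the thread: `sample_inverse_cdf` is a hypothetical helper name, and it assumes `prob` is a vector of non-negative weights with a positive sum. For context, `?sample` documents that Walker's alias method is used when there are more than 200 reasonably probable values, which may be why the behaviour changes at that length; the thread itself does not state the cause.

```r
## Hypothetical helper (not part of base R): weighted sampling with
## replacement by inverse-CDF lookup, independent of sample()'s internals.
sample_inverse_cdf <- function(n, size, prob) {
  stopifnot(length(prob) == n, all(prob >= 0), sum(prob) > 0)
  cp <- cumsum(prob)
  cp <- cp / cp[n]   # normalise so that the last cumulative value is exactly 1
  ## findInterval() counts how many cumulative values are <= each uniform
  ## draw; adding 1 turns that count into the sampled index.
  findInterval(runif(size), cp) + 1L
}

set.seed(123)
epsilon <- 1e-10
## same construction as p201 in the report: 201 equal weights, then tiny ones
p201 <- prop.table(rep(c(1, epsilon), c(201, 999 - 201)))
brks <- c(0, 99, 199, 200, 201, Inf)

draws <- sample_inverse_cdf(length(p201), 10000, p201)
counts <- table(cut(draws, brks))
print(counts)
```

If the counts from this helper track the `p201` column of the table above while `sample()` does not, that points at the sampling routine rather than at the probability vector.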
Tierney, Luke
2019-Mar-03 21:44 UTC
[Rd] bug: sample( x, size, replace = TRUE, prob= skewed.probs) produces uniform sample
Thanks. We'll need to look into how best to address this.

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
Tierney, Luke
2019-Mar-06 23:17 UTC
[Rd] bug: sample( x, size, replace = TRUE, prob= skewed.probs) produces uniform sample
This is now fixed in R-devel.

Best,

luke
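To confirm on a particular build that the behaviour reported at the top of this thread is gone, a small check along the following lines could be run. It is only a sketch: `tail_fraction` is a hypothetical name, and the 0.01 threshold is an arbitrary assumption chosen to sit far above the expected tail mass (about 4e-7 here) and far below the roughly 0.80 seen in the report.

```r
## Hypothetical regression check (names and threshold are illustrative).
## Builds the same kind of probability vector as in the report and returns
## the fraction of draws that land in the low-probability tail.
tail_fraction <- function(n_uniform, n_total = 999, epsilon = 1e-10,
                          size = 10000) {
  p <- prop.table(rep(c(1, epsilon), c(n_uniform, n_total - n_uniform)))
  draws <- sample(length(p), size, prob = p, replace = TRUE)
  mean(draws > n_uniform)
}

set.seed(123)
frac200 <- tail_fraction(200)   # at the length that behaved correctly
frac201 <- tail_fraction(201)   # at the length that triggered the report

cat("tail fraction, 200 uniform weights:", frac200, "\n")
cat("tail fraction, 201 uniform weights:", frac201, "\n")

## On an affected build frac201 would be close to 798/999.
if (frac201 > 0.01) warning("sample() appears to ignore 'prob' here")
```

Running the same function for a few lengths on either side of 200 gives a quick picture of whether the result depends on the length of the probability vector.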